Architecture in 60 seconds
Every pull request passes through layers of increasing depth and cost. A risk classifier at Layer 2 gates whether the expensive Layer 3 runs. The Proactive Scanner operates independently on a cron schedule.
| Layer | What it does | Time | AI? |
|---|---|---|---|
| Layer 0: Context Collection | Collects diff, metadata, coverage, file classifications, repo knowledge, test reliability data | <10s | No |
| Layer 1: Deterministic Gate | Linters, SAST, secret scan, coverage delta, TORS filtering. Cannot be prompt-injected. | <60s | No |
| Layer 2: AI Quick Scan | Fast AI pass over diff. Risk classification (LOW→CRITICAL). Max 5 quick findings. AI-code detection. | <2 min | Yes (Haiku-class) |
| Layer 3: AI Deep Review | Full analysis with repo context, history, conventions. Multiple review lenses in parallel. MEDIUM+ risk only. | <10 min | Yes (Sonnet/Opus-class) |
| Reporter | Aggregates all layers into structured PR comment, inline comments, GitHub Check, optional Slack. | <30s | No |
| Proactive Scanner | Cron-based repo analysis: coverage trends, tech debt, convention drift, security baseline. | Scheduled | Yes |
📦 Full architecture reference
For the complete layer diagram, data flows, and output schemas, see the Architecture Reference.
What you need before starting
The following prerequisites are required for a pilot deployment. The first reference implementation targets GitHub; other platforms follow the same process with different adapters.
| Prerequisite | Details | Required? |
|---|---|---|
| GitHub repository with PRs | The v1 reference implementation is GitHub-only. Active PR flow needed for meaningful pilot data. | Yes |
| CI pipeline with test coverage reports | VCR reads coverage deltas to assess risk. Any format supported by your coverage tool. | Yes |
| AI API key | Anthropic (default). OpenAI and Azure OpenAI are configurable alternatives. | Yes |
| 30 days of test history | Required for TORS (Test Oracle Reliability Score) bootstrap. Without it, start with TORS disabled and build up data over the first month. | Recommended |
| .vcr/ directory in repo root | Contains vcr-config.yaml, convention docs, review lenses, and custom rules. | Yes (created during setup) |
✅ No test history?
You can start without TORS and build up reliability data during the pilot. Set
layer1.tors.enabled: false in your config, then enable it after 30 days of CI data
have been collected.
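The callout above amounts to a one-line config change. A minimal sketch of the relevant vcr-config.yaml fragment (the layer1.tors.enabled key comes from the text; the surrounding structure is assumed):

```yaml
# .vcr/vcr-config.yaml (fragment; nesting assumed)
layer1:
  tors:
    enabled: false   # re-enable after ~30 days of CI history has accumulated
```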
Running a pilot
A pilot typically runs on 1-2 teams over 4-6 weeks. The steps below assume GitHub Actions as the CI platform.
Step 1: Install
Add the VCR GitHub Actions workflow and configuration directory to your repository:
- .github/workflows/vcr-review.yaml, triggers on PR open/update
- .vcr/vcr-config.yaml, main configuration file
- .vcr/conventions.md, your team's coding conventions (AI reads this)
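A minimal workflow sketch, assuming VCR ships as a reusable GitHub Action; the action name and its inputs below are illustrative, not confirmed:

```yaml
# .github/workflows/vcr-review.yaml
name: VCR Review
on:
  pull_request:
    types: [opened, synchronize]   # PR open/update, as described above
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: virtuslab/vcr-action@v1        # hypothetical action name
        with:
          config: .vcr/vcr-config.yaml
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}  # illustrative input
```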
Step 2: Configure path classifications
Define which paths in your repository are critical, sensitive,
standard, or low_risk. This determines review depth and cost.
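A sketch of what path classification might look like in vcr-config.yaml; the four classification names come from the text, but the key names and glob syntax are assumptions:

```yaml
# .vcr/vcr-config.yaml (fragment; exact schema assumed)
paths:
  critical:
    - "src/auth/**"        # auth and payments as critical, per the guidance below
    - "src/payments/**"
  sensitive:
    - "src/api/**"
  low_risk:
    - "docs/**"
    - "**/*.md"
  default: standard        # anything unmatched
```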
Step 3: Start conservative, Layer 2 only
For the first week, run Layer 2 (AI Quick Scan) only. Observe findings, check false positive rates, and calibrate risk classification against your team's expectations. Layer 3 stays disabled.
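A conservative week-one configuration might look like this; the enabled keys are assumptions modeled on the layer1.tors.enabled key shown earlier:

```yaml
# Week 1: quick scan only (key names assumed)
layer2:
  enabled: true
layer3:
  enabled: false   # deep review stays off while calibrating Layer 2
```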
Step 4: Enable Layer 3
After the first week, enable Layer 3 for MEDIUM-risk and above. Monitor finding quality, acceptance rates, and cost. Tune risk thresholds based on actual data.
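Switching Layer 3 on for MEDIUM-risk and above could then be a small config change; the min_risk key is a hypothetical name for the risk threshold:

```yaml
# Week 2+: enable deep review for MEDIUM-risk PRs and above (key names assumed)
layer3:
  enabled: true
  min_risk: MEDIUM   # hypothetical threshold key
```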
Step 5: Add custom lenses
If your domain has specific review needs (compliance, regulatory, domain-specific patterns),
add custom review lenses under .vcr/lenses/custom/. Each lens is a prompt template
that defines what the AI looks for.
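A custom lens might look like the sketch below, assuming lenses are plain Markdown prompt files; the filename and template structure are illustrative, since the exact format is not specified here:

```markdown
<!-- .vcr/lenses/custom/regulatory-compliance.md (illustrative) -->
# Lens: Regulatory compliance
Review the diff for changes that touch regulated data handling.
Flag: new logging of personal data, unencrypted storage, new endpoints
exposing regulated fields. Report each finding with a concrete
file/line reference and a severity.
```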
Step 6: Enable the Proactive Scanner
Set up a weekly cron job for convention drift detection, coverage trends, and security baseline scanning. This runs independently of the PR flow and creates GitHub Issues for critical findings.
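A weekly scan could be wired up as a second scheduled workflow; as above, the action name and its mode input are hypothetical:

```yaml
# .github/workflows/vcr-scan.yaml (illustrative)
name: VCR Proactive Scan
on:
  schedule:
    - cron: "0 6 * * 1"   # weekly, Monday 06:00 UTC
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: virtuslab/vcr-action@v1   # hypothetical action name
        with:
          mode: proactive-scan          # illustrative input
```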
Key configuration decisions
The following choices have the most impact on VCR's effectiveness and cost. Get these right during the pilot and adjust based on data.
| Decision | Options | Guidance |
|---|---|---|
| Path classification | critical, sensitive, standard, low_risk | Start with auth, payments, and infrastructure as critical. API layers as sensitive. Everything else as standard. Docs and tests as low_risk. |
| Risk override rules | Per-path overrides in layer2.risk_overrides | Use always_high for auth paths. Use always_skip_layer3 for docs and markdown. |
| Model selection | Configurable per layer | Haiku-class for Layer 2 (fast, cheap). Sonnet-class for Layer 3 (balanced). Opus-class for CRITICAL PRs only (most capable, most expensive). |
| Budget caps | budget.max_daily_layer3_spend | Set a daily cap during pilot (e.g., $50/day). Prevents runaway costs if risk classification is miscalibrated. |
| Tone | direct, constructive, educational | constructive is the default and works for most teams. Use educational for teams with many junior developers. Use direct for experienced teams that prefer brevity. |
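The decisions in the table above can be sketched as one config fragment. Only layer2.risk_overrides and budget.max_daily_layer3_spend are named in the text; the remaining keys and model aliases are assumptions:

```yaml
# .vcr/vcr-config.yaml (fragment; key names partly assumed)
layer2:
  model: haiku-class              # illustrative model alias
  risk_overrides:
    - path: "src/auth/**"
      rule: always_high           # auth paths always get deep review
    - path: "docs/**"
      rule: always_skip_layer3    # docs never trigger Layer 3
layer3:
  model: sonnet-class
budget:
  max_daily_layer3_spend: 50      # USD per day during the pilot
report:
  tone: constructive              # direct | constructive | educational
```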
📦 Full configuration reference
For the complete vcr-config.yaml schema, repository structure, and all configurable
options, see the Configuration Reference.
Metrics to set up
Track these five metrics from day one of the pilot. They provide the minimum signal needed to evaluate whether VCR is working and where to tune.
| Metric | Why it matters for a pilot | Target |
|---|---|---|
| Time to first comment | Measures whether developers get feedback before context-switching. The primary developer experience metric. | <5 min (Layer 2 only), <15 min (Layer 2 + Layer 3) |
| Finding acceptance rate | Are VCR's findings useful? Low acceptance means prompts or risk classification need tuning. | >60% |
| Layer 3 trigger rate | What percentage of PRs trigger the expensive deep review? Too low means you are missing risk. Too high means you are overspending. | 30-50% of PRs |
| Cost per PR | Total AI cost per pull request across all layers. Validates budget assumptions. | $0.05-2.00 depending on risk level |
| TORS | Test Oracle Reliability Score: what percentage of test failures are real. If TORS is low, your agents and developers are wasting time on flaky tests. | >85% |
📦 Full metrics framework
For the complete per-layer metrics, end-to-end SDLC integration, and feedback mechanism, see the Metrics Framework reference.
Known risks
The following risks are inherent to any AI-assisted review system. VCR mitigates each through its layered architecture, but you should be aware of them when evaluating the system.
| Risk | Mitigation in VCR |
|---|---|
| LLM hallucination (false findings) | Layer 1 is fully deterministic. Layer 2 has confidence thresholds. Layer 3 findings require concrete file/line references. |
| Prompt injection via PR content | Layer 1 cannot be injected (no AI). AI layers use structured prompts with diff isolation. |
| Over-reliance on AI review | VCR explicitly directs human reviewers to focus areas. It supplements human review rather than replacing it. |
| Cost runaway on high-risk PRs | Daily budget caps. Risk-based gating. Layer 3 only triggers for MEDIUM+ risk. |
| Circular Test Trap (AI tests verify AI code) | Layer 2 detects AI-generated code. Layer 3 Test Quality lens identifies circular tests. |
| Flaky test noise (Lying Oracle) | TORS filters unreliable tests from feedback signal. Agents do not iterate on flaky failures. |
| Convention drift across teams | Proactive Scanner detects diverging patterns weekly. Conventions lens enforces in PR flow. |
| Model degradation over time | Feedback mechanism (developer reactions) detects declining finding quality. Model selection is configurable. |
📦 Detailed risk analysis
For the full risk analysis including mitigation strategies and monitoring guidance, see the AI Quick Scan layer reference.
Reference implementations
VCR is a process framework. VirtusLab provides reference implementations for each component, but every piece is substitutable with equivalent tooling that your organization already operates.
| Component | Reference implementation | Alternatives |
|---|---|---|
| Repository knowledge layer | ViDIA (VirtusLab, MIT) | Sourcegraph, custom DuckDB/SQLite, GitHub CODEOWNERS + scripts |
| CI infrastructure | Visdom Machine-Speed CI | Bazel + EngFlow, Nx, Turborepo, Gradle remote cache |
| SAST | Semgrep (open source) | CodeQL, SonarQube, Snyk Code |
| Secret scanning | gitleaks (open source) | truffleHog, GitHub secret scanning |
| AI provider | Anthropic (Claude Haiku/Sonnet/Opus) | OpenAI GPT-4o, Azure OpenAI, Google Gemini |
| CI/CD platform | GitHub Actions | GitLab CI, Azure Pipelines, Jenkins |
📦 Full reference implementations
For detailed component descriptions and integration guidance, see the Reference Implementations page.