
For Platform Engineers

How to evaluate, pilot, and operate Visdom Code Review on your infrastructure.

Architecture in 60 seconds

Every pull request passes through layers of increasing depth and cost. A risk classifier at Layer 2 gates whether the expensive Layer 3 runs. The Proactive Scanner operates independently on a cron schedule.

| Layer | What it does | Time | AI? |
|---|---|---|---|
| Layer 0: Context Collection | Collects diff, metadata, coverage, file classifications, repo knowledge, test reliability data | <10s | No |
| Layer 1: Deterministic Gate | Linters, SAST, secret scan, coverage delta, TORS filtering. Cannot be prompt-injected. | <60s | No |
| Layer 2: AI Quick Scan | Fast AI pass over the diff. Risk classification (LOW→CRITICAL). Max 5 quick findings. AI-code detection. | <2 min | Yes (Haiku-class) |
| Layer 3: AI Deep Review | Full analysis with repo context, history, conventions. Multiple review lenses in parallel. MEDIUM+ risk only. | <10 min | Yes (Sonnet/Opus-class) |
| Reporter | Aggregates all layers into a structured PR comment, inline comments, GitHub Check, optional Slack. | <30s | No |
| Proactive Scanner | Cron-based repo analysis: coverage trends, tech debt, convention drift, security baseline. | Scheduled | Yes |

📦 Full architecture reference

For the complete layer diagram, data flows, and output schemas, see the Architecture Reference.

What you need before starting

The following prerequisites are required for a pilot deployment. The first reference implementation targets GitHub; other platforms follow the same process with different adapters.

| Prerequisite | Details | Required? |
|---|---|---|
| GitHub repository with PRs | The v1 reference implementation is GitHub-only. An active PR flow is needed for meaningful pilot data. | Yes |
| CI pipeline with test coverage reports | VCR reads coverage deltas to assess risk. Any format supported by your coverage tool. | Yes |
| AI API key | Anthropic (default). OpenAI and Azure OpenAI are configurable alternatives. | Yes |
| 30 days of test history | Required for TORS (Test Oracle Reliability Score) bootstrap. Without it, start with TORS disabled and build up data over the first month. | Recommended |
| `.vcr/` directory in repo root | Contains `vcr-config.yaml`, convention docs, review lenses, and custom rules. | Yes (created during setup) |

No test history?

You can start without TORS and build up reliability data during the pilot. Set layer1.tors.enabled: false in your config, then enable it after 30 days of CI data have been collected.
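A minimal config fragment for this setup. The `layer1.tors.enabled` key comes from this guide; the surrounding file layout is illustrative, so check the Configuration Reference for the exact schema:

```yaml
# .vcr/vcr-config.yaml — pilot without test history
layer1:
  tors:
    enabled: false  # no reliability data yet; flip to true after ~30 days of CI runs
```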

Running a pilot

A pilot typically runs on 1-2 teams over 4-6 weeks. The steps below assume GitHub Actions as the CI platform.

Step 1: Install

Add the VCR GitHub Actions workflow and configuration directory to your repository:
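A sketch of what the workflow could look like. The action name (`visdom/vcr-action`) and its inputs are assumptions for illustration, not the published interface:

```yaml
# .github/workflows/vcr.yml — shape only; action name and inputs are hypothetical
name: Visdom Code Review
on:
  pull_request:
    types: [opened, synchronize, reopened]
jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write  # structured PR comment + inline comments
      checks: write         # GitHub Check
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0    # Layer 3 uses repo history
      - uses: visdom/vcr-action@v1  # hypothetical action name
        with:
          config: .vcr/vcr-config.yaml
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
```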

Step 2: Configure path classifications

Define which paths in your repository are critical, sensitive, standard, or low_risk. This determines review depth and cost.
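A sketch of what the classification could look like in `vcr-config.yaml`. The four tier names come from this guide; the `paths` key and glob layout are illustrative assumptions:

```yaml
# .vcr/vcr-config.yaml — adapt globs to your repository layout
paths:
  critical:
    - "src/auth/**"
    - "src/payments/**"
    - "infra/**"
  sensitive:
    - "src/api/**"
  low_risk:
    - "docs/**"
    - "**/*.md"
  # anything unmatched falls back to standard
```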

Step 3: Start conservative, Layer 2 only

For the first week, run Layer 2 (AI Quick Scan) only. Observe findings, check false positive rates, and calibrate risk classification against your team's expectations. Layer 3 stays disabled.
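As a config fragment, the week-one posture could look like this (per-layer `enabled` flags are an assumed shape based on the `layer1.tors.enabled` convention used elsewhere in this guide):

```yaml
layer2:
  enabled: true   # AI Quick Scan on every PR
layer3:
  enabled: false  # observe Layer 2 quality first; enable in week 2
```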

Step 4: Enable Layer 3

After the first week, enable Layer 3 for MEDIUM-risk and above. Monitor finding quality, acceptance rates, and cost. Tune risk thresholds based on actual data.
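A hedged sketch of the week-two change; `min_risk` is a hypothetical key name for the MEDIUM+ gating described above:

```yaml
layer3:
  enabled: true
  min_risk: MEDIUM  # LOW-risk PRs stop after the quick scan
```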

Step 5: Add custom lenses

If your domain has specific review needs (compliance, regulatory, domain-specific patterns), add custom review lenses under .vcr/lenses/custom/. Each lens is a prompt template that defines what the AI looks for.
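As an illustration, a compliance lens might look like the following. The file schema (`name`, `applies_to`, `prompt`) is an assumption; only the `.vcr/lenses/custom/` location and the prompt-template nature come from this guide:

```yaml
# .vcr/lenses/custom/pci-compliance.yaml — hypothetical lens file
name: pci-compliance
applies_to:
  - "src/payments/**"
prompt: |
  Review this diff for PCI-DSS concerns:
  - Is cardholder data logged, cached, or stored outside approved stores?
  - Do new endpoints handling payment data sit behind the existing auth middleware?
  Report each finding with concrete file and line references.
```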

Step 6: Enable the Proactive Scanner

Set up a weekly cron job for convention drift detection, coverage trends, and security baseline scanning. This runs independently of the PR flow and creates GitHub Issues for critical findings.
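On GitHub Actions, the weekly trigger could be sketched as below. As before, the action name and `mode` input are assumptions:

```yaml
# .github/workflows/vcr-scan.yml — hypothetical action name and inputs
name: VCR Proactive Scan
on:
  schedule:
    - cron: "0 6 * * 1"  # Mondays, 06:00 UTC
jobs:
  scan:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      issues: write  # files GitHub Issues for critical findings
    steps:
      - uses: actions/checkout@v4
      - uses: visdom/vcr-action@v1  # hypothetical
        with:
          mode: proactive-scan
```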

Key configuration decisions

The following choices have the most impact on VCR's effectiveness and cost. Get these right during the pilot and adjust based on data.

| Decision | Options | Guidance |
|---|---|---|
| Path classification | `critical`, `sensitive`, `standard`, `low_risk` | Start with auth, payments, and infrastructure as `critical`. API layers as `sensitive`. Everything else as `standard`. Docs and tests as `low_risk`. |
| Risk override rules | Per-path overrides in `layer2.risk_overrides` | Use `always_high` for auth paths. Use `always_skip_layer3` for docs and markdown. |
| Model selection | Configurable per layer | Haiku-class for Layer 2 (fast, cheap). Sonnet-class for Layer 3 (balanced). Opus-class for CRITICAL PRs only (most capable, most expensive). |
| Budget caps | `budget.max_daily_layer3_spend` | Set a daily cap during the pilot (e.g., $50/day). Prevents runaway costs if risk classification is miscalibrated. |
| Tone | `direct`, `constructive`, `educational` | `constructive` is the default and works for most teams. Use `educational` for teams with many junior developers. Use `direct` for experienced teams that prefer brevity. |
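Pulling these decisions together, a pilot config excerpt could look like this. The option names (`risk_overrides`, `max_daily_layer3_spend`) appear in this guide; the exact nesting and the shape of an override entry are illustrative:

```yaml
# .vcr/vcr-config.yaml excerpt — nesting is an assumption
layer2:
  risk_overrides:
    - paths: ["src/auth/**"]
      rule: always_high
    - paths: ["docs/**", "**/*.md"]
      rule: always_skip_layer3
budget:
  max_daily_layer3_spend: 50  # USD per day during the pilot
tone: constructive
```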

📦 Full configuration reference

For the complete vcr-config.yaml schema, repository structure, and all configurable options, see the Configuration Reference.

Metrics to set up

Track these five metrics from day one of the pilot. They provide the minimum signal needed to evaluate whether VCR is working and where to tune.

| Metric | Why it matters for a pilot | Target |
|---|---|---|
| Time to first comment | Measures whether developers get feedback before context-switching. The primary developer experience metric. | <5 min (Layer 2 only), <15 min (Layer 2 + Layer 3) |
| Finding acceptance rate | Are VCR's findings useful? A low acceptance rate means prompts or risk classification need tuning. | >60% |
| Layer 3 trigger rate | What percentage of PRs trigger the expensive deep review? Too low means you are missing risk; too high means you are overspending. | 30-50% of PRs |
| Cost per PR | Total AI cost per pull request across all layers. Validates budget assumptions. | $0.05-2.00 depending on risk level |
| TORS | Test Oracle Reliability Score: what percentage of test failures are real. If TORS is low, your agents and developers are wasting time on flaky tests. | >85% |
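TORS is simple enough to track by hand during a pilot. A minimal sketch of the ratio (how failures get labeled real vs. flaky depends on your CI history; this is not VCR's implementation):

```python
def tors(failures: list[dict]) -> float:
    """Fraction of recorded test failures that were real regressions
    rather than flakes. Returns 1.0 when there were no failures at all."""
    if not failures:
        return 1.0
    real = sum(1 for f in failures if f["real"])
    return real / len(failures)

# Simplified failure records from 30 days of CI (hypothetical data)
history = [
    {"test": "test_login", "real": True},
    {"test": "test_checkout_total", "real": True},
    {"test": "test_dashboard_render", "real": False},  # flaky UI test
    {"test": "test_refund_flow", "real": True},
]
print(f"TORS: {tors(history):.0%}")  # 3 of 4 failures were real -> 75%
```

A score of 75% would sit below the >85% target above, signaling that flaky tests should be quarantined or fixed before agents iterate on CI feedback.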

📦 Full metrics framework

For the complete per-layer metrics, end-to-end SDLC integration, and feedback mechanism, see the Metrics Framework reference.

Known risks

The following risks are inherent to any AI-assisted review system. VCR mitigates each through its layered architecture, but you should be aware of them when evaluating the system.

| Risk | Mitigation in VCR |
|---|---|
| LLM hallucination (false findings) | Layer 1 is fully deterministic. Layer 2 has confidence thresholds. Layer 3 findings require concrete file/line references. |
| Prompt injection via PR content | Layer 1 cannot be injected (no AI). AI layers use structured prompts with diff isolation. |
| Over-reliance on AI review | VCR explicitly directs human reviewers to focus areas. It supplements human review rather than replacing it. |
| Cost runaway on high-risk PRs | Daily budget caps. Risk-based gating. Layer 3 only triggers for MEDIUM+ risk. |
| Circular Test Trap (AI tests verify AI code) | Layer 2 detects AI-generated code. The Layer 3 Test Quality lens identifies circular tests. |
| Flaky test noise (Lying Oracle) | TORS filters unreliable tests out of the feedback signal. Agents do not iterate on flaky failures. |
| Convention drift across teams | The Proactive Scanner detects diverging patterns weekly. The Conventions lens enforces them in the PR flow. |
| Model degradation over time | The feedback mechanism (developer reactions) detects declining finding quality. Model selection is configurable. |

📦 Detailed risk analysis

For the full risk analysis including mitigation strategies and monitoring guidance, see the AI Quick Scan layer reference.

Reference implementations

VCR is a process framework. VirtusLab provides reference implementations for each component, but every piece is substitutable with equivalent tooling that your organization already operates.

| Component | Reference implementation | Alternatives |
|---|---|---|
| Repository knowledge layer | ViDIA (VirtusLab, MIT) | Sourcegraph, custom DuckDB/SQLite, GitHub CODEOWNERS + scripts |
| CI infrastructure | Visdom Machine-Speed CI | Bazel + EngFlow, Nx, Turborepo, Gradle remote cache |
| SAST | Semgrep (open source) | CodeQL, SonarQube, Snyk Code |
| Secret scanning | gitleaks (open source) | truffleHog, GitHub secret scanning |
| AI provider | Anthropic (Claude Haiku/Sonnet/Opus) | OpenAI GPT-4o, Azure OpenAI, Google Gemini |
| CI/CD platform | GitHub Actions | GitLab CI, Azure Pipelines, Jenkins |

📦 Full reference implementations

For detailed component descriptions and integration guidance, see the Reference Implementations page.