Mental Model
Every PR passes through layers of increasing depth and cost. Each layer produces structured output consumed by subsequent layers and the final report. A risk classifier at Layer 2 gates whether the expensive Layer 3 runs.
Separately, a Proactive Scanner runs on cron, analyzing the repository independent of PR flow.
Layer Diagram
┌──────────────────────────────────────────────────────────┐
│  PR Opened / Updated                                     │
└──────────────┬───────────────────────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────────────────────┐
│  LAYER 0: Context Collection (<10s)                      │
│  Deterministic. Diff, metadata, repo knowledge, TORS.    │
└──────────────┬───────────────────────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────────────────────┐
│  LAYER 1: Deterministic Gate (<60s)                      │
│  Zero AI. Linters, SAST, secret scan, coverage delta.    │
│  100% repeatable. Cannot be prompt-injected.             │
└──────────────┬───────────────────────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────────────────────┐
│  LAYER 2: AI Quick Scan + Risk Classifier (<2 min)       │
│  Fast AI pass over diff. Risk: LOW→CRITICAL.             │
│  Quick findings (max 5). AI-code detection.              │
└──────────────┬───────────────────────────────────────────┘
               │ (MEDIUM+ risk only)
               ▼
┌──────────────────────────────────────────────────────────┐
│  LAYER 3: AI Deep Review (<10 min)                       │
│  Full analysis with repo context, history, conventions.  │
│  Multiple Review Lenses run in parallel.                 │
└──────────────┬───────────────────────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────────────────────┐
│  REPORTER: Aggregation + PR Comment                      │
│  Structured summary, inline comments, reviewer guidance. │
└──────────────────────────────────────────────────────────┘
Layer Summary
| Layer | Type | Time | Cost/PR | Purpose |
|---|---|---|---|---|
| L0 | Deterministic | <10s | ~$0 | Gather context: diff, ownership, dependencies, test reliability |
| L1 | Deterministic | <60s | ~$0 | Static analysis, secrets, SAST, coverage. Immune to prompt injection |
| L2 | AI-powered | <2 min | $0.01-0.05 | Risk classification, quick findings, AI-code detection |
| L3 | AI-powered | <10 min | $0.10-2.00 | Deep review with Review Lenses. Only for MEDIUM+ risk |
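The flow in the diagram and table above can be sketched as a small orchestrator. This is a minimal illustration, not VCR's implementation: the names `Risk`, `Report`, and `run_pipeline` are hypothetical, and each layer is reduced to a label so that only the gating logic is visible.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable


class Risk(IntEnum):
    """Risk levels in increasing order, as listed in the diagram."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


@dataclass
class Report:
    """Hypothetical container for the structured output each layer adds."""
    layers_run: list = field(default_factory=list)
    risk: Risk = Risk.LOW


def run_pipeline(pr: dict, classify: Callable[[dict], Risk]) -> Report:
    """Run the layers in order; Layer 3 runs only for MEDIUM+ risk."""
    report = Report()
    report.layers_run.append("L0")  # context collection (deterministic)
    report.layers_run.append("L1")  # deterministic gate (zero AI)
    report.risk = classify(pr)      # Layer 2: quick scan + risk classifier
    report.layers_run.append("L2")
    if report.risk >= Risk.MEDIUM:  # the gate in front of the expensive layer
        report.layers_run.append("L3")
    return report
```

Passing the classifier in as a function keeps the gate testable without any AI call: a stub that returns `Risk.LOW` should stop the run at Layer 2.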
Key Design Decisions
Risk-Based Gating
The risk classifier at Layer 2 is the economic hinge of the system. It determines whether a PR gets the expensive Layer 3 analysis or completes at Layer 2. The target: 30-50% of PRs trigger Layer 3.
| Risk Level | Layer 3? | Estimated Cost |
|---|---|---|
| LOW | Skip | ~$0.01-0.05 |
| MEDIUM | Yes, standard depth | ~$0.10-0.50 |
| HIGH | Yes, full depth + extra lenses | ~$0.50-2.00 |
| CRITICAL | Yes, full depth + mandatory senior review flag | ~$0.50-2.00 |
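To see why the trigger rate matters economically, the average cost per PR can be estimated from the Layer 2 cost, the Layer 3 cost, and the share of PRs that reach Layer 3. The sketch below assumes the two costs simply add for PRs that run both layers; the source does not state whether the per-risk estimates are cumulative, so treat that as an assumption. The function name is hypothetical.

```python
def expected_cost_per_pr(l2_cost: float, l3_cost: float, l3_trigger_rate: float) -> float:
    """Average cost per PR, assuming every PR pays for Layer 2 and a
    fraction `l3_trigger_rate` additionally pays for Layer 3."""
    if not 0.0 <= l3_trigger_rate <= 1.0:
        raise ValueError("l3_trigger_rate must be between 0 and 1")
    return l2_cost + l3_trigger_rate * l3_cost
```

Under this assumption the average cost grows linearly with the trigger rate, which is why a classifier that over-escalates directly raises the bill.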
✅ Deterministic signals first
Risk classification is primarily deterministic: path classification, diff size, coverage delta, module stability. AI judgment is one input among several, not the sole decider. This mitigates the non-determinism problem (same PR, different risk score on re-run).
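A deterministic-first classifier along these lines could look like the sketch below. The signal names follow the list above, but every weight and threshold is an illustrative assumption, not a value from VCR. The AI hint is capped at one point, so on its own it cannot move a PR out of LOW.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    touches_sensitive_path: bool  # path classification
    lines_changed: int            # diff size
    coverage_delta: float         # percentage points; negative means a drop
    module_is_unstable: bool      # module stability
    ai_risk_hint: float           # 0.0 to 1.0, from the AI quick scan


def classify(s: Signals) -> str:
    """Combine deterministic signals with a bounded AI hint (weights are assumed)."""
    score = 0.0
    score += 3.0 if s.touches_sensitive_path else 0.0
    if s.lines_changed > 500:
        score += 2.0
    elif s.lines_changed > 100:
        score += 1.0
    score += 1.0 if s.coverage_delta < 0 else 0.0
    score += 1.0 if s.module_is_unstable else 0.0
    # AI judgment is one input among several: clamp it to at most 1 point.
    score += min(max(s.ai_risk_hint, 0.0), 1.0)
    if score >= 6:
        return "CRITICAL"
    if score >= 4:
        return "HIGH"
    if score >= 2:
        return "MEDIUM"
    return "LOW"
```

Because the deterministic terms dominate the score, re-running the AI pass changes the result by at most one point, which limits how far the same PR can drift between runs.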
Deterministic Backstop
Layer 1 is immune to prompt injection, hallucination, and non-determinism. Even if an attacker manipulates the AI layers via code comments or PR descriptions, Layer 1's secret scanning and SAST analysis still catch deterministic patterns. It is the floor, the minimum guarantee.
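As an illustration of why a pattern-based check cannot be talked out of a finding, here is a minimal secret scan over the added lines of a unified diff. The two patterns (an AWS-style access key ID and a PEM private key header) and the function name are assumptions chosen for the example; a real Layer 1 would rely on dedicated scanners with far larger rule sets.

```python
import re

# Illustrative patterns only; not VCR's actual rule set.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def scan_added_lines(diff_text: str) -> list:
    """Return (line_number, pattern_name) for each secret-like match
    found on an added line of a unified diff."""
    hits = []
    for lineno, line in enumerate(diff_text.splitlines(), start=1):
        # Only inspect added lines; skip the "+++ b/file" header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

A comment in the diff asking a reviewer to ignore secrets has no effect here: the regular expression matches the key regardless of the surrounding text.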
Precision Over Recall
VCR is tuned for precision, not recall. Better to miss a LOW-severity finding than report a false positive. The Cry Wolf effect, developers ignoring all AI comments after too many false positives, is the single biggest risk to adoption.
- Layer 2: max 5 findings, confidence threshold 0.8
- Layer 3: max 15 inline comments per PR
- If nothing important is found:
✅ VCR: No issues found (risk: LOW)
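The Layer 2 limits above (at most 5 findings, confidence of at least 0.8) amount to a filter, a sort, and a cut. In this sketch the `Finding` fields, the severity ordering, and the wording of the non-empty summary are assumptions; only the two limits and the "no issues" message come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    message: str
    confidence: float  # 0.0 to 1.0
    severity: int      # hypothetical scale; higher means more severe


def select_findings(findings, max_findings=5, min_confidence=0.8):
    """Drop low-confidence findings, then keep the most severe ones."""
    confident = [f for f in findings if f.confidence >= min_confidence]
    confident.sort(key=lambda f: (f.severity, f.confidence), reverse=True)
    return confident[:max_findings]


def summary_line(selected, risk: str) -> str:
    """Render the one-line summary; the non-empty format is an assumption."""
    if not selected:
        return f"✅ VCR: No issues found (risk: {risk})"
    return f"VCR: {len(selected)} finding(s) (risk: {risk})"
```

Filtering before truncating matters: a cap applied first could fill the five slots with low-confidence noise and push out a confident finding.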
TORS: Test Oracle Reliability Score
VCR measures which tests are reliable and which are flaky. Flaky tests are excluded from the feedback signal sent to agents and AI layers. This prevents the Lying Oracle problem: agents don't "fix" tests that aren't broken.
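The score and exclusion rule defined in this section can be sketched in a few lines. How a failure is labelled "real" versus flaky is not specified here, so the sketch takes the two counts as given. Treating a test with no recorded failures as reliable is an assumption, and the threshold is a parameter because the text does not fix its value.

```python
from typing import Optional


def tors(real_failures: int, total_failures: int) -> Optional[float]:
    """TORS = real failures / total failures; None when no failures exist."""
    if total_failures == 0:
        return None  # score is undefined without any failures
    return real_failures / total_failures


def reliable_tests(history: dict, threshold: float) -> list:
    """Keep tests whose TORS meets the threshold.

    `history` maps a test name to (real_failures, total_failures).
    Tests with no failures are kept (assumption).
    """
    kept = []
    for name, (real, total) in history.items():
        score = tors(real, total)
        if score is None or score >= threshold:
            kept.append(name)
    return kept
```

For example, with a hypothetical threshold of 0.7, a test whose failures were real 9 times out of 10 stays in the feedback signal, while one that was real only 1 time out of 10 is excluded.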
TORS = (real failures) / (total failures)
If TORS < threshold → test excluded from feedback signal
Proactive Scanner
Independent of PR flow. Runs on cron (daily/weekly). Scans the repository for:
- Coverage trends: per-module test coverage over time
- Tech debt: large files, circular dependencies, growing complexity
- Convention drift: whether new code diverges from established patterns, including cross-team comparison
- Security baseline: full SAST scan, dependency vulnerabilities
- AI-code audit: which modules have high AI-generated code density
Particularly valuable for mixed teams across geographies, where convention drift can accumulate unnoticed for months.
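One of the scan categories listed above, large files under tech debt, is simple enough to sketch. The function name, the 500-line limit, and the `.py` suffix filter are illustrative assumptions; a scheduled job would run a check like this alongside the other scans and track the results over time.

```python
import os


def find_large_files(root: str, max_lines: int = 500, suffixes=(".py",)) -> list:
    """Return (relative_path, line_count) for source files longer than
    `max_lines`, largest first."""
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories such as .git in place.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for filename in filenames:
            if not filename.endswith(suffixes):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as fh:
                line_count = sum(1 for _ in fh)
            if line_count > max_lines:
                results.append((os.path.relpath(path, root), line_count))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Because the check reads the whole repository rather than a diff, it can surface files that grew slowly across many small PRs, none of which looked risky on its own.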
Visdom SDLC Integration
VCR connects to the broader Visdom AI-Native SDLC metrics framework:
| Metric | What it measures | VCR's role |
|---|---|---|
| ITS (Iterations-to-Success) | Iterations from task to passing CI | Reduces ITS by filtering flaky tests and providing early feedback |
| CPI (Cost-per-Iteration) | Tokens + compute + CI + review per iteration | Reduces review component; TORS reduces wasted iterations |
| TORS (Test Oracle Reliability Score) | % of test failures that are real regressions | Directly measured by Layer 1; feeds risk classification |