Back to Overview
ArchitectureLayered Model

Architecture

Layered Review Agent: review depth scales with risk.

Mental Model

Every PR passes through layers of increasing depth and cost. Each layer produces structured output consumed by subsequent layers and the final report. A risk classifier at Layer 2 gates whether the expensive Layer 3 runs.

Separately, a Proactive Scanner runs on cron, analyzing the repository independent of PR flow.

Layer Diagram

┌──────────────────────────────────────────────────────────┐
│                    PR Opened / Updated                    │
└──────────────┬───────────────────────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────────────────────┐
│  LAYER 0: Context Collection                  (<10s)     │
│  Deterministic. Diff, metadata, repo knowledge, TORS.    │
└──────────────┬───────────────────────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────────────────────┐
│  LAYER 1: Deterministic Gate                  (<60s)     │
│  Zero AI. Linters, SAST, secret scan, coverage delta.    │
│  100% repeatable. Cannot be prompt-injected.             │
└──────────────┬───────────────────────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────────────────────┐
│  LAYER 2: AI Quick Scan + Risk Classifier    (<2 min)    │
│  Fast AI pass over diff. Risk: LOW→CRITICAL.             │
│  Quick findings (max 5). AI-code detection.              │
└──────────────┬───────────────────────────────────────────┘
               │ (MEDIUM+ risk only)
               ▼
┌──────────────────────────────────────────────────────────┐
│  LAYER 3: AI Deep Review                     (<10 min)   │
│  Full analysis with repo context, history, conventions.  │
│  Multiple Review Lenses run in parallel.                 │
└──────────────┬───────────────────────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────────────────────┐
│  REPORTER: Aggregation + PR Comment                      │
│  Structured summary, inline comments, reviewer guidance. │
└──────────────────────────────────────────────────────────┘

Layer Summary

Layer Type Time Cost/PR Purpose
L0 Deterministic <10s ~$0 Gather context: diff, ownership, dependencies, test reliability
L1 Deterministic <60s ~$0 Static analysis, secrets, SAST, coverage. Immune to prompt injection
L2 AI-powered <2 min $0.01-0.05 Risk classification, quick findings, AI-code detection
L3 AI-powered <10 min $0.10-2.00 Deep review with Review Lenses. Only for MEDIUM+ risk

Key Design Decisions

Risk-Based Gating

The risk classifier at Layer 2 is the economic hinge of the system. It determines whether a PR gets the expensive Layer 3 analysis or completes at Layer 2. The target: 30-50% of PRs trigger Layer 3.

Risk Level Layer 3? Estimated Cost
LOW Skip ~$0.01-0.05
MEDIUM Yes, standard depth ~$0.10-0.50
HIGH Yes, full depth + extra lenses ~$0.50-2.00
CRITICAL Yes, full depth + mandatory senior review flag ~$0.50-2.00

Deterministic signals first

Risk classification is primarily deterministic: path classification, diff size, coverage delta, module stability. AI judgment is one input among several, not the sole decider. This mitigates the non-determinism problem (same PR, different risk score on re-run).

Deterministic Backstop

Layer 1 is immune to prompt injection, hallucination, and non-determinism. Even if an attacker manipulates the AI layers via code comments or PR descriptions, Layer 1's secret scanning and SAST analysis still catches deterministic patterns. It is the floor, the minimum guarantee.

Precision Over Recall

VCR is tuned for precision, not recall. Better to miss a LOW-severity finding than report a false positive. The Cry Wolf effect, developers ignoring all AI comments after too many false positives, is the single biggest risk to adoption.

TORS: Test Oracle Reliability Score

VCR measures which tests are reliable and which are flaky. Flaky tests are excluded from the feedback signal sent to agents and AI layers. This prevents the Lying Oracle problem: agents don't "fix" tests that aren't broken.

TORS = (real failures) / (total failures)
If TORS < threshold → test excluded from feedback signal

Proactive Scanner

Independent of PR flow. Runs on cron (daily/weekly). Scans the repository for:

Particularly valuable for mixed teams across geographies, where convention drift can accumulate unnoticed for months.

Read more about the Proactive Scanner →

Visdom SDLC Integration

VCR connects to the broader Visdom AI-Native SDLC metrics framework:

Metric What it measures VCR's role
ITS (Iterations-to-Success) Iterations from task to passing CI Reduces ITS by filtering flaky tests and providing early feedback
CPI (Cost-per-Iteration) Tokens + compute + CI + review per iteration Reduces review component; TORS reduces wasted iterations
TORS (Test Oracle Reliability Score) % of test failures that are real regressions Directly measured by Layer 1; feeds risk classification

Full metrics framework →