Architecture | Visdom Code Review

Mental Model

Every PR passes through layers of increasing depth and cost. Each layer produces structured output consumed by subsequent layers and the final report. A risk classifier at Layer 2 gates whether the expensive Layer 3 runs.

Separately, a Proactive Scanner runs on cron, analyzing the repository independent of PR flow.

Layer Diagram

┌──────────────────────────────────────────────────────────┐
│                    PR Opened / Updated                    │
└──────────────┬───────────────────────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────────────────────┐
│  LAYER 0: Context Collection                  (<10s)     │
│  Deterministic. Diff, metadata, repo knowledge, TORS.    │
└──────────────┬───────────────────────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────────────────────┐
│  LAYER 1: Deterministic Gate                  (<60s)     │
│  Zero AI. Linters, SAST, secret scan, coverage delta.    │
│  100% repeatable. Cannot be prompt-injected.             │
└──────────────┬───────────────────────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────────────────────┐
│  LAYER 2: AI Quick Scan + Risk Classifier    (<2 min)    │
│  Fast AI pass over diff. Risk: LOW→CRITICAL.             │
│  Quick findings (max 5). AI-code detection.              │
└──────────────┬───────────────────────────────────────────┘
               │ (MEDIUM+ risk only)
               ▼
┌──────────────────────────────────────────────────────────┐
│  LAYER 3: AI Deep Review                     (<10 min)   │
│  Full analysis with repo context, history, conventions.  │
│  Multiple Review Lenses run in parallel.                 │
└──────────────┬───────────────────────────────────────────┘
               │
               ▼
┌──────────────────────────────────────────────────────────┐
│  REPORTER: Aggregation + PR Comment                      │
│  Structured summary, inline comments, reviewer guidance. │
└──────────────────────────────────────────────────────────┘

Layer Summary

Layer	Type	Time	Cost/PR	Purpose
L0	Deterministic	<10s	~$0	Gather context: diff, ownership, dependencies, test reliability
L1	Deterministic	<60s	~$0	Static analysis, secrets, SAST, coverage. Immune to prompt injection
L2	AI-powered	<2 min	$0.01-0.05	Risk classification, quick findings, AI-code detection
L3	AI-powered	<10 min	$0.10-2.00	Deep review with Review Lenses. Only for MEDIUM+ risk

Key Design Decisions

Risk-Based Gating

The risk classifier at Layer 2 is the economic hinge of the system. It determines whether a PR gets the expensive Layer 3 analysis or completes at Layer 2. The target: 30-50% of PRs trigger Layer 3.

Risk Level	Layer 3?	Estimated Cost
LOW	Skip	~$0.01-0.05
MEDIUM	Yes, standard depth	~$0.10-0.50
HIGH	Yes, full depth + extra lenses	~$0.50-2.00
CRITICAL	Yes, full depth + mandatory senior review flag	~$0.50-2.00

✅ Deterministic signals first

Risk classification is primarily deterministic: path classification, diff size, coverage delta, module stability. AI judgment is one input among several, not the sole decider. This mitigates the non-determinism problem (same PR, different risk score on re-run).

Deterministic Backstop

Layer 1 is immune to prompt injection, hallucination, and non-determinism. Even if an attacker manipulates the AI layers via code comments or PR descriptions, Layer 1's secret scanning and SAST analysis still catches deterministic patterns. It is the floor, the minimum guarantee.

Precision Over Recall

VCR is tuned for precision, not recall. Better to miss a LOW-severity finding than report a false positive. The Cry Wolf effect, developers ignoring all AI comments after too many false positives, is the single biggest risk to adoption.

Layer 2: max 5 findings, confidence threshold 0.8
Layer 3: max 15 inline comments per PR
If nothing important found: ✅ VCR: No issues found (risk: LOW)

TORS: Test Oracle Reliability Score

VCR measures which tests are reliable and which are flaky. Flaky tests are excluded from the feedback signal sent to agents and AI layers. This prevents the Lying Oracle problem: agents don't "fix" tests that aren't broken.

TORS = (real failures) / (total failures)
If TORS < threshold → test excluded from feedback signal

Proactive Scanner

Independent of PR flow. Runs on cron (daily/weekly). Scans the repository for:

Coverage trends: per-module test coverage over time
Tech debt: large files, circular dependencies, growing complexity
Convention drift: does new code diverge from established patterns? Cross-team comparison
Security baseline: full SAST scan, dependency vulnerabilities
AI-code audit: which modules have high AI-generated code density

Particularly valuable for mixed teams across geographies, where convention drift can accumulate unnoticed for months.

Read more about the Proactive Scanner →

Visdom SDLC Integration

VCR connects to the broader Visdom AI-Native SDLC metrics framework:

Metric	What it measures	VCR's role
ITS (Iterations-to-Success)	Iterations from task to passing CI	Reduces ITS by filtering flaky tests and providing early feedback
CPI (Cost-per-Iteration)	Tokens + compute + CI + review per iteration	Reduces review component; TORS reduces wasted iterations
TORS (Test Oracle Reliability Score)	% of test failures that are real regressions	Directly measured by Layer 1; feeds risk classification

Full metrics framework →