A multi-layered review process for enterprise teams shipping AI-generated code. Part of the Visdom AI-Native SDLC.
Every PR passes through layers of increasing depth. Fast and cheap for trivial changes, thorough for risky ones. A LOW-risk PR gets feedback in under 2 minutes at ~$0.05.
Full architecture reference →

- Layer 1: Risk scoring, routing. Instant.
- Layer 2: Linters, SAST, pattern checks. <30s.
- Layer 3: LLM review with full context. <2min.
- Layer 4: Multi-pass analysis, security, architecture. <5min.
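The escalation above can be sketched as a per-layer risk threshold: a PR runs through every layer whose threshold its risk level reaches. The layer names, the `Risk` type, and the `minRisk` values below are illustrative assumptions, not VCR's actual API:

```typescript
// Hypothetical sketch of layer routing; identifiers are invented for illustration.
type Risk = "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";
const ORDER: Risk[] = ["LOW", "MEDIUM", "HIGH", "CRITICAL"];
const rank = (r: Risk): number => ORDER.indexOf(r);

interface Layer {
  name: string;
  minRisk: Risk; // lowest risk level that makes this layer run
}

const LAYERS: Layer[] = [
  { name: "triage", minRisk: "LOW" },         // risk scoring, routing
  { name: "static-checks", minRisk: "LOW" },  // linters, SAST, pattern checks
  { name: "llm-review", minRisk: "MEDIUM" },  // LLM review with context
  { name: "deep-analysis", minRisk: "HIGH" }, // multi-pass, security, arch
];

// A LOW PR stops after the cheap layers; HIGH and CRITICAL run everything.
function layersFor(risk: Risk): string[] {
  return LAYERS.filter(l => rank(risk) >= rank(l.minRisk)).map(l => l.name);
}
```

This keeps the fast path cheap: a trivial PR never touches the LLM layers at all.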
Each PR gets a risk level based on path classification, diff size, coverage delta, and module stability. Only MEDIUM+ risk triggers deep analysis.
See before/after scenarios →

- LOW: Config, docs, deps. Auto-approved or light scan.
- MEDIUM: Business logic. Standard LLM review with context.
- HIGH: Security-sensitive, cross-service. Multi-pass analysis.
- CRITICAL: Auth, payments, data migration. Full depth + human gate.
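A minimal sketch of how the four signals (path classification, diff size, coverage delta, module stability) could combine into a risk level. All weights, thresholds, and path patterns here are invented for illustration; VCR's actual scoring will differ:

```typescript
// Hypothetical risk scorer over the four triage signals named above.
type Risk = "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";

interface PrSignals {
  paths: string[];         // files touched by the diff
  diffLines: number;       // added + removed lines
  coverageDelta: number;   // percentage points; negative = coverage dropped
  unstableModules: number; // touched modules with an unstable history
}

const SENSITIVE = [/auth/, /payment/, /migration/]; // assumed patterns
const LOW_RISK = [/\.md$/, /^docs\//, /^config\//]; // assumed patterns

function scoreRisk(pr: PrSignals): Risk {
  // Path classification dominates: sensitive paths are always CRITICAL,
  // docs/config-only diffs are always LOW.
  if (pr.paths.some(p => SENSITIVE.some(rx => rx.test(p)))) return "CRITICAL";
  if (pr.paths.length > 0 && pr.paths.every(p => LOW_RISK.some(rx => rx.test(p)))) {
    return "LOW";
  }
  let score = 0;
  if (pr.diffLines > 400) score += 2;       // large diff
  else if (pr.diffLines > 100) score += 1;  // medium diff
  if (pr.coverageDelta < 0) score += 1;     // coverage regressed
  score += Math.min(pr.unstableModules, 2); // capped stability penalty
  return score >= 4 ? "HIGH" : score >= 2 ? "MEDIUM" : "LOW";
}
```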
Each review is fed pre-indexed knowledge about the codebase: ownership, dependencies, commit history, conventions, and test reliability data.
Explore ViDIA context engine →

Context sources
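To make that concrete, here is one plausible shape for such a context payload, trimmed to the files a PR actually touches so it fits an LLM context budget. The `ReviewContext` fields below are hypothetical, not ViDIA's real schema:

```typescript
// Hypothetical pre-indexed context; field names are assumptions.
interface ReviewContext {
  ownership: Record<string, string[]>;    // file -> owning team(s)
  dependencies: Record<string, string[]>; // module -> modules it imports
  recentCommits: { sha: string; summary: string }[];
  conventions: string[];                  // e.g. "no default exports"
  flakyTests: string[];                   // tests with unreliable history
}

// Keep only the ownership and dependency entries relevant to the diff.
function contextFor(files: string[], full: ReviewContext): ReviewContext {
  const touched = new Set(files);
  return {
    ...full,
    ownership: Object.fromEntries(
      Object.entries(full.ownership).filter(([file]) => touched.has(file))
    ),
    dependencies: Object.fromEntries(
      Object.entries(full.dependencies).filter(([mod]) =>
        files.some(f => f.startsWith(mod))
      )
    ),
  };
}
```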
Patterns specific to AI-generated code that conventional CI and human reviewers typically miss. Each is a dedicated Review Lens in Layer 3.
See real examples →

- Tests that mirror the implementation instead of verifying behavior.
- Calls to methods or endpoints that don't exist in your codebase.
- AI-generated code that ignores your team's established patterns.
- Unnecessary Factory patterns, abstractions, and complexity.
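As a toy example of what one lens can check, the sketch below scores how much of a test's identifier vocabulary is copied straight from the implementation, which is the signature of a mirror test. The tokenizer, score, and threshold are all invented for illustration, not VCR's actual lens logic:

```typescript
// Toy "mirror test" heuristic: the closer a test's identifier vocabulary
// is to the implementation's, the more likely it restates the code
// instead of asserting behavior.
function tokens(src: string): Set<string> {
  return new Set(src.match(/[A-Za-z_]\w*/g) ?? []);
}

// Fraction of the test's identifiers that also appear in the implementation.
function mirrorScore(implSrc: string, testSrc: string): number {
  const impl = tokens(implSrc);
  const test = tokens(testSrc);
  if (test.size === 0) return 0;
  let shared = 0;
  for (const t of test) if (impl.has(t)) shared++;
  return shared / test.size;
}

const MIRROR_THRESHOLD = 0.9; // assumed cutoff

function isMirrorTest(implSrc: string, testSrc: string): boolean {
  return mirrorScore(implSrc, testSrc) >= MIRROR_THRESHOLD;
}
```

A real lens would work on ASTs rather than raw tokens, but the idea is the same: a test that only ever names what the implementation names is probably not asserting independent expectations.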
VCR reviews its own codebase on every pull request. Trace the triage flow, see what each layer catches, and follow findings back to the GitHub PR.
Your CI says the code is fine. Your tests pass. But the AI wrote the tests too.
VCR is one of four components in Visdom, VirtusLab's AI-Native SDLC.
- Pre-indexed code expertise, dependency graphs, PR history
- Sub-2-min CI loops, caching, incremental builds, test impact analysis
- Multi-layered AI code review. You are here.
- Governance: audit trail, auto-evaluation, EU AI Act compliance
Read the thinking behind it: The AI-Native SDLC series
These are real patterns we see across enterprise teams adopting AI-assisted development.
"My seniors in Kraków spend half their day reviewing PRs from the India team. The quality is inconsistent, feedback takes 24 hours, and we're shipping AI-generated code without really understanding what it does."
"Our agents burn $30 per task because they loop on flaky tests. The CI lies to them. And nobody's tracking the real cost. Leadership thinks AI costs us $950/month in Copilot licenses."
"Copilot-generated PRs look clean but they're over-engineered: unnecessary Factory patterns, hallucinated APIs, tests that mirror the implementation instead of verifying it. I catch this stuff, but I can't review everything."
Reference material, architecture docs, and real-world scenarios.
4 concrete scenarios showing what changes when VCR is deployed.
REFERENCE · Full architecture, layer docs, configuration, metrics framework.
REFERENCE · Technology-agnostic table of components and alternatives.
External research and real-world evidence that shaped this process.
- How feedback must be constructive, not evaluative, for distributed teams adopting AI. (Shopify / arXiv)
- INDUSTRY · Pattern-based grep-search for context produces poor results; retrieval architecture is everything. (Augment Code)
- RESEARCH · Systematic study of security vulnerabilities in AI-generated code across common languages. (arXiv 2025)
- INDUSTRY · LLM-as-judge vetoes 25% of agent output: 1 in 4 passes CI but is still functionally wrong. (Spotify Engineering)
Two independent benchmarks now measure what AI review tools actually catch on real pull requests. The data below is drawn from published, reproducible evaluations — not vendor self-reports.
| Tool | Precision | Recall | F1 | Source |
|---|---|---|---|---|
| Propel | 68% | 61% | 64% | Propel Benchmark |
| Cubic | 56% | 69% | 62% | Martian Bench |
| Qodo | — | 57% | 60% | Martian Bench |
| Augment | 65% | 55% | 59% | Propel Benchmark |
| CodeRabbit | 36–48% | 43–55% | 39–51% | Martian + Propel |
| Baz | #1 on Martian | — | ~50% | Martian Bench |
| Claude Code | 23% | 51% | 31% | Propel Benchmark |
| GitHub Copilot | 20% | 34% | 25% | Propel Benchmark |
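The F1 column is the harmonic mean of precision and recall, so the rows can be cross-checked. For example, Propel's 68% precision and 61% recall give F1 ≈ 64%, matching the table:

```typescript
// F1 score: harmonic mean of precision and recall.
function f1(precision: number, recall: number): number {
  return (2 * precision * recall) / (precision + recall);
}

console.log(f1(0.68, 0.61).toFixed(3)); // 0.643 -> Propel's reported 64%
```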
The tools above are SaaS products you subscribe to. VCR is a review process with a reference implementation — designed to be deployed into your existing CI/CD, your infrastructure, your LLM provider. It runs in your environment, reports to your systems, and follows your rules.
VirtusLab deploys VCR as part of Visdom engagements: we configure the pipeline for your stack, tune the lenses to your conventions, and transfer ownership to your platform team. You keep the process. No ongoing SaaS dependency.
Evaluation methodology →

Deployment model
Clone the repo and run the demo locally. It creates a deliberately flawed PR (auth service with 12 passing tests and 94% coverage) and runs the full 4-layer VCR pipeline on it. No API key needed — cached responses included.
Demo documentation →

Quick start
```shell
git clone https://github.com/VirtusLab/visdom-code-review
cd visdom-code-review/demo
npm install

# Narrated walkthrough (auto-paced)
npm run demo:narrate

# Interactive (press Enter to advance)
npm run demo:interactive

# Fast run (no narration)
npm run demo:local
```

Architecture, configuration, metrics framework, and reference implementations.