Visdom
Code Review

A multi-layered review process for enterprise teams shipping AI-generated code. Part of the Visdom AI-Native SDLC.


How it works

Layered review

Every PR passes through layers of increasing depth. Fast and cheap for trivial changes, thorough for risky ones. A LOW-risk PR gets feedback in under 2 minutes at ~$0.05.

Full architecture reference →
01 · L0 Triage · Risk scoring, routing. Instant.

02 · L1 Static · Linters, SAST, pattern checks. <30s.

03 · L2 Semantic · LLM review with full context. <2min.

04 · L3 Deep · Multi-pass analysis, security, architecture. <5min.
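The escalation above can be sketched as a short-circuiting pipeline: cheap layers run first and gate the expensive ones. This is an illustrative sketch, not VCR's actual implementation; `Layer`, `triage`, and `staticScan` are hypothetical names.

```typescript
// Illustrative sketch: each layer inspects the PR, contributes findings,
// and may stop the pipeline so cheap layers gate the expensive ones.
type Layer = (pr: string) => { findings: string[]; stop: boolean };

function runPipeline(pr: string, layers: Layer[]): string[] {
  const findings: string[] = [];
  for (const layer of layers) {
    const result = layer(pr);
    findings.push(...result.findings);
    if (result.stop) break; // e.g. L0 auto-approves a trivial PR
  }
  return findings;
}

// Toy layers standing in for L0 triage and L1 static analysis.
const triage: Layer = () => ({ findings: [], stop: false });
const staticScan: Layer = () => ({ findings: ["unused import"], stop: true });
```

A LOW-risk PR would stop after the static layer; higher risk levels simply append the L2/L3 layers to the array.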

Risk-based routing

Risk classification

Each PR gets a risk level based on path classification, diff size, coverage delta, and module stability. Only MEDIUM+ risk triggers deep analysis.

See before/after scenarios →
LOW · Config, docs, deps. Auto-approved or light scan.

MEDIUM · Business logic. Standard LLM review with context.

HIGH · Security-sensitive, cross-service. Multi-pass analysis.

CRITICAL · Auth, payments, data migration. Full depth + human gate.
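A minimal sketch of how those signals might combine into a risk level. The path patterns and thresholds below are invented for illustration; VCR's real scoring is configurable per repository.

```typescript
// Hypothetical risk scoring from the signals named above:
// path classification, diff size, coverage delta, module stability.
type Risk = "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";

interface PrSignals {
  paths: string[];         // files touched by the diff
  diffLines: number;       // added + removed lines
  coverageDelta: number;   // e.g. -2.5 means coverage dropped 2.5 points
  moduleStability: number; // 0..1 from churn history; lower = less stable
}

// Assumed path patterns, purely for the sketch.
const CRITICAL_PATHS = [/auth\//, /payments\//, /migrations\//];
const SECURITY_PATHS = [/crypto\//, /session/, /api\/public/];

function classifyRisk(pr: PrSignals): Risk {
  if (pr.paths.some((p) => CRITICAL_PATHS.some((re) => re.test(p)))) return "CRITICAL";
  if (pr.paths.some((p) => SECURITY_PATHS.some((re) => re.test(p)))) return "HIGH";
  const docsOnly = pr.paths.every((p) => /\.(md|json|ya?ml|lock)$/.test(p));
  if (docsOnly && pr.coverageDelta >= 0) return "LOW";
  if (pr.diffLines > 400 || pr.coverageDelta < -1 || pr.moduleStability < 0.4) return "HIGH";
  return "MEDIUM";
}
```

Note the ordering: path classification wins first, so a one-line change under `auth/` is still CRITICAL regardless of diff size.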

Repository context

Context sources

Each review is fed pre-indexed knowledge about the codebase: ownership, dependencies, commit history, conventions, and test reliability data.

Explore ViDIA context engine →


ViDIA · Git Blame · Coverage · CODEOWNERS · PR History · Commit Heatmap · + Your Source
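One way to picture the bundle: a single typed payload assembled from those sources before the LLM ever sees the diff. The field names below are hypothetical, not VCR's schema.

```typescript
// Hypothetical context bundle assembled from pre-indexed sources.
interface ReviewContext {
  owners: string[];                              // CODEOWNERS entries for touched files
  blame: Record<string, string>;                 // file -> last author (git blame)
  coverage: Record<string, number>;              // file -> line coverage %
  priorPrs: { number: number; title: string }[]; // related PR history
  churnHotspots: string[];                       // hot files from the commit heatmap
  conventions: string[];                         // team patterns mined by ViDIA
}

// Compact summary a prompt builder might prepend to the diff.
function contextSummary(ctx: ReviewContext): string {
  return `${ctx.owners.length} owner(s), ${ctx.churnHotspots.length} hotspot(s)`;
}
```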
What it catches

AI-code patterns

Patterns specific to AI-generated code that conventional CI and human reviewers typically miss. Each is a dedicated Review Lens in Layer 3.

See real examples →
01 · Circular Tests · Tests that mirror implementation instead of verifying behavior.

02 · Hallucinated APIs · Calls to methods or endpoints that don't exist in your codebase.

03 · Convention Drift · AI-generated code that ignores your team's established patterns.

04 · Over-engineering · Unnecessary Factory patterns, abstractions, and complexity.
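Pattern 01 is the easiest to show in code. In this made-up example, the test re-derives its expected value with the same formula as the implementation, so it passes even though the documented 50% discount cap is missing:

```typescript
// Buggy implementation: the discount is supposed to cap at 50% but doesn't.
function discountedPrice(price: number, pct: number): number {
  return price * (1 - pct / 100); // missing: pct = Math.min(pct, 50)
}

// Circular test: mirrors the implementation, so the bug is invisible.
function circularTestPasses(): boolean {
  const price = 100, pct = 80;
  const expected = price * (1 - pct / 100); // same formula, same bug
  return discountedPrice(price, pct) === expected;
}

// Behavioral test: asserts the documented cap, and would fail here.
function behavioralTestPasses(): boolean {
  return discountedPrice(100, 80) === 50; // cap at 50% => expect 50
}
```

The circular test gives green CI and inflated coverage; only the behavioral assertion exposes the missing cap.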

SEE IT IN ACTION

Interactive demo — real PRs, real findings

VCR reviews its own codebase on every pull request. Trace the triage flow, see what each layer catches, and follow findings back to the GitHub PR.

visdom-code-review · demo
VCR triage demo walkthrough

The problem in numbers

Your CI says the code is fine. Your tests pass. But the AI wrote the tests too.

84% of CI test failures are flaky, not real regressions (Google)
45% of AI-generated code contains OWASP security vulnerabilities (Veracode, 2025)
12.5% of agent output that passes CI is still functionally wrong (Spotify)

Part of Visdom

VCR is one of four components in Visdom, VirtusLab's AI-Native SDLC.

Read the thinking behind it: The AI-Native SDLC series

What your team is saying right now

These are real patterns we see across enterprise teams adopting AI-assisted development.

Your Engineering Manager · Katja's problem

"My seniors in Kraków spend half their day reviewing PRs from the India team. The quality is inconsistent, feedback takes 24 hours, and we're shipping AI-generated code without really understanding what it does."

Your Platform Engineer · Ewa's problem

"Our agents burn $30 per task because they loop on flaky tests. The CI lies to them. And nobody's tracking the real cost. Leadership thinks AI costs us $950/month in Copilot licenses."

Your Senior Developer · Rajesh's problem

"Copilot-generated PRs look clean but they're over-engineered: unnecessary Factory patterns, hallucinated APIs, tests that mirror the implementation instead of verifying it. I catch this stuff, but I can't review everything."

Go deeper

Reference material, architecture docs, and real-world scenarios.

From the field

External research and real-world evidence that shaped this process.

Common questions

See all 10 questions →

AI code review: state of the market

Two independent benchmarks now measure what AI review tools actually catch on real pull requests. The data below is drawn from published, reproducible evaluations — not vendor self-reports.

| Tool | Precision | Recall | F1 | Source |
|---|---|---|---|---|
| Propel | 68% | 61% | 64% | Propel Benchmark |
| Cubic | 56% | 69% | 62% | Martian Bench |
| Qodo | 57% | | 60% | Martian Bench |
| Augment | 65% | 55% | 59% | Propel Benchmark |
| CodeRabbit | 36–48% | 43–55% | 39–51% | Martian + Propel |
| Baz | #1 on Martian | | ~50% | Martian Bench |
| Claude Code | 23% | 51% | 31% | Propel Benchmark |
| GitHub Copilot | 20% | 34% | 25% | Propel Benchmark |

About these benchmarks

Martian Code Review Bench: 50 curated PRs + 200k online PRs across Sentry, Grafana, Cal.com, Discourse, Keycloak. Human-verified golden comments. LLM-as-judge (Claude/GPT). MIT licensed, open source. Created by researchers from DeepMind, Anthropic, and Meta.

Propel Benchmark: 50 PRs from production open-source repos. Externally authored — Propel did not influence repo selection, PR selection, or labeling. Tools tested with default settings, no customization.

An F1 of 50–65% is the current state of the art; no tool exceeds 70% F1 on real-world PRs. Precision measures how often a tool's comments lead to a code change; recall measures how many real issues the tool catches.
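Concretely, F1 is the harmonic mean of precision and recall, which is why a tool with lopsided scores (say, decent recall but poor precision) still lands a low F1:

```typescript
// F1 = harmonic mean of precision and recall (both in 0..1).
function f1(precision: number, recall: number): number {
  return (2 * precision * recall) / (precision + recall);
}
```

For the Propel row above, f1(0.68, 0.61) ≈ 0.64; for GitHub Copilot, f1(0.20, 0.34) ≈ 0.25, matching the table.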

A different approach

A review process you deploy into your platform

The tools above are SaaS products you subscribe to. VCR is a review process with a reference implementation — designed to be deployed into your existing CI/CD, your infrastructure, your LLM provider. It runs in your environment, reports to your systems, and follows your rules.

VirtusLab deploys VCR as part of Visdom engagements: we configure the pipeline for your stack, tune the lenses to your conventions, and transfer ownership to your platform team. You keep the process. No ongoing SaaS dependency.

Evaluation methodology →

Deployment model

| | SaaS review tools | VCR (deployed) |
|---|---|---|
| Where it runs | Vendor cloud | Your CI/CD pipeline |
| LLM provider | Vendor-chosen | Your choice (Claude, GPT, self-hosted) |
| Code leaves your network | Yes | No — runs on your infra |
| Custom review rules | Limited config | Full lens customization (compliance, domain) |
| Cost model | Per-seat: $12–40/dev/mo | Your LLM costs only ($0–0.44/PR) |
| Ownership | Vendor dependency | Your team owns and operates |

Try it yourself

Runnable demo

Clone the repo and run the demo locally. It creates a deliberately flawed PR (auth service with 12 passing tests and 94% coverage) and runs the full 4-layer VCR pipeline on it. No API key needed — cached responses included.

Demo documentation →

Quick start

```shell
git clone https://github.com/VirtusLab/visdom-code-review
cd visdom-code-review/demo
npm install

# Narrated walkthrough (auto-paced)
npm run demo:narrate

# Interactive (press Enter to advance)
npm run demo:interactive

# Fast run (no narration)
npm run demo:local
```

| | Findings | Turnaround | Cost |
|---|---|---|---|
| Traditional review | 0 | 24-48h | ~1h senior eng |
| VCR review | 14 | 2 min | $0.44 |

Read the full reference

Architecture, configuration, metrics framework, and reference implementations.