Visdom Code Review

A multi-layered, configuration-driven review process for enterprise teams shipping AI-generated code, where review policy lives in your repo as code. Part of the Visdom AI-Native SDLC.

Part of Visdom · VirtusLab's AI-Native SDLC

The problem in numbers

Your CI says the code is fine. Your tests pass. But the AI wrote the tests too.

  • 84% of CI test failures are flaky, not real regressions (Google)
  • 45% of AI-generated code contains OWASP security vulnerabilities (Veracode 2025)
  • 12.5% of agent output that passes CI is still functionally wrong (Spotify)

Measured on real pull requests

One logged end-to-end run, three measurements. Numbers from logged runs, not projections.

  • ~22 s average end-to-end review time per pull request (llama3 v8 run, 2026-06-10)
  • 75% F1 at a 14:1 signal-to-noise ratio on a seeded-findings benchmark PR (perfect-pr A/B, 2026-06-10)

Run context: 38 PRs · 154 findings
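
If you want to reproduce this kind of measurement on your own benchmark PR, the sketch below shows how an F1 score and a signal-to-noise ratio can be computed from seeded findings. It assumes signal-to-noise means true positives per false positive, which this page does not define. The class and function names are hypothetical, and the counts are illustrative, not the figures from the logged run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkCounts:
    """Outcome of one review run against a PR with known, seeded issues."""
    true_positives: int   # seeded issues the review reported
    false_positives: int  # reported findings that match no seeded issue
    false_negatives: int  # seeded issues the review missed


def precision(c: BenchmarkCounts) -> float:
    reported = c.true_positives + c.false_positives
    return c.true_positives / reported if reported else 0.0


def recall(c: BenchmarkCounts) -> float:
    seeded = c.true_positives + c.false_negatives
    return c.true_positives / seeded if seeded else 0.0


def f1(c: BenchmarkCounts) -> float:
    # Harmonic mean of precision and recall.
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def signal_to_noise(c: BenchmarkCounts) -> float:
    # Assumed definition: real findings per spurious finding.
    if c.false_positives == 0:
        return float("inf")
    return c.true_positives / c.false_positives


# Illustrative counts only, not taken from any logged run.
example = BenchmarkCounts(true_positives=8, false_positives=2, false_negatives=2)
```

F1 rewards a review that both finds the seeded issues and avoids spurious comments, while the ratio tells you how many real findings you get per noisy one.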

A different approach

A review process you deploy into your platform

Visdom Code Review (VCR) is not a SaaS product or a vendor service. It is a review process with an open-source reference implementation: opinionated patterns, a defined layer sequence, and a pipeline you run inside your own CI/CD, against your own LLM provider, behind your own network boundary.

VirtusLab deploys VCR as part of Visdom engagements: we embed with your platform engineers, configure the pipeline for your stack, tune the lenses to your conventions, and hand it over. Capability transfer from day one. Your team owns and operates the process. No ongoing SaaS dependency.

Evaluation methodology →

Deployment model

Aspect                    SaaS review tools      VCR (deployed)
Where it runs             Vendor cloud           Your CI/CD pipeline
LLM provider              Vendor-chosen          Your choice (Claude, GPT, self-hosted)
Code leaves your network  Yes                    No, runs on your infra
Custom review rules       Limited config         Versioned .visdom.yaml: deterministic patterns + LLM checklists, one schema
Cost model                Per-seat subscription  Your LLM costs only, no per-seat fees
Ownership                 Vendor dependency      Your team owns and operates

How it works

Layered review

Every PR passes through layers of increasing depth. Fast and cheap for trivial changes, thorough for risky ones. Measured across a 38-PR run: ~22 seconds per PR on average.

Full architecture reference →

01 L0 Triage

Risk scoring, routing. Instant.

02 L1 Static

Linters, SAST, pattern checks.

03 L2 Semantic

LLM review with full context.

04 L3 Deep

Multi-pass analysis, security, arch.
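
The layer sequence above can be sketched as a small pipeline in which triage decides how deep a PR goes and later layers only run when needed. This is a minimal illustration, not the reference implementation: the function names are hypothetical, each layer is a trivial stand-in, and triage here looks only at diff size rather than a full risk score.

```python
from typing import Callable

# Hypothetical layer signature: take a unified diff, return finding messages.
Layer = Callable[[str], list[str]]


def l1_static(diff: str) -> list[str]:
    # Stand-in for linters and SAST: one deterministic pattern check.
    return ["L1: possible hard-coded password"] if "password =" in diff else []


def l2_semantic(diff: str) -> list[str]:
    # Stand-in for an LLM review; no model is called in this sketch.
    return ["L2: TODO left in changed code"] if "TODO" in diff else []


def l3_deep(diff: str) -> list[str]:
    # Stand-in for multi-pass security and architecture analysis.
    return ["L3: dynamic eval in changed code"] if "eval(" in diff else []


LAYERS: list[Layer] = [l1_static, l2_semantic, l3_deep]


def l0_triage(diff: str) -> int:
    """Return how many layers to run, using a toy rule on diff size."""
    changed = sum(
        1 for line in diff.splitlines()
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )
    if changed == 0:
        return 0
    if changed <= 2:
        return 1
    if changed <= 20:
        return 2
    return 3


def review(diff: str) -> list[str]:
    """Run layers in order of increasing depth, stopping at the triaged depth."""
    findings: list[str] = []
    for layer in LAYERS[: l0_triage(diff)]:
        findings.extend(layer(diff))
    return findings
```

Because the cheap layers run first and the expensive ones are skipped for small changes, most PRs finish quickly while risky ones still get the full depth.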

Risk-based routing

Risk classification

Each PR gets a risk level based on path classification, diff size, coverage delta, and module stability. Thresholds and routing rules live in a versioned .visdom.yaml in your repo, reviewed and changed like any other code. Only PRs classified MEDIUM or higher trigger deep analysis.

See before/after scenarios →

LOW

Config, docs, deps. Auto-approved or light scan.

MEDIUM

Business logic. Standard LLM review with context.

HIGH

Security-sensitive, cross-service. Multi-pass analysis.

CRITICAL

Auth, payments, data migration. Full depth + human gate.
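
To make the classification concrete, here is a minimal sketch of how the four inputs named above could combine into one of the four risk levels. Everything specific in it is an assumption for illustration: the path patterns, the 400-line and 2-point thresholds, the one-step bump rule, and the names PATH_RULES, PullRequest, and classify. In a real deployment these rules would come from your .visdom.yaml, whose actual schema is not shown on this page.

```python
from dataclasses import dataclass
from enum import IntEnum
from fnmatch import fnmatch


class Risk(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


# Hypothetical stand-in for path rules that would live in .visdom.yaml.
PATH_RULES = [
    ("*auth*", Risk.CRITICAL),
    ("*payments*", Risk.CRITICAL),
    ("*migrations*", Risk.CRITICAL),
    ("docs/*", Risk.LOW),
    ("*.md", Risk.LOW),
]
DEFAULT_PATH_RISK = Risk.MEDIUM  # unclassified paths count as business logic


@dataclass(frozen=True)
class PullRequest:
    paths: list[str]
    lines_changed: int
    coverage_delta: float  # percentage points; negative means coverage dropped
    unstable_module: bool  # e.g. the module has a history of frequent fixes


def path_risk(path: str) -> Risk:
    # First matching rule wins, so order the rules from most to least severe.
    for pattern, risk in PATH_RULES:
        if fnmatch(path, pattern):
            return risk
    return DEFAULT_PATH_RISK


def classify(pr: PullRequest) -> Risk:
    # Start from the riskiest path touched by the PR.
    level = max((path_risk(p) for p in pr.paths), default=Risk.LOW)
    # Each aggravating signal bumps the level one step (assumed thresholds).
    bumps = sum([
        pr.lines_changed > 400,
        pr.coverage_delta < -2.0,
        pr.unstable_module,
    ])
    return Risk(min(level + bumps, Risk.CRITICAL))


def needs_human_gate(risk: Risk) -> bool:
    # Per the levels above, only CRITICAL adds a human gate.
    return risk is Risk.CRITICAL
```

Keeping the rules as plain data is what lets them be versioned and reviewed like any other change to the repo.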

Repository context

Context sources

Each review is fed pre-indexed knowledge about the codebase: ownership, dependencies, commit history, conventions, and test reliability data, plus your own standards documents, injected directly into the review context.

Explore the Context Fabric context engine

Context Fabric · Git Blame · Coverage · CODEOWNERS · PR History · Commit Heatmap · + Your Standards Docs

What it catches

AI-code patterns

Patterns specific to AI-generated code that conventional CI and human reviewers typically miss. Each is a dedicated Review Lens in Layer 3. The same schema covers your own org rules: deterministic patterns and LLM checklists, side by side.

See real examples →

01 Circular Tests

Tests that mirror implementation instead of verifying behavior.

02 Hallucinated APIs

Calls to methods or endpoints that don't exist in your codebase.

03 Convention Drift

AI-generated code that ignores your team's established patterns.

04 Over-engineering

Unnecessary Factory patterns, abstractions, and complexity.
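
The first pattern, circular tests, is easiest to see in code. The hypothetical example below has a deliberately buggy function, a test that recomputes its expected value with the same formula, and a test that states the expected behavior independently. It illustrates the pattern only and says nothing about how the Review Lens detects it.

```python
def apply_discount(price: float, percent: float) -> float:
    # Bug for illustration: the discount is added instead of subtracted.
    return price + price * percent / 100


def circular_test() -> bool:
    # Mirrors the implementation: the expected value is derived from the
    # same formula, so the test passes whether or not the formula is right.
    price, percent = 200.0, 10.0
    expected = price + price * percent / 100
    return apply_discount(price, percent) == expected


def behavioral_test() -> bool:
    # States the expected behavior independently: 10% off 200 is 180.
    return apply_discount(200.0, 10.0) == 180.0
```

The circular test passes and CI stays green, while the behavioral test fails and exposes the bug. That gap is why passing tests alone are weak evidence when the same model wrote both the code and the tests.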

SEE IT IN ACTION

Interactive demo: real PRs, real findings

VCR reviews its own codebase on every pull request. Trace the triage flow, see what each layer catches, and follow findings back to the GitHub PR: 38 PRs, 154 findings.

[Screenshot: Grafana Quality Pulse dashboard showing VCR metrics (vcr-grafana.fly.dev · Quality Pulse)]

Real output, no toys

What lands on your PR

Findings grouped by layer, linked to the exact line. Every comment names the rule, the risk level, and a concrete fix, not a vague hint.

  • Cross-layer deduplication: one finding per issue, with multi-layer agreement recorded as confirmedBy
  • Fixes posted as GitHub suggestion blocks you can approve and apply from the PR
  • SARIF 2.1.0 export for code-scanning dashboards and audit pipelines

See the real PR this came from →

[Screenshot: VCR GitHub PR comment showing findings grouped by layer with inline code annotations (github.com · Pull Request · VCR commented)]
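
Cross-layer deduplication and SARIF export can be sketched together. The code below merges findings that share a rule, file, and line, records the other layers that agree in a confirmed_by list, and emits a minimal SARIF 2.1.0 log. The Finding fields, the deduplication key, the risk-to-level mapping, and the placement of confirmedBy in the SARIF property bag are all assumptions for illustration; only the SARIF field names themselves come from the 2.1.0 specification.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Finding:
    rule_id: str
    path: str
    line: int
    message: str
    risk: str   # "LOW" | "MEDIUM" | "HIGH" | "CRITICAL"
    layer: str  # layer that reported it, e.g. "L1"
    confirmed_by: list[str] = field(default_factory=list)


def deduplicate(findings: list[Finding]) -> list[Finding]:
    """Keep one finding per (rule, file, line); record other layers that agree."""
    merged: dict[tuple[str, str, int], Finding] = {}
    for f in findings:
        key = (f.rule_id, f.path, f.line)
        if key not in merged:
            merged[key] = Finding(f.rule_id, f.path, f.line, f.message, f.risk, f.layer)
        elif f.layer != merged[key].layer and f.layer not in merged[key].confirmed_by:
            merged[key].confirmed_by.append(f.layer)
    return list(merged.values())


# Assumed mapping from this page's risk levels to SARIF result levels.
SARIF_LEVEL = {"LOW": "note", "MEDIUM": "warning", "HIGH": "error", "CRITICAL": "error"}


def to_sarif(findings: list[Finding], tool_name: str = "vcr-sketch") -> dict:
    """Build a minimal SARIF 2.1.0 log with one run and one result per finding."""
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": tool_name}},
            "results": [{
                "ruleId": f.rule_id,
                "level": SARIF_LEVEL[f.risk],
                "message": {"text": f.message},
                "locations": [{"physicalLocation": {
                    "artifactLocation": {"uri": f.path},
                    "region": {"startLine": f.line},
                }}],
                # Custom data goes in the SARIF property bag.
                "properties": {"confirmedBy": f.confirmed_by},
            } for f in findings],
        }],
    }


raw = [
    Finding("hardcoded-secret", "app.py", 12, "Secret in source", "HIGH", "L1"),
    Finding("hardcoded-secret", "app.py", 12, "Secret in source", "HIGH", "L2"),
    Finding("circular-test", "test_app.py", 3, "Test mirrors implementation", "MEDIUM", "L3"),
]
report = json.dumps(to_sarif(deduplicate(raw)), indent=2)
```

Two layers flagging the same line produce one comment rather than two, and the agreement is kept as a signal instead of being thrown away.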

Part of Visdom

VCR is one of four components in Visdom, VirtusLab's AI-Native SDLC.

Repository context

Context Fabric

Pre-indexed code expertise, dependency graphs, PR history

Machine-Speed CI

Fast Feedback

Short CI loops, caching, incremental builds, test impact analysis

Risk Assessment

VCR

Multi-layered AI code review. You are here.

Governance

TraceVault

Audit trail, auto-evaluation, EU AI Act compliance

Read the thinking behind it: The AI-Native SDLC series

Go deeper

Reference material, architecture docs, and real-world scenarios.

Read the full reference

Architecture, configuration, metrics framework, and reference implementations.