VISDOM CREATED WITH VISDOM

Every PR on this repo is reviewed by VCR

We don't demo on toy examples. VCR runs on its own codebase — metacircularly — on every pull request. The findings below are real.

39 bugs found · $0.06 avg cost · 25s avg time
test: add comprehensive pipeline layer tests · 5 findings · $0.04 · 21s
refactor: simplify deterministic gate pattern matching · 10 findings · $0.07 · 27s
feat: add retry and caching to AI client · 7 findings · $0.06 · 21s

Findings by Severity (chart): per scenario, real run output

Cost Per Layer (chart): L0+L1 free · L3 only runs for HIGH/CRITICAL

F1 Score vs Market (chart): 50 PRs · 5 repos · advisor judge · honest numbers

4× Hidden Tax Breakdown (chart): illustrative cost model; proportions vary by team

How each PR moves through the pipeline (interactive trace): for each scenario, every layer shows what it found and why the gate made its decision.

From PR opened to human decision (interactive timeline): the full event timeline for each scenario.

Each scenario is a real PR that looks fine — until VCR runs

Every PR below passed CI, had a clean description, and claimed tests were green. VCR found the hidden issues anyway.

META: VCR reviewing its own codebase (metacircular)
STANDALONE: external real-world code
META · TypeScript

Securing the AI Client

PR: feat: add retry and caching to AI client

"All existing tests pass."

VCR found:

Hardcoded API key, PII in logs, retry without backoff

1 critical · 6 high · 3 medium
$0.06 · 21s
View PR #29 on GitHub →
META · TypeScript

Refactoring the Gate

PR: refactor: simplify deterministic gate pattern matching

"Behavior unchanged. All tests pass."

VCR found:

Weakened SQL check, timing-unsafe compare, SSRF rule disabled

1 critical · 9 high · 3 medium
$0.07 · 27s
View PR #30 on GitHub →
META · TypeScript

Hollow Test Suite

PR: test: add comprehensive pipeline layer tests

"100% line coverage. All green."

VCR found:

15 tests mocking their own subjects — zero behavioral assertions

5 high · 1 medium
$0.04 · 21s
View PR #31 on GitHub →
STANDALONE · Python

Payment Service

PR: feat: add payment processing endpoint

"Tested against Stripe sandbox. All passing."

VCR found:

SQL injection via f-string, card data in logs, weak JWT secret

3 critical · 5 high · 2 medium
$0.05 · 32s
Local run — no GitHub PR

On a real repository

The same engine was run end-to-end on llama3-java-hat, a real Java LLM-inference project by the site's author, using the tool's defaults and no config in the repo.

38 PRs · 154 findings · $2.98 total ($0.078/PR) · ~22 s/PR

Deep review (L3) ran on 27 of 38 PRs; the cheap triage gate stopped the other 11.
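A minimal sketch of such a gating decision, assuming the gate escalates a PR to L3 only when the cheaper layers have already produced a HIGH or CRITICAL finding (the rule the Cost Per Layer caption states as "L3 only runs for HIGH/CRITICAL"). The `Finding` type, the layer labels, and the `shouldRunDeepReview` / `countEscalations` helpers are hypothetical illustrations, not VCR's API.

```typescript
// Hypothetical sketch of a severity-based triage gate; not VCR's actual code.
type Severity = "low" | "medium" | "high" | "critical";

interface Finding {
  layer: string; // assumed labels such as "L0" or "L1"
  severity: Severity;
  title: string;
}

// Assumed rule: escalate to deep review (L3) only if a cheaper layer
// already reported at least one HIGH or CRITICAL finding.
function shouldRunDeepReview(earlyFindings: Finding[]): boolean {
  return earlyFindings.some(
    (f) => f.severity === "high" || f.severity === "critical",
  );
}

// Count how many PRs in a batch would be escalated versus stopped at the gate.
function countEscalations(prs: Finding[][]): { escalated: number; stopped: number } {
  const escalated = prs.filter((findings) => shouldRunDeepReview(findings)).length;
  return { escalated, stopped: prs.length - escalated };
}

// Three hypothetical PRs: one with a critical finding, one medium-only, one clean.
const batch: Finding[][] = [
  [{ layer: "L0", severity: "critical", title: "Hardcoded API key" }],
  [{ layer: "L1", severity: "medium", title: "System.out logging" }],
  [],
];

console.log(countEscalations(batch)); // { escalated: 1, stopped: 2 }
```

In this sketch the gate is a pure function of the cheaper layers' output, so evaluating it adds no model cost; only escalated PRs would pay for L3.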

Config-as-code was then introduced in a single PR, PR #50, and the review of that PR used the configuration that the PR itself carries:
  • a repo-specific regex rule fired (llama3/no-stdout-logging);
  • an LLM org rule fired with a ready-to-apply fix suggestion (llama3/tensor-ops-document-shape);
  • a finding cited the repo's own docs/standards/error-handling.md.

4 findings · $0.0476 · 18.6 s

Actual review output:

PR #50 — demo: VISDOM config-as-code review showcase
4 files, +63/-0 lines

HIGH (2)
  • InterruptedException swallowed without re-interrupt [src/main/java/com/arturskowronski/llama3babylon/hat/TensorDiagnostics.java]
  • rowChecksum accepts float[] with no shape/dimension validation [src/main/java/com/arturskowronski/llama3babylon/hat/TensorDiagnostics.java]

MEDIUM (2)
  • System.out logging in library code [src/main/java/com/arturskowronski/llama3babylon/hat/TensorDiagnostics.java]
  • rowChecksum accepts any-length array with no dimension guard [src/main/java/com/arturskowronski/llama3babylon/hat/TensorDiagnostics.java]

Duration: 18.6s · Cost: $0.0476 · L3: yes
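The MEDIUM finding about System.out logging is the kind of hit a deterministic regex rule such as llama3/no-stdout-logging produces. The sketch below shows one way such a rule could be applied to the lines a PR adds; the `RegexRule` shape, the pattern, and `applyRegexRule` are hypothetical and are not VCR's actual config schema.

```typescript
// Hypothetical rule shape; VCR's real config format may differ.
interface RegexRule {
  id: string;
  severity: "low" | "medium" | "high" | "critical";
  pattern: RegExp;
  message: string;
}

interface RuleHit {
  ruleId: string;
  severity: RegexRule["severity"];
  line: number; // 1-based index within the added lines
  text: string;
}

// Illustrative pattern: flag System.out / System.err print calls in Java source.
const noStdoutLogging: RegexRule = {
  id: "llama3/no-stdout-logging",
  severity: "medium",
  pattern: /\bSystem\.(out|err)\.print/,
  message: "Use a logger instead of System.out in library code.",
};

// Apply one rule to the lines added by a PR. No LLM is involved,
// so the same input always yields the same hits.
function applyRegexRule(rule: RegexRule, addedLines: string[]): RuleHit[] {
  const hits: RuleHit[] = [];
  addedLines.forEach((text, i) => {
    if (rule.pattern.test(text)) {
      hits.push({ ruleId: rule.id, severity: rule.severity, line: i + 1, text: text.trim() });
    }
  });
  return hits;
}

// Hypothetical added lines from a Java diff.
const added = [
  "public static long rowChecksum(float[] row) {",
  '    System.out.println("checksum start");',
  "    return 0L;",
  "}",
];

console.log(applyRegexRule(noStdoutLogging, added)); // a single hit, on line 2
```

The pattern has no `g` flag, so `RegExp.prototype.test` keeps no `lastIndex` state between lines.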

The full SARIF export for this review is vendored at docs/demos/llama3-v8/review.sarif for readers wiring it into code-scanning tooling. The review comments visible on the PR itself come from a separate run of the same engine, which found 3 of these 4 findings: the deterministic layers agree from run to run, but the LLM lenses do not always.
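For readers wiring a SARIF file like this into their own tooling, here is a minimal TypeScript sketch that tallies results by SARIF level. It relies only on standard SARIF 2.1.0 fields (`runs`, `results`, `level`). How VCR maps its HIGH/MEDIUM severities onto SARIF levels is not stated here, so the counts need not line up with the page's severity labels; the sample log and tool name are hypothetical.

```typescript
// Minimal subset of the SARIF 2.1.0 object model used by this sketch.
interface SarifResult {
  ruleId?: string;
  level?: "none" | "note" | "warning" | "error";
  message: { text?: string };
}

interface SarifLog {
  version: string;
  runs: { tool: { driver: { name: string } }; results?: SarifResult[] }[];
}

// Tally results by level across all runs. Simplification: a missing level is
// counted as "warning" and rule-level defaultConfiguration is ignored.
function countByLevel(sarifJson: string): Record<string, number> {
  const log = JSON.parse(sarifJson) as SarifLog;
  const counts: Record<string, number> = {};
  for (const run of log.runs) {
    for (const result of run.results ?? []) {
      const level = result.level ?? "warning";
      counts[level] = (counts[level] ?? 0) + 1;
    }
  }
  return counts;
}

// Hypothetical three-result log standing in for a real export.
const sample = JSON.stringify({
  version: "2.1.0",
  runs: [
    {
      tool: { driver: { name: "example-reviewer" } },
      results: [
        { ruleId: "a", level: "error", message: { text: "x" } },
        { ruleId: "b", level: "warning", message: { text: "y" } },
        { ruleId: "c", message: { text: "z" } },
      ],
    },
  ],
});

console.log(countByLevel(sample)); // { error: 1, warning: 2 }
```

To use it on a vendored file, read the file as UTF-8 text (for example with Node's `readFileSync(path, "utf8")`) and pass the string to `countByLevel`.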

Team health — measured on every PR

Anonymous read access · updates on every PR · powered by Grafana on fly.io

VCR Grafana dashboard showing PR metrics and findings per week