You've seen bots on PRs before. They post 40 comments, half are wrong, and you ignore all of them by the second week. VCR is built to be the opposite of that. Here's what actually changes in your day-to-day.
What you'll see on your PRs
VCR posts inline comments directly on the relevant lines of your diff. Each comment carries structured metadata so you can decide immediately how much weight to give it.
The example below is illustrative — it shows the anatomy of a v8 inline comment, not actual tool output.
VCR · HIGH · Security · Confidence: high · CWE-89 · OWASP A03 · confirmed by layers 1+3
SQL query built by string concatenation with unsanitized user input. An attacker can manipulate the query to read or delete arbitrary rows.
Suggestion: Use a parameterized query instead of string concatenation.
```suggestion
- const query = "SELECT * FROM users WHERE id = " + userId;
+ const query = "SELECT * FROM users WHERE id = ?";
+ db.execute(query, [userId]);
```
Each finding includes:
- Severity — HIGH, MEDIUM, or LOW, matching the risk level of the change
- Category — which lens flagged it (security, correctness, test-quality, performance, maintainability)
- Confidence bucket — high (≥0.8), medium (≥0.5), or requires verification (below 0.5)
- CWE / OWASP reference — included when the rule carries that metadata
- Cross-layer confirmation — "confirmed by layers 1+3" means two independent analysis passes found the same issue; these are the most reliable findings
- Suggestion prose — a plain-language description of what to fix and why
- One-click GitHub suggestion block — when the engine can produce an exact patch, it renders as a GitHub suggestion block you can apply directly from the PR review UI

What to do with each tier: address high-confidence, cross-layer-confirmed findings before merge; use your judgment on medium-confidence findings; treat low-confidence ("requires verification") findings as a question for the author or reviewer, not a verdict.
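The confidence thresholds and the triage advice can be sketched as a small lookup. This is a minimal illustration, not VCR's implementation: the `Finding` shape and both function names are hypothetical, and the handling of a high-confidence finding without cross-layer confirmation is an assumption, since the guidance here does not cover that case.

```typescript
// Hypothetical shapes for illustration; VCR's real data model may differ.
type Bucket = "high" | "medium" | "requires verification";

interface Finding {
  confidence: number;           // score in [0, 1]
  crossLayerConfirmed: boolean; // e.g. "confirmed by layers 1+3"
}

// Thresholds as documented: high >= 0.8, medium >= 0.5, otherwise requires verification.
function confidenceBucket(score: number): Bucket {
  if (score >= 0.8) return "high";
  if (score >= 0.5) return "medium";
  return "requires verification";
}

// Maps a finding to the suggested reviewer action.
function triage(finding: Finding): string {
  const bucket = confidenceBucket(finding.confidence);
  if (bucket === "high" && finding.crossLayerConfirmed) return "address before merge";
  // Assumption: high confidence without cross-layer confirmation is handled like medium.
  if (bucket === "high" || bucket === "medium") return "use your judgment";
  return "ask the author or reviewer";
}
```

For example, `triage({ confidence: 0.9, crossLayerConfirmed: true })` yields `"address before merge"`, while a score of 0.3 yields `"ask the author or reviewer"`.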
✅ Silence is the default
Most PRs are fine. VCR is designed to stay quiet when there's nothing worth saying. If you're not hearing from it, that's working as intended.
How the risk levels work
Every PR gets a risk level. This determines how deep VCR looks and how much of your (and your reviewer's) time it asks for.
| Risk | What triggers it | What happens |
|---|---|---|
| LOW | Small change, safe paths, tests pass | Fast scan only, done in ~2 min. No deep review. |
| MEDIUM | Sensitive path or coverage drop | Deep review kicks in. More thorough analysis. |
| HIGH | Critical path (auth, payments, infra) | Full multi-lens analysis. ~10 min. |
| CRITICAL | All the above + AI-generated code on a critical path | Full analysis + your senior gets pinged directly. |
The mental model is simple: if you touch auth, payments, or infra, expect HIGH, or CRITICAL when the change is AI-generated. If you touch docs, expect LOW. That's it.
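One way to read the risk table as code is a top-down check from the most severe level to the least. This is a sketch under stated assumptions: the `PrSignals` fields and the `riskLevel` function are hypothetical names, "all the above" in the CRITICAL row is simplified to "critical path plus AI-generated code", and LOW is treated as the fallthrough case. The real classifier may weigh more signals.

```typescript
type Risk = "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";

// Hypothetical PR signals; the field names are assumptions for this sketch.
interface PrSignals {
  touchesCriticalPath: boolean;  // auth, payments, infra
  touchesSensitivePath: boolean;
  coverageDropped: boolean;
  aiGenerated: boolean;
}

// Checks the most severe condition first and falls through to LOW.
function riskLevel(pr: PrSignals): Risk {
  if (pr.touchesCriticalPath && pr.aiGenerated) return "CRITICAL";
  if (pr.touchesCriticalPath) return "HIGH";
  if (pr.touchesSensitivePath || pr.coverageDropped) return "MEDIUM";
  return "LOW";
}
```

Ordering the checks this way means a PR that matches several rows always gets the highest applicable level.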
What VCR catches that you might miss
VCR reviews code through five lenses. Each lens is focused on a distinct failure mode that tends to slip through human review, especially on AI-assisted code.
Security
Injection flaws, authentication gaps, insecure deserialization, missing authorization checks. VCR maps findings to CWE and OWASP references where the rule carries that metadata, so you can look up the full vulnerability class rather than just reading a one-line description.
Correctness
Logic that compiles and passes type checks but is wrong at runtime. One example caught in the v8 llama3 run: Double-checked locking pattern without proper volatile semantics — a concurrency bug that's invisible to the type system and produces intermittent failures under load.
Test quality
Circular tests pass but prove nothing. When Copilot generates both the code and the test, the test ends up as a mirror of the implementation. VCR flags this explicitly: "This test verifies implementation, not specification." It also checks for missing coverage on new entry points and for tests that assert on internal state rather than observable behavior.
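A small illustration of why a circular test proves nothing. Everything here is a made-up example, not VCR output: the discount functions and the two test helpers are hypothetical, and the bug is deliberate.

```typescript
type Discount = (price: number, percent: number) => number;

// Deliberately buggy implementation: it forgets to divide the percentage by 100.
const buggyDiscount: Discount = (price, percent) => price - price * percent;

// Correct implementation, for comparison.
const correctDiscount: Discount = (price, percent) => price - (price * percent) / 100;

// Circular test: the expected value is re-derived with the same formula the
// implementation uses, so the test mirrors the bug instead of catching it.
function circularTestPasses(impl: Discount): boolean {
  const price = 200;
  const percent = 25;
  const expected = price - price * percent; // copied from the implementation
  return impl(price, percent) === expected;
}

// Specification test: the expected value comes from the requirement
// ("25% off 200 is 150"), independent of how the code computes it.
function specTestPasses(impl: Discount): boolean {
  return impl(200, 25) === 150;
}

console.log(circularTestPasses(buggyDiscount)); // true: the bug goes unnoticed
console.log(specTestPasses(buggyDiscount));     // false: the bug is caught
console.log(specTestPasses(correctDiscount));   // true
```

The circular test stays green for the buggy code because both sides of the comparison share the same mistake. Only the test with an independently derived expected value fails.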
Performance
N+1 queries, blocking I/O in async paths, accidentally-quadratic patterns, and unbounded caches. This lens is on by default at medium severity and above. In the llama3 run the lens flagged, for example: Shared mutable `rowBuf`/`vecBuf` fields cause data corruption under concurrent inference — a finding that sits on the performance/correctness boundary, since the shared mutable state corrupts results under concurrent load rather than merely slowing things down; the lens overlap is real.
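To make "accidentally quadratic" concrete, here is a generic example of the pattern. It is illustrative only and not taken from a VCR run; both function names are hypothetical.

```typescript
// Accidentally quadratic: Array.prototype.includes scans `seen` on every
// iteration, so the total work grows with the square of the input length.
function uniqueQuadratic(ids: string[]): string[] {
  const seen: string[] = [];
  for (const id of ids) {
    if (!seen.includes(id)) seen.push(id);
  }
  return seen;
}

// Linear on average: Set membership checks take constant time on average,
// and a Set keeps insertion order, so the result matches the version above.
function uniqueLinear(ids: string[]): string[] {
  return Array.from(new Set(ids));
}
```

Both versions return the same result. The difference only shows up at scale, which is why this kind of code tends to pass review and tests.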
Maintainability (opt-in)
Duplication, God objects, and dead code. This lens is off by default. Enable it in `.visdom.yaml` (`lenses: maintainability: { enabled: true }`) if your team wants it.
When on, it flags patterns like three wrapper classes for one function and tells you: "This could be a single function call."
💡 It's not a linter
VCR doesn't care about your semicolons, import order, or variable names. That's what ESLint, Prettier, and your IDE are for. VCR looks at the stuff that actually causes incidents.
How to give feedback
VCR learns from your reactions. Every finding has reaction buttons on the PR comment. Use them. It takes one click and directly shapes what VCR comments on next time.
| Reaction | Meaning | What happens |
|---|---|---|
| 👍 | Helpful, you fixed it | VCR gains confidence in this category for your codebase |
| 👎 | False positive, not relevant | VCR reduces weight for this pattern in your context |
| 🤔 | Not sure, needs discussion | Flagged for team review, helps calibrate edge cases |
✅ Your feedback matters
Thumbs-down a bad finding and VCR learns. The more your team reacts, the fewer false positives you'll see over time. This is how VCR avoids becoming another bot you ignore.
What VCR does NOT do
Let's be clear about the boundaries so you don't have wrong expectations.
- Won't auto-fix your code. v8 reports findings and can render one-click GitHub suggestion blocks where an exact patch is possible, but you apply the change — VCR never commits to your branch.
- Won't replace human review. Your senior still approves the PR. VCR tells them where to look, not what to decide.
- Won't block your PR unless your team explicitly configures it to. By default, VCR advises.
- Won't comment on formatting, naming, or import order. That's what your linter is for. VCR focuses on things that cause actual problems.
- Won't present low-confidence findings as verdicts. They are labeled "requires verification"; treat them as questions worth investigating. Focus first on high-confidence findings and cross-layer confirmations.
For senior developers
If you're the person who approves PRs and sets standards for your team, here's what changes for you specifically.
You'll review fewer PRs, but with better context. VCR pre-annotates every PR with risk level, specific findings, and exactly which files need your attention. Instead of reading every line of a 400-line diff, you focus on the 3 files that matter.
VCR tells you WHERE to look, not what to decide. It's a guide, not a replacement. "Focus your review on auth.ts (security findings) and test coverage gap." That's the kind of guidance you get.
Customization you'll care about
- Org rules: add domain-specific checks as rule files in `.visdom/rules/`. If your team has specific patterns for database migrations or event sourcing, you can teach VCR to check for them.
- Standards and instructions: point `standards.sources` in `.visdom.yaml` at your existing docs and steer the review with `instructions`. "We use direct instantiation, not Factory pattern." That kind of thing. VCR will flag deviations.
Full details on lenses and configuration: Configuration Reference
What developers are saying
VCR taught me about circular tests. I didn't know my tests were just mirrors of my implementation. They passed, so I thought they were fine. Now I actually write tests that check behavior, not just that the code runs.
I review 40% fewer PRs but catch more real issues. VCR caught a hallucinated API last week that I would have missed. It looked completely correct at first glance. VCR told me exactly which line to look at and why.
| Tool | Precision | Recall | F1 | Source |
|---|---|---|---|---|
| Propel | 68% | 61% | 64% | Propel Benchmark |
| Cubic | 56% | 69% | 62% | Martian Bench |
| Qodo | — | 57% | 60% | Martian Bench |
| Augment | 65% | 55% | 59% | Propel Benchmark |
| CodeRabbit | 36–48% | 43–55% | 39–51% | Martian + Propel |
| Baz | #1 on Martian | — | ~50% | Martian Bench |
| Claude Code | 23% | 51% | 31% | Propel Benchmark |
| GitHub Copilot | 20% | 34% | 25% | Propel Benchmark |
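The F1 column in the table is the harmonic mean of precision and recall. A minimal sketch for recomputing it from the other two columns follows; the function name is hypothetical, and a recomputed value may differ from a listed one by about a point, presumably because the sources worked from unrounded precision and recall.

```typescript
// F1 is the harmonic mean of precision and recall (both as fractions in [0, 1]).
function f1Score(precision: number, recall: number): number {
  if (precision + recall === 0) return 0; // avoid division by zero
  return (2 * precision * recall) / (precision + recall);
}

// Propel row: precision 0.68, recall 0.61.
const propelF1 = f1Score(0.68, 0.61); // about 0.643, which rounds to the listed 64%
console.log(Math.round(propelF1 * 100));
```

Because the harmonic mean is pulled toward the lower of the two inputs, a tool cannot reach a high F1 by maximizing only precision or only recall.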