Layer 2: AI-Powered Risk Classifier

Layer 2: AI Quick Scan

Fast AI pass over the diff. Risk classification + quick findings + AI-code detection.

Layer 2 is the first AI contact with the code. It has two goals: (1) fast feedback on obvious problems, and (2) risk classification of the PR to determine whether the expensive Layer 3 runs.

Input

The unified diff plus the context already gathered upstream: PR metadata (title, author, affected paths with their classification), a summary of Layer 1 findings, and repository knowledge (ownership, recent changes, module stability). These are assembled into the prompt below.

Prompt Structure

You are a code reviewer. Review this diff.

## Context
- PR: {title}, author: {author}
- Affected paths: {paths} (classification: {critical/sensitive/standard})
- Layer 1 findings: {summary}
- Repository knowledge: {ownership, recent changes, module stability}

## Tasks
1. RISK CLASSIFICATION: Assess PR risk (LOW/MEDIUM/HIGH/CRITICAL)
   Signals: size, affected paths, complexity, test coverage delta, module stability
2. QUICK FINDINGS: Report obvious problems (max 5):
   - Obvious bugs, missing error handling
   - Missing tests for new code paths
   - Copy-paste / dead code
   - DO NOT comment on: naming style, missing docs, import order, formatting
     (these are handled by Layer 1 linters)
3. AI-CODE DETECTION: Does this code appear AI-generated?
   Signals: over-engineering, unnecessary abstractions, generic variable names
4. CIRCULAR TEST DETECTION: Do new tests mirror implementation logic
   rather than testing against a specification?

## Output format
Respond with JSON matching the Layer 2 output schema:
- risk_classification: LOW | MEDIUM | HIGH | CRITICAL
- risk_signals: array of { signal, value, weight }
- findings: array of { severity, file, line, category, description,
                        suggestion, confidence }
- ai_generated: { detected: bool, confidence: float, signals: array }
- circular_tests: array of { test_file, test_name, reason }
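A filled-in response for a hypothetical PR might look like this (all file names, line numbers, and values are illustrative, not part of the schema):

```json
{
  "risk_classification": "MEDIUM",
  "risk_signals": [
    { "signal": "path_classification", "value": "sensitive", "weight": "high" },
    { "signal": "diff_size", "value": 412, "weight": "medium" }
  ],
  "findings": [
    {
      "severity": "major",
      "file": "src/billing/invoice.py",
      "line": 88,
      "category": "error_handling",
      "description": "Network call has no timeout or failure handling.",
      "suggestion": "Add a timeout and handle the failure path explicitly.",
      "confidence": 0.85
    }
  ],
  "ai_generated": { "detected": true, "confidence": 0.7, "signals": ["generic variable names"] },
  "circular_tests": [
    { "test_file": "tests/test_invoice.py", "test_name": "test_total",
      "reason": "Recomputes the total with the same formula as the implementation." }
  ]
}
```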

Risk Classification Logic

Risk classification is primarily deterministic, with AI judgment as one input among several:

Signal | Source | Weight
Path classification | Config (deterministic) | High
Diff size | Git (deterministic) | Medium
Coverage delta | CI (deterministic) | Medium
Module stability | Repository knowledge layer (deterministic) | Medium
AI-generated flag | Layer 2 AI detection | Medium
AI complexity assessment | Layer 2 AI judgment | Low

CRITICAL: critical path + large diff + coverage drop
HIGH:     critical path OR (large diff + sensitive path)
MEDIUM:   sensitive path OR AI-generated flag OR coverage drop >5%
LOW:      small diff + standard/low_risk paths + coverage stable
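Expressed as code, the threshold rules above might look like the following sketch. The 5% coverage-drop figure comes from the MEDIUM rule and the 500-line large-diff cutoff from the diff-degradation research below; the function and parameter names are illustrative assumptions.

```python
# Illustrative thresholds; "large diff" is assumed to mean >500 changed lines.
LARGE_DIFF_LINES = 500
COVERAGE_DROP_PCT = 5.0

def classify_risk(path_class: str, diff_lines: int,
                  coverage_delta: float, ai_generated: bool) -> str:
    """Deterministic risk rules; the AI judgment feeds in only via ai_generated."""
    large_diff = diff_lines > LARGE_DIFF_LINES
    coverage_drop = coverage_delta < 0
    if path_class == "critical" and large_diff and coverage_drop:
        return "CRITICAL"
    if path_class == "critical" or (large_diff and path_class == "sensitive"):
        return "HIGH"
    if path_class == "sensitive" or ai_generated or coverage_delta < -COVERAGE_DROP_PCT:
        return "MEDIUM"
    return "LOW"
```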

Gate Decision

Risk | Layer 3? | Estimated cost
LOW | Skip | ~$0.01–0.05 per PR
MEDIUM | Yes, standard depth | ~$0.10–0.50 per PR
HIGH | Yes, full depth + extra lenses | ~$0.50–2.00 per PR
CRITICAL | Yes, full depth + mandatory senior review flag | ~$0.50–2.00 per PR

Client can override: "always run Layer 3 on all PRs" or "skip Layer 3 for docs/**".
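A minimal sketch of the gate plus the client overrides just mentioned. The override keys (`always_run`, `skip_globs`) and the glob handling are assumptions, not a fixed configuration schema.

```python
import fnmatch

# Gate table from above; CRITICAL additionally sets a senior-review flag elsewhere.
GATE = {
    "LOW":      {"run_layer3": False, "depth": None},
    "MEDIUM":   {"run_layer3": True,  "depth": "standard"},
    "HIGH":     {"run_layer3": True,  "depth": "full"},
    "CRITICAL": {"run_layer3": True,  "depth": "full"},
}

def gate_decision(risk: str, paths: list[str], overrides: dict) -> bool:
    """Return True if Layer 3 should run for this PR."""
    if overrides.get("always_run"):
        return True
    skip_globs = overrides.get("skip_globs", [])
    # Skip only if every touched path matches a skip pattern, e.g. "docs/**".
    if skip_globs and all(any(fnmatch.fnmatch(p, g) for g in skip_globs) for p in paths):
        return False
    return GATE[risk]["run_layer3"]
```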

Risk Analysis: What Can Go Wrong

Layer 2 is the hinge of the system. It determines cost, depth, and trust. The following risks have been researched and mitigated in the design.

Risk 1: Non-determinism

⚠️ Risk: Non-determinism

LLMs produce different outputs for identical inputs, even at temperature=0. The same PR reviewed twice may receive different risk scores.

Evidence: Research measuring consistency across 5 identical runs found Claude Sonnet at 0.85 correlation, GPT-4o at 0.79. Subjective assessments (e.g., "maintainability") dropped to 0.53 correlation.

Mitigation: Risk classification uses deterministic signals as primary inputs (path classification, diff size, coverage delta). AI judgment is one signal among several, not the sole decider. For borderline cases near risk thresholds, optional consensus (2–3 runs, majority vote).
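The optional consensus step can be as simple as a majority vote over the repeated runs. This sketch falls back conservatively to the highest risk seen when there is no majority; that tie-breaking rule is an assumption.

```python
from collections import Counter

RISK_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]

def consensus(labels: list[str]) -> str:
    """Majority vote over 2-3 classification runs; escalate on ties."""
    label, votes = Counter(labels).most_common(1)[0]
    if votes > len(labels) // 2:
        return label
    # No strict majority: be conservative and take the highest risk observed.
    return max(labels, key=RISK_ORDER.index)
```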

Risk 2: Cry Wolf Effect (Developer Alert Fatigue)

⚠️ Risk: Cry Wolf Effect

Too many comments lead developers to auto-dismiss everything, including real findings.

Evidence: CodeRabbit produces 8–20 comments per PR. After ~10 days, teammates auto-dismissed all of them. GitHub Copilot intentionally limits to 2–5 comments with 71% actionable rate and stays silent in 29% of cases. Industry rule: <30–40% action rate = noise.

Mitigation: Hard cap on comments per PR (the prompt above already limits quick findings to 5). Confidence threshold: findings below a minimum confidence are dropped. Silence is an acceptable outcome: if nothing clears the bar, VCR posts nothing, following the Copilot pattern above.
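One concrete shape for a hard cap plus a confidence floor, where returning nothing is a valid outcome. The cap of 5 and the 0.7 floor are illustrative values, not fixed in the design.

```python
def filter_findings(findings: list[dict], cap: int = 5,
                    min_confidence: float = 0.7) -> list[dict]:
    """Keep only high-confidence findings, best first, never more than `cap`."""
    kept = [f for f in findings if f["confidence"] >= min_confidence]
    kept.sort(key=lambda f: f["confidence"], reverse=True)
    return kept[:cap]  # an empty list means: stay silent on this PR
```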

Risk 3: Large Diff Degradation

⚠️ Risk: Large Diff Degradation

AI accuracy degrades on large PRs (>500 changed lines). The "lost in the middle" phenomenon means content at the beginning and end of context gets 85–95% accuracy, while the middle drops to 76–82%.

Evidence: Models with claimed 200K token context become unreliable around 130K tokens, with sudden performance drops.

Mitigation: Chunk large diffs per file so each model call stays well inside the reliable context window, attach only the context relevant to each chunk, and warn the author when a PR exceeds the size threshold, recommending a split.
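Per-file chunking under a token budget can be sketched as follows; `estimate_tokens` and the 8,000-token budget are illustrative assumptions.

```python
def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic: ~4 characters per token

def chunk_diff(file_diffs: dict[str, str], budget: int = 8000) -> list[list[str]]:
    """Group file paths into chunks whose combined diffs fit the token budget."""
    chunks, current, used = [], [], 0
    for path, diff in file_diffs.items():
        cost = estimate_tokens(diff)
        if current and used + cost > budget:
            chunks.append(current)      # flush the full chunk
            current, used = [], 0
        current.append(path)
        used += cost
    if current:
        chunks.append(current)
    return chunks
```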

Risk 4: Prompt Injection via PR Content

⚠️ Risk: Prompt Injection

Malicious or "creative" code comments, PR descriptions, or commit messages can contain instructions that manipulate the AI reviewer.

Evidence: Anthropic's own Claude Code Security Review action warns it is "not hardened against prompt injection." OWASP ranks prompt injection as #1 risk for LLMs. Every file, comment, and PR description is a potential injection surface.

Mitigation: Treat all PR content (diff, description, commit messages, comments) as untrusted data: wrap it in clearly delimited blocks, instruct the model that nothing inside the delimiters is an instruction, and flag instruction-like phrases. Layer 1's deterministic checks act as a backstop that no injected prompt can disable.
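A delimiter-based sanitizer might look like this sketch. The tag name and the instruction-like patterns are illustrative assumptions and not an exhaustive defense; they only reduce, not eliminate, the injection surface.

```python
import re

# Hypothetical patterns for instruction-like text inside untrusted content.
SUSPICIOUS = re.compile(
    r"(ignore (all|previous|the above)|disregard.*instructions|"
    r"you are now|system prompt)",
    re.IGNORECASE,
)

def wrap_untrusted(label: str, content: str) -> tuple[str, bool]:
    """Wrap untrusted PR content in delimiters; also report if it looks like an injection."""
    flagged = bool(SUSPICIOUS.search(content))
    escaped = content.replace("</untrusted", "<\\/untrusted")  # neutralize tag spoofing
    block = f'<untrusted source="{label}">\n{escaped}\n</untrusted>'
    return block, flagged
```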

Risk 5: Cost Explosion at Scale

⚠️ Risk: Cost Explosion

Enterprise with 200 developers at 1–2 PRs/day = 200–400 PRs/day. Deep review at $0.50–2.00/PR scales to $2,000–16,000/month.

Evidence: Claude Code Review averages $15–25 per PR (full agentic review). At 100 PRs/day, monthly cost reaches $45,000–75,000.

Mitigation: The layered cost model keeps most PRs cheap, since LOW-risk PRs never reach Layer 3. Budget caps per day or month degrade gracefully: once the cap is reached, Layer 3 is skipped for the remaining PRs, which still receive Layers 1 and 2.
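A budget cap with graceful degradation reduces to a small reservation check; the class name and cap figure here are illustrative.

```python
class Layer3Budget:
    """Track daily Layer 3 spend; refuse runs that would exceed the cap."""

    def __init__(self, daily_cap_usd: float):
        self.cap = daily_cap_usd
        self.spent = 0.0

    def try_spend(self, estimated_cost: float) -> bool:
        """Reserve budget for one Layer 3 run; False means skip (Layers 1-2 still run)."""
        if self.spent + estimated_cost > self.cap:
            return False
        self.spent += estimated_cost
        return True
```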

Risk 6: Generic / Surface-Level Feedback

⚠️ Risk: Generic Feedback

Without repo context, the AI falls back to generic "code reviewer" mode: commenting on naming, suggesting docstrings, flagging missing type hints. Nothing a senior engineer would not catch in five seconds on their own.

Evidence: Augment Code found early versions using "pattern-based grep-search" for context produced generic findings. Quality improved only after semantic retrieval + organizational context.

Mitigation: Feed repository knowledge (ownership, conventions, recent changes) into the prompt, list the anti-patterns the reviewer must not comment on directly in the prompt (as in the Tasks section above), and enforce a minimum severity so trivia never reaches the developer.

Risk 7: AI Reviewing AI (Blind Spots)

⚠️ Risk: AI Reviewing AI

AI-generated code has patterns that another AI model may not catch because it produces similar patterns itself. Over-engineering, unnecessary abstractions, and hallucinated APIs look "clean" to an AI reviewer.

Evidence: Veracode 2025 tested 100+ LLMs: 45% of AI-generated code contains OWASP vulnerabilities. Models' own tests caught none of them. Spotify's LLM-as-judge vetoes 25% of agent output, meaning 1 in 4 passes CI but is wrong.

Mitigation: A dedicated AI-Code Safety lens in Layer 3 looks specifically for AI-typical failure modes (hallucinated APIs, over-engineering, tests that mirror the implementation), complemented by Layer 1's deterministic checks, which do not share the model's blind spots.

Risk 8: Cross-Cultural Interpretation

⚠️ Risk: Cross-Cultural Interpretation

"This code needs refactoring" lands differently for a senior in Krakow, a mid in London, and a junior in Bangalore. AI comments without cultural sensitivity can be demotivating or ignored.

Evidence: Shopify research found that feedback must be constructive, not evaluative, for distributed teams.

Mitigation: Findings follow a structured format (severity, category, description, suggestion) rather than free-form judgment, and the comment tone is configurable per team so feedback reads as constructive rather than evaluative.

Risk Priority Matrix

# | Risk | Severity | Likelihood | Primary mitigation
2 | Cry Wolf | CRITICAL | HIGH | Hard cap, confidence threshold, silence is OK
6 | Generic feedback | HIGH | HIGH | Repo context, anti-patterns in prompts, min severity
1 | Non-determinism | HIGH | HIGH | Deterministic signals primary in risk classifier
7 | AI reviewing AI | HIGH | MEDIUM | Dedicated AI-Code Safety lens
3 | Large diff degradation | HIGH | MEDIUM | Chunk per-file, selective context, PR size warning
4 | Prompt injection | CRITICAL | LOW–MED | Input sanitization, Layer 1 backstop
5 | Cost explosion | MEDIUM | MEDIUM | Layered cost model, budget caps
8 | Cross-cultural | MEDIUM | MEDIUM | Structured format, configurable tone

⚠️ Most Dangerous Risks

Risks #2 (Cry Wolf) and #6 (Generic feedback) are the most dangerous. They lead to abandonment. If developers stop reading VCR comments, all other mitigations are irrelevant.