Fully deterministic. Zero AI. Layer 0 collects everything subsequent layers need, producing a single
review-context.json document that is the source of truth for the entire review pipeline.
Required Data
| Source | What we collect | Format |
|---|---|---|
| Git diff | Changed files, added/removed lines, hunks | Unified diff |
| PR metadata | Title, description, author, labels, linked issues, draft status | JSON |
| Test coverage | Coverage of affected files, delta vs base branch | JSON report |
| File classification | Type per file: critical / sensitive / standard / low_risk (from config) | Tags |
| Repo conventions | Linter configs, CODEOWNERS, architecture docs, convention files | Raw |
| Repository knowledge layer | Code ownership, dependency graph, PR history, commit heatmap, expertise scores | Structured query results |
| Test reliability data | Known flaky tests, per-test pass/fail history | JSON (TORS input) |
Repository Knowledge Layer
Layer 0 requires access to a pre-indexed repository knowledge layer: a deterministic data source that provides code ownership, dependency graphs, commit analytics, and PR history without re-parsing the repository on every run.
This layer must be:
- Deterministic: same commit = same data
- Pre-indexed: queries complete in seconds, not minutes
- Reusable: shared across agents and review runs, not rebuilt per session
📦 Reference Implementation
Reference implementation: ViDIA (VirtusLab, MIT license), which provides DuckDB analytics over git history, dependency graphs, and PR discussions, served as MCP tools or a CLI. It is pinned by SHA256 and reusable across sessions.
Alternative implementations: any system that exposes the required data (ownership, dependencies, history, expertise) via API or CLI. Examples: GitHub CODEOWNERS + custom scripts, Sourcegraph code intelligence, custom DuckDB/SQLite indexes over git log.
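As an illustration of the "custom DuckDB/SQLite indexes over git log" option, here is a minimal sketch using Python's standard `sqlite3` module. The table name `touches`, the function names, and the row layout are hypothetical; a real index would be populated from something like `git log --name-only` and persisted per commit rather than held in memory. Counting distinct commits per author is only a crude stand-in for the ownership and expertise data described above.

```python
import sqlite3


def build_index(rows):
    """Build an in-memory index from (commit_sha, author, path) tuples.

    Hypothetical layout: one row per file touched by a commit.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE touches (commit_sha TEXT, author TEXT, path TEXT)")
    conn.executemany("INSERT INTO touches VALUES (?, ?, ?)", rows)
    # Index by path so ownership lookups do not scan the whole table.
    conn.execute("CREATE INDEX idx_touches_path ON touches (path)")
    conn.commit()
    return conn


def top_authors(conn, path, limit=3):
    """Return (author, commit_count) pairs for a path.

    Ties are broken by author name so the same data always yields
    the same ordering.
    """
    cur = conn.execute(
        "SELECT author, COUNT(DISTINCT commit_sha) AS n "
        "FROM touches WHERE path = ? "
        "GROUP BY author ORDER BY n DESC, author ASC LIMIT ?",
        (path, limit),
    )
    return cur.fetchall()
```

The explicit `ORDER BY` with a tie-breaker is what keeps the query deterministic: the same indexed commit data always returns the same ranked list.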
Test Reliability Data (TORS Input)
Layer 0 also collects test reliability history: per-test pass/fail data used to compute the Test Oracle Reliability Score (TORS). This data feeds into Layer 1 and Layer 2, where it is used to filter flaky test signals out of agent feedback.
Sources: CI historical data, test result databases, flaky test tracking tools.
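A minimal sketch of how raw per-test history could be summarized before it is handed to later layers. This is not the TORS formula, which this section does not define. The record shape `(test_id, commit_sha, passed)` and the heuristic "a test is flaky if it both passed and failed on the same commit" are assumptions made for illustration.

```python
from collections import defaultdict


def summarize_test_history(runs):
    """Summarize (test_id, commit_sha, passed) records per test.

    Assumed heuristic: a test is flagged flaky when the same commit
    produced both a pass and a fail for it.
    """
    outcomes = defaultdict(lambda: defaultdict(set))  # test -> commit -> {True, False}
    totals = defaultdict(lambda: [0, 0])              # test -> [passes, runs]

    for test_id, commit_sha, passed in runs:
        outcomes[test_id][commit_sha].add(bool(passed))
        totals[test_id][0] += int(bool(passed))
        totals[test_id][1] += 1

    summary = {}
    for test_id in sorted(totals):  # sorted for stable output
        passes, total = totals[test_id]
        flaky = any(len(results) == 2 for results in outcomes[test_id].values())
        summary[test_id] = {
            "runs": total,
            "pass_rate": passes / total,
            "flaky": flaky,
        }
    return summary
```

Keeping the summary purely arithmetic matches the "zero AI" constraint on Layer 0: the same history always yields the same flags.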
Path Classification
Path classification is client-configured. Each file in the diff is tagged by the first matching rule:
path_classifications:
  critical:
    - "src/auth/**"
    - "src/payments/**"
    - "infra/**"
    - "*.tf"
  sensitive:
    - "src/api/**"
    - "src/middleware/**"
  standard:
    - "src/**"
  low_risk:
    - "docs/**"
    - "*.md"
    - "test/**"

Output
Layer 0 produces review-context.json, a structured document consumed by all subsequent layers.
This is the single source of truth for the review.
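The following sketch shows one way the first-match rule from Path Classification could feed into review-context.json. It rests on several assumptions: the glob dialect is approximated with Python's `fnmatch.fnmatchcase` (where `*` also matches `/`), unmatched files are tagged `None`, and the output field names (`files`, `path`, `classification`) are hypothetical, since the actual schema is not specified here.

```python
import json
from fnmatch import fnmatchcase

# Rules in declaration order, mirroring the example config in this section.
PATH_CLASSIFICATIONS = {
    "critical": ["src/auth/**", "src/payments/**", "infra/**", "*.tf"],
    "sensitive": ["src/api/**", "src/middleware/**"],
    "standard": ["src/**"],
    "low_risk": ["docs/**", "*.md", "test/**"],
}


def classify(path, rules=PATH_CLASSIFICATIONS):
    """Return the first classification whose patterns match the path.

    fnmatchcase is case-sensitive on every platform, which keeps the
    result independent of the host OS. Returning None for unmatched
    paths is an assumption; the source does not define a default.
    """
    for label, patterns in rules.items():  # dicts preserve insertion order
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return label
    return None


def build_review_context(changed_files, rules=PATH_CLASSIFICATIONS):
    """Serialize a (hypothetical) review-context document as JSON text."""
    context = {
        "files": [
            {"path": path, "classification": classify(path, rules)}
            for path in sorted(changed_files)  # sorted for stable output
        ]
    }
    return json.dumps(context, indent=2, sort_keys=True)
```

Because rules are evaluated in order, `src/README.md` is tagged `standard` (it matches `src/**` before `*.md` is reached), while a top-level `README.md` falls through to `low_risk`. Sorting the file list and the JSON keys makes the output byte-identical for the same input, in line with the determinism requirement.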