Layer 0: Context Collection

Deterministic context gathering from repository knowledge, git data, and test reliability history.

Fully deterministic. Zero AI. Layer 0 collects everything subsequent layers need, producing a single review-context.json document that is the source of truth for the entire review pipeline.

Required Data

| Source | What we collect | Format |
| --- | --- | --- |
| Git diff | Changed files, added/removed lines, hunks | Unified diff |
| PR metadata | Title, description, author, labels, linked issues, draft status | JSON |
| Test coverage | Coverage of affected files, delta vs base branch | JSON report |
| File classification | Type per file: critical / sensitive / standard / low_risk (from config) | Tags |
| Repo conventions | Linter configs, CODEOWNERS, architecture docs, convention files | Raw |
| Repository knowledge layer | Code ownership, dependency graph, PR history, commit heatmap, expertise scores | Structured query results |
| Test reliability data | Known flaky tests, per-test pass/fail history | JSON (TORS input) |
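
As an illustration of the "Git diff" row, the following minimal sketch counts added lines, removed lines, and hunks per file. It assumes `git diff` output, where every file section starts with a `diff --git` line; the `FileChange` type, the `summarize_diff` function, and the sample diff are hypothetical and not part of the pipeline described here.

```python
from dataclasses import dataclass


@dataclass
class FileChange:
    path: str
    added: int = 0
    removed: int = 0
    hunks: int = 0


def summarize_diff(diff_text: str) -> dict[str, FileChange]:
    """Count added lines, removed lines, and hunks per file in git-style unified diff text."""
    changes: dict[str, FileChange] = {}
    current = None
    old_path = None
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            # A new file section begins: reset per-file state.
            current, old_path, in_hunk = None, None, False
        elif not in_hunk and line.startswith("--- "):
            old_path = line[4:]
        elif not in_hunk and line.startswith("+++ "):
            new_path = line[4:]
            # Deleted files have "/dev/null" on the new side; fall back to the old path.
            raw = old_path if (new_path == "/dev/null" and old_path) else new_path
            path = raw[2:] if raw[:2] in ("a/", "b/") else raw
            current = changes.setdefault(path, FileChange(path))
        elif line.startswith("@@") and current is not None:
            in_hunk = True
            current.hunks += 1
        elif in_hunk and current is not None:
            if line.startswith("+"):
                current.added += 1
            elif line.startswith("-"):
                current.removed += 1
    return changes


# Hypothetical sample input, for illustration only.
SAMPLE = """\
diff --git a/src/auth/login.py b/src/auth/login.py
index 1111111..2222222 100644
--- a/src/auth/login.py
+++ b/src/auth/login.py
@@ -1,3 +1,4 @@
 import os
-TIMEOUT = 30
+TIMEOUT = 60
+RETRIES = 3
 def login(): ...
diff --git a/docs/old.md b/docs/old.md
deleted file mode 100644
index 3333333..0000000
--- a/docs/old.md
+++ /dev/null
@@ -1,2 +0,0 @@
-# Old
-Obsolete notes.
"""

changes = summarize_diff(SAMPLE)
```

Because the parser is a pure function of the diff text, the same diff always yields the same counts, which is in keeping with the deterministic intent of this layer.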

Repository Knowledge Layer

Layer 0 requires access to a pre-indexed repository knowledge layer, a deterministic data source that provides code ownership, dependency graphs, commit analytics, and PR history without re-parsing the repository on every run.

This layer must be:

📦 Reference Implementation

Reference implementation: ViDIA (VirtusLab, MIT license). It provides DuckDB analytics over git history, dependency graphs, and PR discussions, served as MCP tools or a CLI. It is pinned by SHA256 and reusable across sessions.

Alternative implementations: any system that exposes the required data (ownership, dependencies, history, expertise) via API or CLI. Examples: GitHub CODEOWNERS + custom scripts, Sourcegraph code intelligence, custom DuckDB/SQLite indexes over git log.
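
To make the "custom DuckDB/SQLite indexes over git log" option more concrete, here is a minimal sketch using Python's standard `sqlite3` module. The `file_commits` table, the function names, and the sample rows are all hypothetical; a real index would be populated from `git log` output and would likely carry more columns (timestamps, line counts, and so on).

```python
import sqlite3

# Hypothetical rows of (commit_sha, author, file_path), as might be extracted from git log.
COMMITS = [
    ("a1", "alice", "src/auth/login.py"),
    ("a2", "alice", "src/auth/login.py"),
    ("b1", "bob", "src/auth/login.py"),
    ("b2", "bob", "src/api/users.py"),
]


def build_index(rows):
    """Load commit rows into an in-memory SQLite table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE file_commits (sha TEXT, author TEXT, path TEXT)")
    conn.executemany("INSERT INTO file_commits VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def top_contributors(conn, path, limit=3):
    """Authors ranked by commit count for one file: a crude ownership proxy."""
    cur = conn.execute(
        "SELECT author, COUNT(*) AS commits FROM file_commits "
        "WHERE path = ? GROUP BY author "
        "ORDER BY commits DESC, author ASC LIMIT ?",
        (path, limit),
    )
    return cur.fetchall()


def commit_heatmap(conn):
    """Commit count per file, most frequently changed first."""
    cur = conn.execute(
        "SELECT path, COUNT(*) AS commits FROM file_commits "
        "GROUP BY path ORDER BY commits DESC, path ASC"
    )
    return cur.fetchall()


conn = build_index(COMMITS)
owners = top_contributors(conn, "src/auth/login.py")
heatmap = commit_heatmap(conn)
```

The explicit secondary sort keys (`author ASC`, `path ASC`) break ties the same way on every run, so query results stay stable for identical input.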

Test Reliability Data (TORS Input)

Layer 0 also collects test reliability history: per-test pass/fail data used to compute the Test Oracle Reliability Score (TORS). This data feeds into Layer 1 and Layer 2, where it is used to filter flaky test signals out of agent feedback.

Sources: CI historical data, test result databases, flaky test tracking tools.
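
This page does not define how TORS itself is computed, so the sketch below covers only the input side: it condenses raw per-test pass/fail records into one summary per test. The function name, the record shape `(test_id, passed)`, and the output field names are assumptions made for illustration.

```python
from collections import defaultdict


def summarize_test_history(runs, known_flaky=frozenset()):
    """Aggregate (test_id, passed) records into per-test reliability summaries."""
    counts = defaultdict(lambda: {"runs": 0, "passes": 0})
    for test_id, passed in runs:
        counts[test_id]["runs"] += 1
        counts[test_id]["passes"] += int(passed)

    summary = {}
    # Sorted iteration keeps the output order independent of input order.
    for test_id in sorted(counts):
        c = counts[test_id]
        summary[test_id] = {
            "runs": c["runs"],
            "passes": c["passes"],
            "failures": c["runs"] - c["passes"],
            "pass_rate": c["passes"] / c["runs"],
            "known_flaky": test_id in known_flaky,
        }
    return summary


# Hypothetical CI history, for illustration only.
history = [
    ("test_login", True),
    ("test_login", False),
    ("test_login", True),
    ("test_login", True),
    ("test_pay", True),
    ("test_pay", True),
]
summary = summarize_test_history(history, known_flaky={"test_login"})
```

A downstream scoring step could consume these summaries; how it weights pass rate against the known-flaky flag is left open here.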

Path Classification

Path classification is client-configured. Each file in the diff is tagged by the first matching rule:

# Categories are checked top to bottom; the first matching pattern wins.
path_classifications:
  critical:
    - "src/auth/**"
    - "src/payments/**"
    - "infra/**"
    - "*.tf"
  sensitive:
    - "src/api/**"
    - "src/middleware/**"
  standard:
    - "src/**"
  low_risk:
    - "docs/**"
    - "*.md"
    - "test/**"
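
The first-match rule can be sketched as follows. This is a minimal illustration, not the pipeline's actual matcher: it assumes shell-style globbing via Python's `fnmatch.fnmatchcase`, in which `*` also matches `/`, so `*.tf` matches Terraform files at any depth. The real glob semantics, and what happens to files that match no rule, are not specified here; the sketch returns `None` in that case.

```python
from fnmatch import fnmatchcase

# Mirrors the configuration above; dict insertion order encodes rule priority.
PATH_CLASSIFICATIONS = {
    "critical": ["src/auth/**", "src/payments/**", "infra/**", "*.tf"],
    "sensitive": ["src/api/**", "src/middleware/**"],
    "standard": ["src/**"],
    "low_risk": ["docs/**", "*.md", "test/**"],
}


def classify(path, rules=PATH_CLASSIFICATIONS):
    """Return the first category with a pattern matching path, or None (assumed fallback)."""
    for label, patterns in rules.items():
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return label
    return None


labels = {
    p: classify(p)
    for p in [
        "src/auth/login.py",
        "src/api/users.py",
        "src/README.md",
        "docs/guide.md",
        "modules/vpc/main.tf",
        "Makefile",
    ]
}
```

Note the effect of ordering: under these assumptions `src/README.md` is tagged `standard`, because `src/**` is checked before `*.md`.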

Output

Layer 0 produces review-context.json, a structured document consumed by all subsequent layers. This is the single source of truth for the review.
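
The schema of review-context.json is not given on this page. The sketch below only illustrates one way to keep the serialized document deterministic: sort the file list, sort object keys, and avoid run-specific values such as timestamps. All field names (`schema_version`, `pr`, `files`, `test_reliability`) are hypothetical.

```python
import json


def build_review_context(pr, files, test_reliability):
    """Serialize collected inputs into a stable JSON string (hypothetical schema)."""
    context = {
        "schema_version": 1,  # assumed field, not from the source
        "pr": pr,
        # Sort by path so the order in which files were collected does not matter.
        "files": sorted(files, key=lambda f: f["path"]),
        "test_reliability": test_reliability,
    }
    # sort_keys makes the key order independent of dict construction order.
    return json.dumps(context, indent=2, sort_keys=True)


# Hypothetical inputs, for illustration only.
pr = {"title": "Raise login timeout", "author": "alice", "draft": False}
files_a = [
    {"path": "src/auth/login.py", "classification": "critical"},
    {"path": "docs/guide.md", "classification": "low_risk"},
]
files_b = list(reversed(files_a))
reliability = {"test_login": {"pass_rate": 0.75, "known_flaky": True}}

doc_a = build_review_context(pr, files_a, reliability)
doc_b = build_review_context(pr, files_b, reliability)
```

With this approach, `doc_a` and `doc_b` are byte-identical even though the files were supplied in different orders, which makes the document easy to cache, diff, and hash.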