Layer 0: Context Collection | Visdom Code Review

Fully deterministic. Zero AI. Layer 0 collects everything subsequent layers need, producing a single review-context.json document that is the source of truth for the entire review pipeline.

Required Data

Source	What we collect	Format
Git diff	Changed files, added/removed lines, hunks	Unified diff
PR metadata	Title, description, author, labels, linked issues, draft status	JSON
Test coverage	Coverage of affected files, delta vs base branch	JSON report
File classification	Type per file: critical / sensitive / standard / low_risk (from config)	Tags
Repo conventions	Linter configs, `CODEOWNERS`, architecture docs, convention files	Raw
Repository knowledge layer	Code ownership, dependency graph, PR history, commit heatmap, expertise scores	Structured query results
Test reliability data	Known flaky tests, per-test pass/fail history	JSON (TORS input)
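
As an illustration of the "Git diff" row, the following is a minimal sketch of how changed files, added/removed line counts, and hunks could be extracted from unified diff text. It assumes plain `git diff` output with the default `a/` and `b/` path prefixes and unquoted paths. The names `FileChange` and `parse_unified_diff` are hypothetical and not part of the pipeline described here.

```python
import re
from dataclasses import dataclass, field

# Hunk header, e.g. "@@ -10,4 +10,6 @@"; a missing length means 1 line.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


@dataclass
class FileChange:
    """Hypothetical per-file summary of a diff."""
    path: str
    added: int = 0
    removed: int = 0
    # Each hunk is (old_start, old_len, new_start, new_len).
    hunks: list[tuple[int, int, int, int]] = field(default_factory=list)


def _strip_prefix(path: str) -> str:
    # git prefixes the two sides with a/ and b/; /dev/null is left untouched.
    return path[2:] if path.startswith(("a/", "b/")) else path


def parse_unified_diff(text: str) -> list[FileChange]:
    changes: list[FileChange] = []
    current = None
    old_path = None
    old_left = new_left = 0  # lines still expected in the current hunk body

    for line in text.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk body: classify by the first character.
            if line.startswith("+"):
                current.added += 1
                new_left -= 1
            elif line.startswith("-"):
                current.removed += 1
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:
                old_left -= 1  # context line counts on both sides
                new_left -= 1
            continue

        if line.startswith("--- "):
            old_path = _strip_prefix(line[4:])
        elif line.startswith("+++ "):
            new_path = _strip_prefix(line[4:])
            # For deleted files the new side is /dev/null, so keep the old path.
            current = FileChange(path=old_path if new_path == "/dev/null" else new_path)
            changes.append(current)
        else:
            match = HUNK_RE.match(line)
            if match and current is not None:
                old_start, old_len, new_start, new_len = (
                    int(g) if g is not None else 1 for g in match.groups()
                )
                current.hunks.append((old_start, old_len, new_start, new_len))
                old_left, new_left = old_len, new_len

    return changes
```

Counting the remaining lines of each hunk from its header, rather than scanning for the next header, keeps an added line that happens to start with `++ ` from being mistaken for a file header.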

Repository Knowledge Layer

Layer 0 requires access to a pre-indexed repository knowledge layer, a deterministic data source that provides code ownership, dependency graphs, commit analytics, and PR history without re-parsing the repository on every run.

This layer must be:

Deterministic: same commit = same data
Pre-indexed: queries complete in seconds, not minutes
Reusable: shared across agents and review runs, not rebuilt per session

📦 Reference Implementation

Reference implementation: Context Fabric (VirtusLab, MIT license). It provides DuckDB analytics over git history, dependency graphs, and PR discussions, served as MCP tools or a CLI. Pinned by SHA256, reusable across sessions.

Alternative implementations: any system that exposes the required data (ownership, dependencies, history, expertise) via API or CLI. Examples: GitHub CODEOWNERS + custom scripts, Sourcegraph code intelligence, custom DuckDB/SQLite indexes over git log.
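
To make the "custom DuckDB/SQLite indexes over git log" option concrete, here is a rough sketch using Python's standard `sqlite3` module. The table name `touches`, its columns, and the helper functions are assumptions made for illustration. They do not reflect the schema of Context Fabric or any other tool. The input rows are assumed to have been extracted beforehand, for example from `git log --name-only`.

```python
import sqlite3


def build_index(rows):
    """Build an in-memory index from (sha, author, path) tuples.

    Hypothetical schema: one row per file touched by a commit.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE touches (sha TEXT, author TEXT, path TEXT)")
    conn.executemany("INSERT INTO touches VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_touches_path ON touches (path)")
    conn.commit()
    return conn


def top_contributors(conn, path, limit=3):
    """Return (author, commit_count) pairs for a path, most active first.

    Ties are broken by author name so the result is deterministic.
    """
    cursor = conn.execute(
        "SELECT author, COUNT(DISTINCT sha) AS commits "
        "FROM touches WHERE path = ? "
        "GROUP BY author ORDER BY commits DESC, author ASC LIMIT ?",
        (path, limit),
    )
    return cursor.fetchall()
```

Commit counts per author are only a crude stand-in for the ownership and expertise data listed above. The point of the sketch is the shape: index once, then answer per-file questions with a cheap query.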

Test Reliability Data (TORS Input)

Layer 0 also collects test reliability history: per-test pass/fail data used to compute the Test Oracle Reliability Score (TORS). Layer 1 and Layer 2 use this data to filter flaky-test signals out of agent feedback.

Sources: CI historical data, test result databases, flaky test tracking tools.
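
This section does not define the TORS formula, so the sketch below does not attempt to compute it. It only shows how raw per-test pass/fail history might be summarized into inputs for such a score. The record shape `(test_id, commit_sha, passed)` and both function names are hypothetical. The flakiness rule used here (a test that both passed and failed on the same commit) is one common heuristic, not necessarily the one this pipeline applies.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return test ids that both passed and failed on the same commit.

    runs: iterable of (test_id, commit_sha, passed) tuples.
    """
    outcomes = defaultdict(set)
    for test_id, commit_sha, passed in runs:
        outcomes[(test_id, commit_sha)].add(bool(passed))
    flaky = {test_id for (test_id, _), seen in outcomes.items() if len(seen) == 2}
    return sorted(flaky)


def pass_rates(runs):
    """Return {test_id: fraction of runs that passed} over the whole history."""
    totals = defaultdict(lambda: [0, 0])  # test_id -> [passes, runs]
    for test_id, _, passed in runs:
        totals[test_id][0] += 1 if passed else 0
        totals[test_id][1] += 1
    return {test_id: p / n for test_id, (p, n) in sorted(totals.items())}
```

A test that fails on one commit and passes on another is not flagged as flaky here, because the code change itself may explain the difference.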

Path Classification

Path classification is client-configured. Each file in the diff is tagged by the first matching rule:

```yaml
path_classifications:
  critical:
    - "src/auth/**"
    - "src/payments/**"
    - "infra/**"
    - "*.tf"
  sensitive:
    - "src/api/**"
    - "src/middleware/**"
  standard:
    - "src/**"
  low_risk:
    - "docs/**"
    - "*.md"
    - "test/**"
```
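
The first-match rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's actual matcher: it evaluates classes in the order they appear in the config, and it uses Python's `fnmatch.fnmatchcase`, in which `*` also matches `/`. The real glob semantics may differ, and the `None` result for unmatched paths is likewise an assumption. `classify_path` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Mirrors the example configuration; dict insertion order is the rule order.
PATH_CLASSIFICATIONS = {
    "critical": ["src/auth/**", "src/payments/**", "infra/**", "*.tf"],
    "sensitive": ["src/api/**", "src/middleware/**"],
    "standard": ["src/**"],
    "low_risk": ["docs/**", "*.md", "test/**"],
}


def classify_path(path: str, rules: dict = PATH_CLASSIFICATIONS) -> Optional[str]:
    """Return the first class with a pattern matching path, or None.

    Assumption: fnmatch-style matching, where "*" crosses directory separators.
    """
    for label, patterns in rules.items():
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return label
    return None  # assumption: unmatched files carry no tag
```

Under these assumptions, `src/README.md` is tagged `standard` rather than `low_risk`, because `src/**` is reached before `*.md`. Rule order therefore matters as much as the patterns themselves.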

Output

Layer 0 produces review-context.json, a structured document consumed by all subsequent layers. This is the single source of truth for the review.
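
The schema of review-context.json is not specified in this section. The sketch below only illustrates one way to keep the output deterministic: serialize with sorted keys so that the same collected data always yields byte-identical JSON, which can then be hashed or cached. All field names in the example, and the function names `serialize_context` and `context_digest`, are hypothetical.

```python
import hashlib
import json


def serialize_context(context: dict) -> str:
    """Serialize with sorted keys so equal data gives identical text."""
    return json.dumps(context, sort_keys=True, indent=2, ensure_ascii=False) + "\n"


def context_digest(context: dict) -> str:
    """SHA-256 of the serialized document, usable as a cache key."""
    return hashlib.sha256(serialize_context(context).encode("utf-8")).hexdigest()


# Hypothetical example; the real document's fields are not defined here.
example_context = {
    "pr": {"title": "Fix login redirect", "draft": False},
    "files": [{"path": "src/auth/login.py", "classification": "critical"}],
}

if __name__ == "__main__":
    with open("review-context.json", "w", encoding="utf-8") as handle:
        handle.write(serialize_context(example_context))
    print(context_digest(example_context))
```

Sorting keys makes the output independent of the order in which collectors fill in the dictionary. List order is preserved, so collectors would still need to emit list items (such as files) in a stable order.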