ADR-035: PR-Tier Source Intelligence and Cross-Source Validation¶
Date: 2026-06-14
Status: Accepted — implemented (D1–D10 shipped; tracked as G19 in
docs/development/usecase-registry.yaml and the phased plan
plans/g19-pr-source-intelligence.md).
All ten decisions landed: D1/D3 (two axes + deterministic level / opt-in
risk-scored auto), D2 (compiler-free pattern + S2 preprocessor pre-scan),
D4 (cross-source validation engine — six checks, FP-rate-gate corpus),
D5 (build-emitted abicheck_inputs/ facts protocol + abicheck-cc wrapper),
D6 (additive .abicheck.yml source/risk/crosscheck config), D7 (POI focusing),
D8 (single-release audit via scan --audit / surface-report --audit),
D9 (asymmetric per-mode defaults), and D10 (typed ScanRequest/ScanResult
engine service.run_scan, the LayerProvider protocol, scan --estimate, and
the abi_scan/abi_audit/abi_estimate MCP tools). The cross-source checks
ship advisory (RISK/API_BREAK, never BREAKING — authority rule) and gate only
once promoted per-check via --crosscheck KEY=error.
Decision maker: Nikolay Petrov
Context¶
A field proposal ("Source scans in abicheck: depth levels, tooling,
cross-checks, and build integration") argues that source-aware analysis should
not be an all-or-nothing "dump every translation unit's AST" mode run only at
release/nightly. Instead it proposes a configurable ladder of source scans
(S0..S6) with per-level cost, a risk-scored escalation policy, cross-source
validation, and build-integrated fact extraction — so that compatibility risk
is caught on the PR, cheaply, and deeper analysis runs only when triggered.
Most of that architecture already exists in abicheck. ADR-028…033 define an optional build/source evidence stack:
- L0/L1/L2 — artifact-authoritative binary, debug-info, and header-AST scan
(
dumper.py,dumper_castxml.py,elf/pe/macho_metadata.py,dwarf_*.py). - L3 — build/toolchain context from compile DB, CMake, Ninja, Bazel, Make
(
buildsource/adapters/*,build_evidence.py,build_diff.py). - L4 — scoped per-TU source-ABI replay via clang/castxml/android backends
with per-TU caching (
source_replay.py,source_extractors/*). - L5 — source/implementation graph (include/call/type/build edges, Kythe /
CodeQL ingest) (
source_graph.py,call_graph.py,include_graph.py). - A CI complexity ladder, replay scopes (
off/headers-only/changed/target/full), per-TU + content-addressed caches, evidence policy, and coverage/metrics reporting (ADR-033 D1–D10).
Mapped onto that stack, the proposal's S0..S6 scale is not a new axis —
it largely re-derives the existing L-layers:
| Proposal level | abicheck today | Gap |
|---|---|---|
S0 diff classifier + risk score |
recommend-collect-mode (path → mode) |
no numeric risk score / escalation thresholds / risk_rules config |
S1 compile-DB / build-flag scan |
L3 (5 adapters, flag-drift diff) | covered |
S2 preprocessor + include graph |
L5 include edges (clang -MM) |
no per-TU macro-value capture, no private-header-leak / generated-config detection |
S3 lexical/pattern scan (no compiler) |
— | missing entirely (no compiler-free tier) |
S4 symbol/reference index |
L5 call edges + Kythe/CodeQL ingest | no clangd-index integration (acceptable) |
S5 targeted semantic AST |
L4 scoped replay | covered |
S6 full AST |
L4 --source-abi-scope full |
covered |
| cross-checks (§6) | partial (dumper logs some leaks) |
no cross-check findings as ChangeKinds |
| build-plugin / wrapper extraction (§8) | ADR-032 extractor plugin + ADR-033 D1 "Phase 6" | not implemented; no dump/facts artifact protocol |
| normalized facts, caches, coverage report | covered (ADR-030/033) | extend naming for new tiers |
So three genuine gaps remain, and they are exactly where the proposal's value concentrates:
- No cheap, always-on PR tier. Every existing evidence layer (L3–L5) needs
a compile DB or a compiler. There is nothing that runs in
<5%of build cost on every PR with no toolchain (the proposal'sS0/S2-macro/S3). - No cross-source validation or evidence-directed focusing. abicheck holds binary exports, header AST, build flags, and source facts in one snapshot but (a) never diffs them against each other within a single version (exported-but-not-public, header built with different macros than the shipped TU, private-header leaks, ODR type variants), and (b) never feeds the cheap evidence forward to target the expensive source scan. The information-sharing is one-directional today; binary/header deltas should point the source scan at specific entities/TUs, not just be checked against them. The cross-source checks are also valuable on a single release (no baseline compare) — a whole class of "bad ABI hygiene" lint that justifies a broader scan on its own.
- No build-integrated extraction or single
scanUX. Capability is spread acrossdump/compare/collect/compare-graph; there is no risk → budget → escalation orchestrator, no per-project cost estimate to pick a depth, no stable programmatic API for the level ladder, and no way to ingest facts emitted by the product build instead of re-running a frontend.
Decision¶
D1. Two orthogonal axes: L = evidence layer, S = source-analysis method¶
There are two different things the proposal and abicheck talk about, and they must not be conflated:
- L0–L5 — evidence layers (the what + authority). Where a fact comes from
(binary / debug / header / build / source-ABI / graph) and how much it is
trusted. L0–L2 stay authoritative for
BREAKING; L3–L5 onlyRISK/API_BREAK(ADR-028 D3). This is the user-facing depth axis and the authority axis. - S0–S6 — source-analysis methods (the how). Six distinct, cost-ordered techniques for analyzing source, from a diff classifier up to a full AST. These are not evidence layers and not a competing authority/selector axis — they are the pipeline of methods that produce the L3–L5 evidence, and the granularity at which coverage is reported.
So the S-scale is a real, named concept (source analysis is genuinely six
graduated methods, not one "parse the AST" step), orthogonal to L: an S-method
runs and its output lands in an L-layer. Both axes are user-selectable —
--source-method sN pins the source-analysis method, --depth sets the L-layer
ceiling; users pick whichever axis they think in. S being selectable does not
fork the evidence/authority model: authority stays on the L-axis (an L3–L5 fact
never alone gates BREAKING, whatever S produced it). Mapping:
| S-method | What it does | Tool | Produces (L) | Cost |
|---|---|---|---|---|
| S0 diff classifier | changed-file → risk tags/score | git diff | — (drives D3/D7) | <1% |
| S1 compile-DB / flags | ABI-affecting build context | parse compile_commands.json etc. |
L3 | <1% |
| S2 preprocessor / includes | macro values, include/leak graph | clang -E/-MM |
L3→L5 | 1–10% |
| S3 lexical pattern scan | ABI-risk constructs, no semantics | regex / Tree-sitter | pre-scan facts (→L2/L5) | <1–5% |
| S4 symbol/reference index | decls/refs/inheritance/call graph | clangd-index / Kythe / CodeQL | L5 | 5–30% |
| S5 targeted semantic AST | layouts, instantiations, body hashes | clang/castxml on selected TUs | L4 (+L5 edges) | 5–40% |
| S6 full AST | whole-project semantic facts | clang/castxml all TUs | L4 full-scope | 20–80% |
The authority rule (ADR-028 D3) is preserved: whatever S-method produced a
fact, an L3–L5 fact never alone decides a shipped BREAKING verdict — it
explains, localizes, and adds confidence.
S-methods are independent, cost-ordered passes — not a feed-forward pipeline. S5/S6 (and the S4/S5 graph edges) are their own clang/castxml passes over the TUs; they do not consume S3/S4 output. The cheap methods (S0 diff, S3 patterns) only produce the POI list that selects which TUs the expensive S4/S5 pass parses. Consequence: whenever sources + a compiler are available, S5 runs (POI-targeted) and the L5 edges it needs exist; the only reasons S5/L5 do not run are a missing toolchain or an explicit budget/scope exclusion — never "S3/S4 data was missing".
D2. Add an always-on PR pre-scan tier (extends L2/L5)¶
A new cheap tier runs on every PR over changed + public files. It has two parts with different requirements:
- Pattern pre-scan (new
buildsource/pattern_scan.py): the compiler-free part — needs no compile DB and no compiler. A regex/lexical scan for ABI-risk constructs —#pragma pack,alignas,__attribute__((packed| visibility)),__declspec(dllexport|dllimport),extern "C", calling- convention macros, explicit /externtemplate instantiation,inline namespace, public virtual methods,operator new/delete. Emits advisory facts and escalation triggers, never a final verdict. Starts as a stdlib regex scanner (no new dependency); Tree-sitter is an optional later backend. - Preprocessor pre-scan (extends
buildsource/include_graph.py): the conditional part — runs only when a compile DB and a preprocessor (clang -E) are available (reported as skipped otherwise). Captures per-TU macro values for ABI-affecting macros, detects public-header-includes-private/generated-header leaks, and flags per-TU macro-value divergence for the same public type.
These feed D3 (escalation) and D4 (cross-checks). Output is normalized facts in
the existing buildsource schema, with coverage reported (ADR-033 D6/D9).
D3. Explicit deterministic level; risk-scored auto and budget are opt-in¶
The level is chosen explicitly and is deterministic so a CI gate produces the same scan for the same inputs:
- The user picks the level on either axis (
--source-method sNor--depth, D1). Each--mode(pr/pr-deep/baseline) is a fixed preset of (L,S), not risk-varying. - A numeric risk score drives two opt-in things only: the
autolevel (--source-method auto, local/dev convenience), and POI focusing within the chosen level (D7, deterministic for a fixed diff). Risk never silently changes the level of a CI run that pinned one. The score's path-glob weights are an illustrative, tunable default profile shipped as therisk_rulesconfig block — not a normative contract; the spec fixes only the ordering of signals (public header > export map > ABI flag > internal source > docs/tests). Because the score only feeds opt-inauto+ POI selection, the exact numbers never affect a pinned/deterministic gate. - A time budget is optional and, on overflow, fails (nonzero exit) — it never silently shrinks scope. For gating CI, pin a level; do not use a budget.
- Add a
scansubcommand (abicheck/cli_scan.py, registered per the CLAUDE.md sibling-module pattern) that orchestrates: classify → always-on tier (D2) → run the chosen L/S level (POI-focused) → emit one coverage- and confidence-annotated report stating exactly which S-method/L-layer ran, never a bare "source scan failed".scanis a front-end over existingdump/compare/collect; it adds no new authority.
D4. Cross-source validation engine (new findings, RISK/API_BREAK tier)¶
New buildsource/crosscheck.py consumes one merged snapshot and diffs its
evidence sources against each other within a single version. Each check is a
named ChangeKind (added per the four-step procedure in /CLAUDE.md,
partitioned into RISK_KINDS or API_BREAK_KINDS — never BREAKING_KINDS,
per D1). Initial set, cheapest first:
| Check | Inputs | Tier |
|---|---|---|
exported_not_public |
binary exports ↔ L2 header decls | RISK |
public_not_exported |
L2 header decls that promise an external symbol ↔ binary exports | RISK |
header_build_context_mismatch |
L2 header macros/flags ↔ L3 build flags | API_BREAK |
private_header_leak |
L5 include graph ↔ install manifest | RISK |
odr_type_variant |
L4 per-TU layouts of one type | API_BREAK |
public_to_internal_dependency / behavioral risk |
L5 reachability ↔ PR changed files | RISK |
The §6.8 provider-agreement matrix maps directly onto the existing
LayerConfidence; each finding records which providers (public_header_ast,
source_index, binary_exports, debug_info, build_config) corroborate the
entity, driving the confidence tag.
Coverage honesty. A cross-check whose required layer was not collected (e.g.
behavioral-risk needs the L5 edges from an S4/S5 pass, but the host lacked a
compiler or budget excluded it) is reported as a soft advisory (info) that
names the layer/method to enable (e.g. "run --source-method s5/--depth
graph"). It is never counted as clean and does not affect the exit code.
With sources + a compiler present this case does not arise — the check runs for
real.
Source-evidence integrity. "L4 ran" is not the same as "L4 linked to the
exported ABI surface." If replay selects or parses TUs but produces zero public
declarations, zero reachable source entities, or zero
source_decl_to_binary_symbol matches while L0 exports are present, the result
is degraded/unlinked source evidence, not successful L4 coverage. The report
must say which boundary failed:
- compile-unit selection: selected TUs, target/library mapping, changed-path input, include graph present/full/partial;
- extractor execution: parsed TUs, failed TUs, first failure samples;
- public-surface classification: public roots used, public-classified decls, private/system/unknown decl counts;
- entity keys: mangled decl count, qualified-name count, template/inline-only count;
- binary link: exported symbol count, matched symbols, unmatched exports, sample nearest decl candidates.
The oneDAL field run hit this failure shape: thousands of exports, thousands of selected/parsed TUs in one pack, but zero public declarations / zero matched symbols / all exports unmatched; a rerun shape selected thousands of TUs but failed every extractor invocation. ADR-035 therefore requires the integrity gate and counters above before L4/L5 coverage can be trusted as a focusing or cross-check input. The current artifacts identify the failed boundary class (unlinked/degraded source evidence), not a single proven root cause.
public_not_exported is intentionally narrower than "every public declaration":
inline functions, uninstantiated templates, constexprs, type-only declarations,
and hidden-visibility APIs are public source surface but do not promise a dynamic
symbol. The check only compares declarations with an export obligation (for
example exported functions/data, visibility annotations, version-script entries,
or explicit instantiations) against binary exports.
exported_not_public (the reverse: a symbol is exported but no public header
declares it) runs only over symbols that pass the existing
dumper._is_abi_relevant_symbol() filter (which already drops compiler-emitted
internals, transitive stdlib symbols, and private __-prefixed C symbols) and
respects public-surface scoping (ADR-024). Default severity warning,
suppressible through the existing suppression system — so it does not light up
healthy libraries with intentional internal exports.
D5. Build-integrated extraction: dump/facts artifact protocol + providers¶
Implement the proposal's §8 as concrete ADR-032 extractor providers, with the extended-variant §8.0 distinction made explicit:
- Flow 1 (abicheck-runs-replay) — already supported:
dump --sources/collectreplays selected compile commands. Stays the portable default. - Flow 2 (build-emits-facts) — new: a standardized dump/facts artifact protocol the product build drops next to its binary, which abicheck then ingests without re-running a frontend:
abicheck_inputs/
manifest.json
binary/… headers/… build/compile_commands.json
source_facts/*.jsonl # preferred — normalized facts
raw_ast/*.json.zst # optional, debug/forensic only
pp/*.macros.jsonl deps/*.d # optional
This rides the existing merge flow (artifact-side + source-side dumps merged
into one baseline, ADR-028 D8 / ADR-033 amendment).
- Providers (ADR-032 manifest extractors): a Clang plugin
(
-fplugin=…, normalized facts during normal compile, no raw AST by default) and a compiler wrapper (abicheck-cc clang++ …, companion extraction action). GCC-fdump-lang-class/-tinstand MSVC remain documented fallbacks.
Canonical rule: normalized source facts are the comparison format; raw AST
dumps are an MVP-ingest/forensic fallback only. The Clang plugin is positioned
as a performance optimization (removes the second frontend pass), never a
requirement — it is compiler-version-sensitive, so compile_commands.json
replay + LibTooling/CastXML stays the supported portable path.
D6. Additive .abicheck.yml schema for levels, budgets, and cross-checks¶
Extend the config loaded by buildsource/inline.py with optional source:
{ method, depth, budgets, layers } (method is the exact S-axis selector,
depth is the coarse L-axis selector, and layers matches the existing collect
API key — not a new levels spelling), risk_rules, and crosschecks:
{ <check>: info|warning|error } blocks. All additive and defaulted-off where
they imply new cost; they map onto the existing collect-mode / evidence-policy
machinery rather than replacing it.
D6.1 — Amendment (2026-06): compile: block + L2 must be reachable from scan.
The cross-source checks (D4) and public-surface scoping depend on a real L2
header AST — that is what tells the scan which symbols are public. A field run
of oneTBB / oneDNN / oneDAL showed L2 silently failing under scan: the header
parse needed the translation unit's compile context (include roots, the host
libstdc++ paths, a -std) and scan had no flag, config, or compile-DB path
to supply it, so the scan fell back to a binary-strict scope and (correctly, by
its own rules) reported internal removals — oneDNN's dnnl::impl::*, oneDAL's
bundled DGETRF/SGETRF — as BREAKING. With L2 those demote to COMPATIBLE.
Two fixes, both additive: (a) scan gains the same L2 compile-context flags as
dump via one shared decorator (CLI parity — specified in ADR-037 D8.1); and
(b) the stable, version-controlled half of that context lives in a new
.abicheck.yml compile: block (per the D4 CLI-vs-config rule — include
roots, std, defines, frontend, sysroot are stable project properties, while the
per-invocation/cross-compile flags stay CLI overrides):
compile:
frontend: auto # auto | castxml | clang (== --ast-frontend)
std: c++20 # parse standard (-std=)
include_dirs: [include, third_party/include]
defines: [DNNL_ENABLE_FOO=1]
sysroot: /opt/sysroot
nostdinc: false
Precedence: explicit CLI flag > .abicheck.yml compile: > compile-DB-derived
flags > auto-detected system includes. Auto-detection (clang gaining castxml's
system-include discovery) is the default floor so a bare scan -H finds the C++
stdlib; it cannot guess the project's own include roots / macros / dialect, which
is why the config block and CLI overrides remain. Threading the per-TU -I/
-std/-D from an existing --compile-db into the L2 frontend (today the
compile DB feeds only L3–L5) is the precise next step and is tracked as a
follow-up — it needs header→TU flag mapping and so is scoped separately.
D7. Evidence-directed focusing: cheap facts steer the expensive scan¶
Cross-source links are used in two directions, not one. D4 reads them to emit findings; D7 reads the same links before the expensive scan to shrink its scope to a points-of-interest (POI) set. The cheap, already-computed L0/L1/L2 facts (and the L0↔L2 deltas vs. baseline) decide what L4/L5 looks at:
- exported symbol changed but the header did not → resolve its source declaration and replay only that TU/entity;
- header type whose layout is macro-conditional → capture macro values only for the TUs that materialize it;
- new/removed export with no public declaration → point the scan at the source decl that emits the symbol;
- demangled exported template symbol → seed which instantiations/TUs to replay.
Mechanically this is the reverse of the existing explain-finding
localization walk (export → decl → header → build option): instead of explaining
a finding after the fact, the POI set is computed up front and handed to
source_replay scope selection and the cross-check engine as a work-list. Net
effect: a large project pays L4/L5 cost only on the handful of entities the
binary/header evidence already flagged.
Floor: the POI set always includes the directly-changed files/TUs
unconditionally — the risk score only adds further candidates, it can never
remove a changed TU. So a mis-weighted risk_rules profile (D3) can broaden
the scan but cannot drop an obviously-relevant changed translation unit.
scan --estimate must expose selection cost separately from parse cost. For a
release pair without PR metadata, changed paths come from source tags/trees when
available; otherwise source-changed must report that it has no changed-path
seed and is falling back to a broader scope. At minimum the estimate/result
records changed-path count, public-header changed-path count, include-graph
availability, selected TU count, cache hit/miss count, and parsed/skipped count.
The oneDAL run showed why this matters: source-changed still took minutes, so
the selector either lacked a real changed-path set, selected too broadly, or
fell back because graph/root data was incomplete.
D8. Single-release audit mode (no baseline required)¶
The D2 pattern facts and the D4 cross-checks are intra-version — they need
exactly one build, not a compare. Expose them as a first-class audit that
runs without a baseline and emits a catalog of single-release "bad ABI hygiene"
findings: accidental ABI surface (exported_not_public), non-self-contained or
private-header-leaking public headers, header_build_context_mismatch, ODR type
variants across TUs, inconsistent/missing symbol visibility, unversioned exported
symbols where a version script exists, and RTTI/typeinfo emitted for internal
types. This is wired into the existing single-binary surface tooling
(surface-report, the G11 single-binary audit) and reported as a lint with its
own severity mapping. Because it delivers value from a single artifact, it
justifies running a deeper one-time scan than a two-build PR diff would.
D9. Asymmetric defaults: full at baseline, scoped on PR¶
The default depth differs by when the scan runs:
- Baseline / release publish → full: full dump + full source analysis
(
--source-abi-scope full), all cross-checks, single-release audit, and raw facts archived for cache warmup. Computed once, amortized, authoritative, and cached in the baseline registry (ADR-022). - PR / CI → scoped: always-on tier (D2) every time, plus a fixed,
pinned L/S level (the
prpreset, or whatever the user pins), POI-focused (D7). Deterministic — same inputs, same scan.auto/budget stay opt-in (D3).
The asymmetry is the point: the baseline is produced once so it can afford full depth and gives the PR a rich, cached fact set to diff and focus against; the PR runs on every push so it must stay cheap. Project size shifts the PR default (see Sizing guidance) — small projects can simply run full on PRs too.
D10. Programmatic API: where each level plugs in¶
The level ladder is exposed as a typed Python API so the CLI, the MCP server, and CI wrappers all drive the same engine. Layering, top-down:
| Layer | Where | Responsibility |
|---|---|---|
| CLI | cli_scan.py, existing cli*.py |
argv → ScanRequest; render report/exit code |
| MCP | mcp_server.py |
expose scan/audit/estimate as agent tools |
| Service | service.py (extend) |
run_scan(ScanRequest) -> ScanResult; orchestrates classify → POI → run pinned level (auto-escalate only if opted in) → collect → cross-check → report |
| Levels | buildsource/ providers |
one provider per level, uniform protocol |
| Facts | buildsource/model.py, source_abi.py, build_evidence.py |
normalized fact schema (canonical I/O of every provider) |
- Provider protocol — each level (the D2 pattern/preprocessor pre-scan, L3
build, L4 replay, L5 graph, the D4 cross-checks) implements a small uniform
interface modelled on the existing ADR-032
DataExtractor:capabilities(),estimate(ctx) -> CostEstimate,run(ctx, poi) -> Facts, consuming a sharedScanContext(paths, compile DB, changed files, budget, cache) and the POI set, and returning normalized facts only (never raw AST as primary output). This is what makes levels independently runnable (ADR-033 D1) and lets external/build-emitted providers (D5) drop in via the ADR-032 manifest. - Request/result objects —
ScanRequest(binaries, headers, compile DB, baseline,mode,source_method/depth, budget, risk rules) andScanResult(findings: list[Change], per-layerLayerResult/coverage,CostEstimateprojected cost compared against each layer's actualelapsed_s, confidence/provider matrix).ScanResultis what the reporter, PR comment, SARIF, and MCP all consume — one object, many renderings. estimateis a first-class entry point, not a side effect: a dry-run that probes the project (TU count from compile DB, header fan-out, cache state) and returns the projected cost of each level for this project so a maintainer (or CI) can choose a budget/depth. Implemented asservice.estimate_scan(), surfaced asabicheck scan --estimateand an MCP tool.
Sizing guidance (defaults by project scale)¶
Cost anchors are from §11 of the proposal (full -fsyntax-only ≈ 20–80% of a
clean build; pattern/compile-DB scans <1–5%). These are starting defaults;
scan --estimate (D10) gives the real per-project number.
| Project scale | PR default | Baseline default |
|---|---|---|
| Small (≲50 TUs, builds in seconds) | pin full source (S6) every PR | full |
| Medium (~50–500 TUs) | always-on tier + pin POI-targeted S5 | full (nightly or on release) |
| Large (≳500 TUs, or template/header-heavy) | always-on tier + cross-checks + pin POI-targeted S5; lean on cache | full on release only; warm cache from it |
| Header-only / template-heavy | AST cost is header-dominated — pin full S6 if small, else POI-targeted S5 | full |
These are pinned per-mode defaults (deterministic), not risk-varying. Rule of
thumb: if scan --estimate puts full L4/S6 under your time budget, pin it;
otherwise pin a cheaper level (S5/S3). --source-method auto lets risk + POI pick
the level for local/dev, but pin an explicit level for reproducible CI gates.
Maintainer UX / adoption path¶
Designed as a progressive ramp — value at step 1 with zero build integration, deeper signal as the maintainer opts in:
- Zero-config —
abicheck scan --binary libfoo.so --headers include/runs L0/L1 + the always-on lexical tier (S3) and the single-release audit (D8) with no compile DB and no compiler. L2 header-AST (castxml) is added when castxml is present, otherwise reported as skipped — it is not part of the no-compiler promise. Immediate hygiene findings either way. - Add a compile DB —
--compile-db build/compile_commands.jsonunlocks L3 and POI-targeted L4/L5 (D7). Still one command. - Pick a depth —
abicheck scan --estimateprints projected per-level cost for this repo; the maintainer setssource.budgets/ mode in.abicheck.ymlaccordingly (D6, D10). - CI — the GitHub Action (ADR-017) gains
mode:+ budget; PRs get a sticky comment (pr-comment) showing findings plus the coverage/confidence block so a partial scan is legible, never a bare "failed". - Baselines —
baseline pushstores a full-depth baseline (D9); PRs pull and diff against it, getting both the cached facts and the focusing seed for free. - Standalone audit —
abicheck scan --audit(orsurface-report) as a lint on a tag/release or pre-merge, independent of any baseline. - Promote to gating — findings start advisory; once the FP-rate gate is
trusted for a check, the maintainer raises it to error via
crosschecks:/ severity config. Adoption never starts by blocking merges.
Consequences¶
Positive
- A real always-on PR signal (D2/D3): risky source patterns are caught with no compiler at all (pattern pre-scan); ABI-affecting flag/macro changes and private-header leaks are caught from the compile DB + preprocessor when those are available — all without a full build.
- The cross-check engine (D4) turns abicheck from single-source-of-truth into a multi-evidence system with corroboration-based confidence and better explanations — at low cost, since the inputs already sit in the snapshot.
- Flow 2 (D5) lets vendor/closed-source consumers contribute exact-build-context facts without shipping sources or letting abicheck rebuild the project.
- Evidence-directed focusing (D7) makes deep analysis affordable on large projects by spending L4/L5 cost only on binary/header-flagged entities.
- The single-release audit (D8) delivers value from one artifact — adoption needs no baseline and no second build.
- One typed
ScanRequest/ScanResultAPI (D10) means CLI, MCP server, and CI wrappers share one engine, and--estimatelets each project pick a depth on measured cost instead of guesswork.
Negative / costs
- The Clang plugin and compiler wrapper are compiler-version-sensitive and add maintenance surface; mitigated by keeping them strictly optional optimizations behind the portable replay path (D5) and the ADR-032 action ceiling.
- New
ChangeKinds (D4) must stay correctly partitioned (import-time assertion) and earn corpus/FP-rate-gate coverage before they can gate, to avoid false positives on legitimate internal noise. scan(D3) adds a CLI surface; it must remain a thin orchestrator so the authority rule (D1) is not diluted.
Out of scope
- Replacing or re-numbering the L0–L5 model.
- Making any source/cross-check finding authoritative for a
BREAKINGverdict. - clangd-index integration (the existing call/include graph is sufficient).
Relationship to existing ADRs¶
- ADR-028 — preserves the evidence-pack authority rule (D1); adds Flow 2 artifact protocol on the merge path (D5).
- ADR-029/030/031 — D2/D4 consume L3/L4/L5 outputs; no model change.
- ADR-032 — D5 plugin/wrapper are extractor providers under its security model and action ceiling.
- ADR-033 — D2/D3 extend the CI ladder, replay scopes, caches, and coverage reporting; D5 realizes its "Phase 6: instrumented compiler/build plugins".
- ADR-025 — D3 generalizes PR-diff-as-trigger into a scored escalation; D7 reverses its localization walk to target the scan up front.
- ADR-022 — D9 stores the full-depth baseline; PRs pull facts + focusing seed.
- ADR-017 — D2/D3/D9 surface through the Action's
mode/budget; UX step 4. - ADR-024 /
surface-report/ G11 — D8 single-release audit extends the single-binary surface tooling. - ADR-021b (MCP) — D10 exposes
scan/audit/estimateas agent tools.
See the phased work breakdown in plans/g19-pr-source-intelligence.md.
References¶
- Source-scan strategy proposal (uploaded field document, 2026-06).
- ADR-028 — Source and Build Evidence Pack (028-source-build-evidence-pack.md)
- ADR-032 — Evidence Extractor Plugin Interface (032-evidence-extractor-plugin-interface.md)
- ADR-033 — CI Rollout, Performance, Caching, and Validation (033-ci-rollout-performance-and-validation.md)
- Clang plugins: https://clang.llvm.org/docs/ClangPlugins.html
- LibTooling: https://clang.llvm.org/docs/LibTooling.html
- Tree-sitter: https://tree-sitter.github.io/tree-sitter/