ADR-032: Evidence Extractor Plugin Interface and Security Model
Date: 2026-06-09
Status: Accepted — implemented (D1–D10). The extractor interface, capability
model, action-permission ceiling, external-CLI manifest, collection modes, and
reproducibility ledger ship in abicheck/evidence/extractor.py +
extractor_manifest.py, wired into collect
(--extractor-manifest / --allow-build-query / --collection-mode). Amended 2026-06-12 (ADR-028 source-tree model) — see Amendment below.
Decision maker: Nikolay Petrov
Context
The evidence-pack extension (ADR-028..031) depends on several external information sources:
- build-system query output;
- compiler command databases;
- compiler-recorded metadata;
- source ABI dumpers;
- graph engines;
- optional wrapper/interception tools.
Hard-coding every extractor into the core dumper would make abicheck brittle. At the same time, arbitrary plugins create supply-chain, security, privacy, and reproducibility risks — the same class of concerns ADR-021b addressed for the MCP server. abicheck needs a narrow extractor interface that isolates external tools from verdict policy.
Decision
D1. Evidence extractors are adapters, not verdict engines
An extractor may collect and normalize facts. It may not decide final ABI/API verdicts.
external tool / build system / compiler output
↓
Extractor adapter
↓
raw artifact + normalized facts + diagnostics + confidence
↓
abicheck comparison engine and policy
↓
verdicts and reports
The core engine owns: schema validation; entity matching and merging;
confidence calculation; finding classification (ChangeKind partition,
ADR-011); policy profile application (ADR-010); suppressions and ledgers
(ADR-013, ADR-024); and final verdicts/exit codes (ADR-009).
D2. Python extractor interface
from __future__ import annotations

from pathlib import Path
from typing import Protocol

# CollectionContext is defined below; the *Result and RawArtifact types are not shown in this ADR.
class EvidenceExtractor(Protocol):
    name: str
    version: str
    schema_version: int

    def discover(self, context: CollectionContext) -> DiscoveryResult:
        """Report whether this extractor can run and what it can collect."""

    def collect(self, context: CollectionContext, output_dir: Path) -> CollectionResult:
        """Collect raw artifacts. Must not normalize them or decide verdicts."""

    def normalize(self, raw_artifacts: list[RawArtifact], output_dir: Path) -> NormalizationResult:
        """Convert raw artifacts into abicheck-owned schema."""

    def validate(self, normalized_artifacts: list[Path]) -> ValidationResult:
        """Schema and consistency checks."""
CollectionContext:
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Literal


@dataclass
class CollectionContext:
    binary_paths: list[Path]
    header_roots: list[Path]
    source_root: Path | None
    build_root: Path | None
    compile_db: Path | None
    target_selectors: list[str]
    changed_files: list[Path]
    mode: Literal["baseline", "pr", "nightly", "manual"]
    allowed_actions: set[Literal["inspect", "query_build_system", "run_compiler", "run_build", "wrap_build"]]
    redaction_policy: RedactionPolicy
    cache_dir: Path
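As an illustration of the interface, the sketch below implements only the discover step for a compile-database extractor. DiscoveryResult's fields, the trimmed MiniContext, and the CompileDbExtractor class are hypothetical stand-ins, since this ADR names those types without defining them.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path


# Hypothetical stand-in: the ADR names DiscoveryResult but does not define its fields.
@dataclass
class DiscoveryResult:
    can_run: bool
    capabilities: dict[str, bool] = field(default_factory=dict)
    reason: str = ""


# Hypothetical, trimmed-down CollectionContext with only the fields this sketch reads.
@dataclass
class MiniContext:
    compile_db: Path | None
    allowed_actions: set[str]


class CompileDbExtractor:
    name = "compile-db"
    version = "0.0.0"  # placeholder version
    schema_version = 1

    def discover(self, context: MiniContext) -> DiscoveryResult:
        # Reading an existing file needs only the default "inspect" action.
        if "inspect" not in context.allowed_actions:
            return DiscoveryResult(False, reason="inspect not permitted")
        if context.compile_db is None or not context.compile_db.is_file():
            return DiscoveryResult(False, reason="no compile database found")
        return DiscoveryResult(True, {"compile_db": True})


missing = CompileDbExtractor().discover(MiniContext(None, {"inspect"}))
print(missing.can_run, missing.reason)
```

The extractor reports what it can collect and why it cannot run; it makes no statement about ABI compatibility, which stays with the core engine (D1).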
D3. External CLI extractors through a manifest
External extractors can be installed independently and invoked through a manifest:
name: abicheck-cmake-extractor
version_command: ["abicheck-cmake-extractor", "--version"]
capabilities:
  - build_context
  - target_graph
input_requirements:
  - build_dir
allowed_actions:
  - inspect
  - query_build_system
commands:
  discover: ["abicheck-cmake-extractor", "discover", "--json"]
  collect: ["abicheck-cmake-extractor", "collect", "--output", "{raw_dir}"]
  normalize: ["abicheck-cmake-extractor", "normalize", "--raw", "{raw_dir}", "--output", "{normalized_dir}"]
outputs:
  normalized:
    - kind: build_evidence
      path: build/build_evidence.json
This allows third-party integrations without importing untrusted Python into the abicheck process: the boundary is a subprocess with declared inputs, outputs, and allowed actions.
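One way the subprocess boundary could be driven from a manifest is sketched below. The helper names expand_command and run_step and the timeout value are assumptions for illustration, not abicheck's actual API; the placeholders {raw_dir} and {normalized_dir} come from the manifest above.

```python
import subprocess
from pathlib import Path


def expand_command(template: list[str], raw_dir: Path, normalized_dir: Path) -> list[str]:
    """Substitute the manifest placeholders, one whole argument at a time."""
    values = {"{raw_dir}": str(raw_dir), "{normalized_dir}": str(normalized_dir)}
    return [values.get(arg, arg) for arg in template]


def run_step(argv: list[str], timeout_s: float = 300.0) -> subprocess.CompletedProcess:
    """Run one extractor step as a child process, never through a shell.

    The timeout is an assumed safeguard; the ADR does not specify one.
    """
    return subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s, check=False)


collect_template = ["abicheck-cmake-extractor", "collect", "--output", "{raw_dir}"]
argv = expand_command(collect_template, Path("evidence/raw"), Path("evidence/normalized"))
print(argv)
```

Passing argv as a list rather than a shell string means paths with spaces or shell metacharacters are not reinterpreted, which fits the goal of a narrow, declared boundary.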
Manifests are trusted-by-operator, never auto-discovered. abicheck does
not scan PATH, the working tree, or any plugin directory for manifests;
an external extractor runs only when the operator registers it explicitly
— via a config entry (e.g. extractors: in the project config) or a
per-run --extractor-manifest <path> flag. A manifest's declared
allowed_actions are a ceiling, not a grant: at run time they are
intersected with the actions permitted for the run (D5), so a manifest
cannot escalate beyond what the operator enabled.
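The ceiling rule amounts to a set intersection. A minimal sketch, assuming actions are plain strings and using a hypothetical helper named effective_actions:

```python
def effective_actions(manifest_actions: set[str], run_actions: set[str]) -> set[str]:
    """Actions an external extractor may actually use in this run.

    The manifest declares a ceiling; the operator's run settings decide
    what is enabled. Only actions present in both sets survive.
    """
    return manifest_actions & run_actions


manifest_actions = {"inspect", "query_build_system"}

# Default run: only "inspect" is permitted, so the manifest's extra action is dropped.
default_run = effective_actions(manifest_actions, {"inspect"})

# Run with --allow-build-query: both declared actions are usable.
with_build_query = effective_actions(manifest_actions, {"inspect", "query_build_system"})

# A manifest that declares run_build gains nothing if the operator did not enable it.
greedy = effective_actions({"inspect", "run_build"}, {"inspect", "query_build_system"})

print(sorted(default_run), sorted(with_build_query), sorted(greedy))
```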
D4. Capability model
{
  "capabilities": {
    "compile_db": true,
    "target_graph": true,
    "toolchain": true,
    "link_actions": false,
    "source_abi": false,
    "source_graph_summary": false,
    "call_graph": false,
    "requires_build_execution": false,
    "requires_compiler_execution": false,
    "requires_network": false
  }
}
Capability reporting drives evidence coverage (ADR-028 D7) and CI policy (ADR-033).
D5. Collection actions are explicitly permissioned
Default allowed action: inspect only.
| Action | Examples | Default |
|---|---|---|
| inspect | read existing files, parse compile DB, parse CMake File API replies | allowed |
| query_build_system | ninja -t, bazel cquery/aquery, CMake File API query regeneration | opt-in via --allow-build-query |
| run_compiler | run Clang/castxml/LibTooling syntax-only source extraction | opt-in via source replay mode (ADR-030) |
| run_build | cmake --build, bazel build, make | denied by default |
| wrap_build | Bear/intercept-build/compiler wrapper | denied by default |
| network | download tools or dependencies | always denied unless a future explicit mode allows it |
If an extractor requests a disallowed action, collection fails with a clear diagnostic.
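A minimal sketch of that check, assuming a hypothetical ActionDenied exception and require_actions helper (neither name is taken from abicheck):

```python
class ActionDenied(Exception):
    """Raised when an extractor asks for an action the run does not permit."""


def require_actions(extractor: str, requested: set[str], allowed: set[str]) -> None:
    """Fail collection with a diagnostic naming every disallowed action."""
    denied = sorted(requested - allowed)
    if denied:
        raise ActionDenied(
            f"extractor {extractor!r} requested disallowed action(s): "
            f"{', '.join(denied)}; allowed in this run: {', '.join(sorted(allowed))}"
        )


try:
    require_actions("cmake-file-api", {"inspect", "run_build"}, {"inspect"})
except ActionDenied as exc:
    message = str(exc)
    print(message)
```

Naming both the denied and the allowed actions in the message lets the operator see which opt-in would be needed without reading the manifest.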
D6. Predictable raw/normalized artifact layout
evidence/
  manifest.json
  raw/
    cmake-file-api/<hash>/...
    ninja/<hash>/...
    bazel-aquery/<hash>/...
    android-header-abi/<hash>/...
    kythe/<hash>/...
    codeql/<hash>/...
  normalized/
    cmake-file-api/build_facts.json
    ninja/build_facts.json
    bazel/build_facts.json
    build/build_evidence.json
    source/source_abi.json
    graph/source_graph_summary.json
  diagnostics.json
Raw artifact hashes include command, working directory, relevant environment, input file hashes, extractor version, and schema version.
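One possible way to derive such a hash is to serialize the listed inputs canonically and digest the result. The function name, parameter names, and the choice of SHA-256 over canonical JSON are assumptions in this sketch; the ADR lists only which inputs the hash covers.

```python
import hashlib
import json


def raw_artifact_hash(
    command: list[str],
    working_dir: str,
    relevant_env: dict[str, str],
    input_file_hashes: dict[str, str],
    extractor_version: str,
    schema_version: int,
) -> str:
    """Digest every input that D6 says a raw artifact hash must cover."""
    payload = {
        "command": command,
        "working_dir": working_dir,
        "env": relevant_env,
        "inputs": input_file_hashes,
        "extractor_version": extractor_version,
        "schema_version": schema_version,
    }
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


h1 = raw_artifact_hash(["ninja", "-t", "compdb"], "/build", {"CC": "clang", "CXX": "clang++"}, {}, "1.0", 1)
h2 = raw_artifact_hash(["ninja", "-t", "compdb"], "/build", {"CXX": "clang++", "CC": "clang"}, {}, "1.0", 1)
h3 = raw_artifact_hash(["ninja", "-t", "compdb"], "/build", {"CC": "clang", "CXX": "clang++"}, {}, "1.0", 2)
print(h1 == h2, h1 == h3)
```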
D7. Redaction is mandatory
Command lines and build-system outputs can contain absolute local paths, usernames, source checkout paths, include paths to internal SDKs, tokens in environment variables or compiler flags, and proprietary target names.
Add a RedactionPolicy:
redaction:
  path_mode: repo_relative        # repo_relative | hash_absolute | keep_absolute
  redact_env: true
  secret_patterns:
    - '(?i)token=[^\s]+'
    - '(?i)password=[^\s]+'
    - '(?i)secret=[^\s]+'
  keep_raw_artifacts: false       # default false for public CI artifacts
  keep_command_lines: normalized  # full | normalized | redacted | hash_only
Reports must say when evidence has been redacted and whether this reduces reproducibility.
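A rough sketch of how the secret patterns and the repo_relative path mode might be applied to a command line. The helper names, the "[REDACTED]" marker, and the simple prefix-stripping strategy are illustrative assumptions; only the three patterns are taken from the policy above.

```python
import re

# Patterns copied from the example RedactionPolicy above.
SECRET_PATTERNS = [
    re.compile(r"(?i)token=[^\s]+"),
    re.compile(r"(?i)password=[^\s]+"),
    re.compile(r"(?i)secret=[^\s]+"),
]


def redact_arg(arg: str, repo_root: str) -> str:
    """Mask secrets first, then rewrite paths under repo_root as repo-relative."""
    for pattern in SECRET_PATTERNS:
        arg = pattern.sub("[REDACTED]", arg)
    # Naive prefix stripping: enough for a sketch, not a full path normalizer.
    prefix = repo_root.rstrip("/") + "/"
    return arg.replace(prefix, "")


def redact_argv(argv: list[str], repo_root: str) -> list[str]:
    return [redact_arg(arg, repo_root) for arg in argv]


redacted = redact_argv(
    ["clang", "-I/home/alice/proj/include", "-DAPI_TOKEN=abc123"],
    "/home/alice/proj",
)
print(redacted)
```

Because the redacted command line no longer matches what was executed, a report built from it would need to flag the redaction, as the requirement above states.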
D8. Validate all normalized outputs
Each normalized artifact must pass JSON schema validation and consistency checks:
- every referenced target exists;
- every compile unit has a source path and normalized argv hash;
- every link-unit output maps to an input binary where possible;
- every source declaration has a stable ID;
- every graph edge references existing nodes;
- unknown enum values are rejected unless explicitly allowed under forward-compat mode.
Invalid extractor output is ignored or downgraded according to the collection mode; it never crashes the core compare unless strict mode is enabled.
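As an example of one consistency check from the list above, the sketch below verifies that every graph edge references an existing node. The field names "id", "from", and "to" are hypothetical; the normalized graph schema is not defined in this ADR.

```python
def check_graph_edges(nodes: list[dict], edges: list[dict]) -> list[str]:
    """Return one diagnostic per edge endpoint that names an unknown node."""
    known_ids = {node["id"] for node in nodes}
    problems = []
    for index, edge in enumerate(edges):
        for endpoint in ("from", "to"):
            if edge[endpoint] not in known_ids:
                problems.append(
                    f"edge {index}: {endpoint!r} references unknown node {edge[endpoint]!r}"
                )
    return problems


nodes = [{"id": "libfoo"}, {"id": "libbar"}]
edges = [
    {"from": "libfoo", "to": "libbar"},
    {"from": "libfoo", "to": "libmissing"},
]
problems = check_graph_edges(nodes, edges)
print(problems)
```

Returning diagnostics instead of raising leaves the decision to ignore, downgrade, or fail with the caller, which is where the collection mode is known.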
D9. Failure modes
| Mode | Behavior |
|---|---|
| permissive | Missing/failed extractors are reported as reduced coverage; the core ABI compare continues. Default for PR CI. |
| strict | Requested evidence must be collected and valid, otherwise the command exits non-zero. Useful for baseline generation. |
| audit | Preserve raw artifacts and full diagnostics for debugging extractor behavior. |
These modes affect collection only; compare exit codes keep their ADR-009 contract.
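The three modes could map onto a collection outcome roughly as follows. The CollectionOutcome type, the exit code value 1, the choice to treat any status other than "success" as a failure, and audit mode's exit behavior are all assumptions of this sketch; the table specifies only what each mode reports or preserves.

```python
from dataclasses import dataclass


@dataclass
class CollectionOutcome:
    exit_code: int
    keep_raw_artifacts: bool
    reduced_coverage: list[str]  # extractors that did not succeed


def finish_collection(mode: str, statuses: dict[str, str]) -> CollectionOutcome:
    """Turn per-extractor statuses into a collection result for the given mode."""
    failed = sorted(name for name, status in statuses.items() if status != "success")
    if mode == "strict":
        # Requested evidence must be collected and valid, otherwise exit non-zero.
        return CollectionOutcome(1 if failed else 0, False, failed)
    if mode == "audit":
        # Assumption: audit keeps raw artifacts and otherwise behaves like permissive.
        return CollectionOutcome(0, True, failed)
    if mode == "permissive":
        # Failures only reduce coverage; the core compare continues.
        return CollectionOutcome(0, False, failed)
    raise ValueError(f"unknown collection mode: {mode!r}")


statuses = {"compile-db": "success", "cmake-file-api": "failed"}
permissive = finish_collection("permissive", statuses)
strict = finish_collection("strict", statuses)
print(permissive.exit_code, strict.exit_code, strict.reduced_coverage)
```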
D10. Tool version and reproducibility ledger
Every pack records:
{
  "extractors": [
    {
      "name": "cmake-file-api",
      "version": "4.3.3",
      "command": "cmake-file-api-reader --...",
      "command_hash": "sha256:...",
      "capabilities": [],
      "started_at": "...",
      "finished_at": "...",
      "status": "success|partial|failed|skipped",
      "diagnostics": []
    }
  ]
}
This ledger is included in JSON/SARIF output (ADR-014) for traceability.
Options considered
| Option | Description | Decision |
|---|---|---|
| Built-in only | Implement all extractors inside abicheck | Rejected; slow to evolve and impossible to cover every build system. |
| Arbitrary Python plugins | Maximum flexibility | Rejected for default; too much supply-chain and runtime risk. |
| External CLI adapter contract | Stable process boundary and language independence | Accepted. |
| Raw external formats as stable API | Store .sdump, .kzip, CodeQL DB references directly as official schema | Rejected; raw formats are unstable and/or too large (ADR-028 D4). |
Consequences
Positive
- Lets abicheck reuse existing tools without adopting their internal schemas.
- Keeps core verdict policy consistent and testable.
- Supports third-party adapters for specialized build systems.
- Makes security and redaction explicit instead of accidental.
- Enables gradual rollout: core adapters first, external graph plugins later.
Negative / risks
- More moving parts in CI.
- External CLI adapters need packaging/version compatibility tests.
- Redaction can make reproduction harder if raw artifacts are not retained.
- Capability and confidence reporting must be accurate to avoid misleading users.
Implementation plan
| Phase | Scope | Output |
|---|---|---|
| 1 | Define the extractor manifest and CollectionContext | Internal API and docs |
| 2 | Built-in compile DB extractor using the same interface | First production extractor |
| 3 | CMake/Ninja/Bazel adapters (ADR-029) | BuildEvidence populated from adapters |
| 4 | Raw/normalized artifact layout and schema validator | Stable evidence pack format |
| 5 | Redaction policy | Safe CI artifacts |
| 6 | External CLI adapter support | Third-party extractor path |
| 7 | Strict/permissive/audit modes | CI and baseline policy control |
References
- ADR-014 — Output Format Strategy (014-output-format-strategy.md)
- ADR-015 — Snapshot Serialization and Schema Versioning (015-snapshot-serialization.md)
- ADR-017 — GitHub Action Design (017-github-action.md)
- ADR-021b — MCP Security Model (021-mcp-security-model.md)
- ADR-028 — Evidence Pack Architecture (028-source-build-evidence-pack.md)
Amendment (2026-06-12): build-tool query config (see ADR-028)
A per-project .abicheck.yml build: block (system: + a query: command, or
--build-config) lets abicheck recover exact ABI-affecting flags and generated
headers from a source checkout — disambiguating multi-build-system projects
(e.g. oneDAL's Make and Bazel). This is a direct application of the D5 action
ceiling and adds no new capability: inspect (read existing outputs) is the
default; running the configured query command is the opt-in query_build_system
action and requires both --allow-build-query and an explicit trusted
--build-config path; auto-discovered source-tree configs are honored only for
non-executing settings. run_build/wrap_build stay denied — abicheck
never performs a full project build.
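The two-key requirement for running the configured query command can be stated as a small predicate. The function and parameter names are hypothetical; the rule itself is the one described in the amendment.

```python
from __future__ import annotations


def may_run_build_query(allow_build_query: bool, explicit_build_config: str | None) -> bool:
    """True only when both opt-ins are present.

    allow_build_query models the --allow-build-query flag; explicit_build_config
    models a trusted path given via --build-config. An auto-discovered
    source-tree config is represented here as None, so it can never
    trigger command execution on its own.
    """
    return allow_build_query and explicit_build_config is not None


print(may_run_build_query(True, "ci/abicheck-build.yml"))
print(may_run_build_query(True, None))
print(may_run_build_query(False, "ci/abicheck-build.yml"))
```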