Skip to content

ADR-032: Evidence Extractor Plugin Interface and Security Model

Date: 2026-06-09 Status: Accepted — implemented (D1–D10). The extractor interface, capability model, action-permission ceiling, external-CLI manifest, collection modes, and reproducibility ledger ship in abicheck/evidence/extractor.py + extractor_manifest.py, wired into collect (--extractor-manifest / --allow-build-query / --collection-mode). Amended 2026-06-12 (ADR-028 source-tree model) — see Amendment below. Decision maker: Nikolay Petrov


Context

The evidence-pack extension (ADR-028..031) depends on several external information sources:

  • build-system query output;
  • compiler command databases;
  • compiler-recorded metadata;
  • source ABI dumpers;
  • graph engines;
  • optional wrapper/interception tools.

Hard-coding every extractor into the core dumper would make abicheck brittle. At the same time, arbitrary plugins create supply-chain, security, privacy, and reproducibility risks — the same class of concerns ADR-021b addressed for the MCP server. abicheck needs a narrow extractor interface that isolates external tools from verdict policy.


Decision

D1. Evidence extractors are adapters, not verdict engines

An extractor may collect and normalize facts. It may not decide final ABI/API verdicts.

external tool / build system / compiler output
Extractor adapter
raw artifact + normalized facts + diagnostics + confidence
abicheck comparison engine and policy
verdicts and reports

The core engine owns: schema validation; entity matching and merging; confidence calculation; finding classification (ChangeKind partition, ADR-011); policy profile application (ADR-010); suppressions and ledgers (ADR-013, ADR-024); and final verdicts/exit codes (ADR-009).

D2. Python extractor interface

class EvidenceExtractor(Protocol):
    name: str
    version: str
    schema_version: int

    def discover(self, context: CollectionContext) -> DiscoveryResult:
        """Report whether this extractor can run and what it can collect."""

    def collect(self, context: CollectionContext, output_dir: Path) -> CollectionResult:
        """Collect raw artifacts. Must not normalize verdicts."""

    def normalize(self, raw_artifacts: list[RawArtifact], output_dir: Path) -> NormalizationResult:
        """Convert raw artifacts into abicheck-owned schema."""

    def validate(self, normalized_artifacts: list[Path]) -> ValidationResult:
        """Schema and consistency checks."""

CollectionContext:

@dataclass
class CollectionContext:
    binary_paths: list[Path]
    header_roots: list[Path]
    source_root: Path | None
    build_root: Path | None
    compile_db: Path | None
    target_selectors: list[str]
    changed_files: list[Path]
    mode: Literal["baseline", "pr", "nightly", "manual"]
    allowed_actions: set[Literal["inspect", "query_build_system", "run_compiler", "run_build", "wrap_build"]]
    redaction_policy: RedactionPolicy
    cache_dir: Path

D3. External CLI extractors through a manifest

External extractors can be installed independently and invoked through a manifest:

name: abicheck-cmake-extractor
version_command: ["abicheck-cmake-extractor", "--version"]
capabilities:
  - build_context
  - target_graph
input_requirements:
  - build_dir
allowed_actions:
  - inspect
  - query_build_system
commands:
  discover: ["abicheck-cmake-extractor", "discover", "--json"]
  collect: ["abicheck-cmake-extractor", "collect", "--output", "{raw_dir}"]
  normalize: ["abicheck-cmake-extractor", "normalize", "--raw", "{raw_dir}", "--output", "{normalized_dir}"]
outputs:
  normalized:
    - kind: build_evidence
      path: build/build_evidence.json

This allows third-party integrations without importing untrusted Python into the abicheck process: the boundary is a subprocess with declared inputs, outputs, and allowed actions.

Manifests are trusted-by-operator, never auto-discovered. abicheck does not scan PATH, the working tree, or any plugin directory for manifests; an external extractor runs only when the operator registers it explicitly — via a config entry (e.g. extractors: in the project config) or a per-run --extractor-manifest <path> flag. A manifest's declared allowed_actions are a ceiling, not a grant: at run time they are intersected with the actions permitted for the run (D5), so a manifest cannot escalate beyond what the operator enabled.

D4. Capability model

{
  "capabilities": {
    "compile_db": true,
    "target_graph": true,
    "toolchain": true,
    "link_actions": false,
    "source_abi": false,
    "source_graph_summary": false,
    "call_graph": false,
    "requires_build_execution": false,
    "requires_compiler_execution": false,
    "requires_network": false
  }
}

Capability reporting drives evidence coverage (ADR-028 D7) and CI policy (ADR-033).

D5. Collection actions are explicitly permissioned

Default allowed action: inspect only.

Action Examples Default
inspect read existing files, parse compile DB, parse CMake File API replies allowed
query_build_system ninja -t, bazel cquery/aquery, CMake File API query regeneration opt-in via --allow-build-query
run_compiler run Clang/castxml/LibTooling syntax-only source extraction opt-in via source replay mode (ADR-030)
run_build cmake --build, bazel build, make denied by default
wrap_build Bear/intercept-build/compiler wrapper denied by default
network download tools or dependencies always denied unless a future explicit mode

If an extractor requests a disallowed action, collection fails with a clear diagnostic.

D6. Predictable raw/normalized artifact layout

evidence/
  manifest.json
  raw/
    cmake-file-api/<hash>/...
    ninja/<hash>/...
    bazel-aquery/<hash>/...
    android-header-abi/<hash>/...
    kythe/<hash>/...
    codeql/<hash>/...
  normalized/
    cmake-file-api/build_facts.json
    ninja/build_facts.json
    bazel/build_facts.json
  build/build_evidence.json
  source/source_abi.json
  graph/source_graph_summary.json
  diagnostics.json

Raw artifact hashes include command, working directory, relevant environment, input file hashes, extractor version, and schema version.

D7. Redaction is mandatory

Command lines and build-system outputs can contain absolute local paths, usernames, source checkout paths, include paths to internal SDKs, tokens in environment variables or compiler flags, and proprietary target names.

Add a RedactionPolicy:

redaction:
  path_mode: repo_relative        # repo_relative | hash_absolute | keep_absolute
  redact_env: true
  secret_patterns:
    - '(?i)token=[^\s]+'
    - '(?i)password=[^\s]+'
    - '(?i)secret=[^\s]+'
  keep_raw_artifacts: false       # default false for public CI artifacts
  keep_command_lines: normalized  # full | normalized | redacted | hash_only

Reports must say when evidence has been redacted and whether this reduces reproducibility.

D8. Validate all normalized outputs

Each normalized artifact must pass JSON schema validation and consistency checks:

  • every referenced target exists;
  • every compile unit has a source path and normalized argv hash;
  • every link-unit output maps to an input binary where possible;
  • every source declaration has a stable ID;
  • every graph edge references existing nodes;
  • unknown enum values are rejected unless explicitly allowed under forward-compat mode.

Invalid extractor output is ignored or downgraded according to the collection mode; it never crashes the core compare unless strict mode is enabled.

D9. Failure modes

Mode Behavior
permissive Missing/failed extractors are reported as reduced coverage; the core ABI compare continues. Default for PR CI.
strict Requested evidence must be collected and valid, otherwise the command exits non-zero. Useful for baseline generation.
audit Preserve raw artifacts and full diagnostics for debugging extractor behavior.

These modes affect collection only; compare exit codes keep their ADR-009 contract.

D10. Tool version and reproducibility ledger

Every pack records:

{
  "extractors": [
    {
      "name": "cmake-file-api",
      "version": "4.3.3",
      "command": "cmake-file-api-reader --...",
      "command_hash": "sha256:...",
      "capabilities": [],
      "started_at": "...",
      "finished_at": "...",
      "status": "success|partial|failed|skipped",
      "diagnostics": []
    }
  ]
}

This ledger is included in JSON/SARIF output (ADR-014) for traceability.


Options considered

Option Description Decision
Built-in only Implement all extractors inside abicheck Rejected; slow to evolve and impossible to cover every build system.
Arbitrary Python plugins Maximum flexibility Rejected for default; too much supply-chain and runtime risk.
External CLI adapter contract Stable process boundary and language independence Accepted.
Raw external formats as stable API Store .sdump, .kzip, CodeQL DB references directly as official schema Rejected; raw formats are unstable and/or too large (ADR-028 D4).

Consequences

Positive

  • Lets abicheck reuse existing tools without adopting their internal schemas.
  • Keeps core verdict policy consistent and testable.
  • Supports third-party adapters for specialized build systems.
  • Makes security and redaction explicit instead of accidental.
  • Enables gradual rollout: core adapters first, external graph plugins later.

Negative / risks

  • More moving parts in CI.
  • External CLI adapters need packaging/version compatibility tests.
  • Redaction can make reproduction harder if raw artifacts are not retained.
  • Capability and confidence reporting must be accurate to avoid misleading users.

Implementation plan

Phase Scope Output
1 Define the extractor manifest and CollectionContext Internal API and docs
2 Built-in compile DB extractor using the same interface First production extractor
3 CMake/Ninja/Bazel adapters (ADR-029) BuildEvidence populated from adapters
4 Raw/normalized artifact layout and schema validator Stable evidence pack format
5 Redaction policy Safe CI artifacts
6 External CLI adapter support Third-party extractor path
7 Strict/permissive/audit modes CI and baseline policy control

References

Amendment (2026-06-12): build-tool query config (see ADR-028)

A per-project .abicheck.yml build: block (system: + a query: command, or --build-config) lets abicheck recover exact ABI-affecting flags and generated headers from a source checkout — disambiguating multi-build-system projects (e.g. oneDAL's Make and Bazel). This is a direct application of the D5 action ceiling and adds no new capability: inspect (read existing outputs) is the default; running the configured query command is the opt-in query_build_system action and requires both --allow-build-query and an explicit trusted --build-config path; auto-discovered source-tree configs are honored only for non-executing settings. run_build/wrap_build stay denied — abicheck never performs a full project build.