Use-Case Coverage Evaluation

Date: 2026-06-07

Purpose: Evaluate abicheck against the full landscape of application and library ABI/API change use cases, identify where coverage is deep and where it is thin, and record the concrete code, test, and example follow-ups.

This is a companion to archive/adr-gap-analysis.md (which tracks undocumented decisions); this document tracks uncovered scenarios.

Three related artifacts, kept distinct: the examples catalog (examples/) demonstrates ABI/API change types; the user-scenario catalog (User Scenarios & Flows, tests/scenarios/) defines how users work with abicheck and drives end-to-end tool validation; and the plans track the capability backlog. This document is the map across all three.

Headline

abicheck is exceptionally deep on the change-taxonomy axis and comparatively thin on the breadth axes. The "what changed" dimension (254 ChangeKinds in a 5-tier policy model, 129 calibrated example cases, ABICC + libabigail parity) is essentially complete, and further work there has diminishing returns.

The remaining gaps are not in detecting more change types. They are breadth/workflow items tracked in usecase-registry.yaml. Six are planned: header-only/inline-only analysis (G4), auditwheel vendored-library pairing (G9), manylinux glibc-floor checks (G10), single-binary audit/lint mode (G11), cross-architecture guardrails (G13), and CPython abi3 import-contract checking (G14). Two are partial: inline-namespace version-stamp normalization (G15), whose advisory detector has landed while the normalize-and-collapse preset is still planned, and header-scoped source-mode toolchain robustness (G16), whose diagnostics and castxml --version floor probe have shipped.

Several formerly broad gaps are now closed and should no longer be treated as open roadmap work: native PE/Mach-O compare validation (G1), build-config matrix integration (G2), workflow/report coverage (G3), plugin host↔plugin checking (G5), BTF/CTF and SYCL workflows (G6), release recommendations (G7), static library stance (G8), and security-hardening drift (G12).

The use-case universe (five axes)

A real invocation is a point in this space:

Axis	Values
Library archetype	pure-C system lib · C++ template/vtable lib · header-only/inline · plugin (dlopen) · static (`.a`) · kernel/eBPF · GPU/accelerator (SYCL/CUDA) · FFI-consumed C ABI
Platform	ELF/Linux · PE+PDB/Windows (MSVC, MinGW) · Mach-O/macOS (x86-64, ARM64)
Change class	binary ABI break · source API break · compatible addition · quality/bad-practice · deployment risk
Workflow	CI PR gate · release/package compare · baseline pin · app-compat · multi-lib bundle · build-config matrix · stack/sysroot · Debian symbols · ABICC drop-in · MCP/agent
Toolchain/standard	GCC/Clang/MSVC/ICX · C++11→23 floor · libstdc++ dual ABI · flag drift · LP64/ILP64 · char8_t/_BitInt/atomic/ABI-tags
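
To make "a point in this space" concrete, here is a minimal sketch that models one invocation as a value with five coordinates and rejects values outside the axis vocabularies. The `UseCasePoint` class, the `AXES` mapping, and the short value slugs are hypothetical names chosen for illustration (and only a subset of the table's values); none of them are taken from abicheck itself.

```python
from dataclasses import dataclass, fields

# Hypothetical axis vocabularies: an abbreviated subset of the table above.
# The slugs are illustrative and are not abicheck identifiers.
AXES = {
    "archetype": {"pure-c", "cpp-template-vtable", "header-only", "plugin",
                  "static", "kernel-ebpf", "gpu", "ffi-c-abi"},
    "platform": {"elf-linux", "pe-windows", "macho-macos"},
    "change_class": {"binary-abi-break", "source-api-break",
                     "compatible-addition", "quality", "deployment-risk"},
    "workflow": {"ci-pr-gate", "release-compare", "baseline-pin", "app-compat"},
    "toolchain": {"gcc", "clang", "msvc", "icx"},
}


@dataclass(frozen=True)
class UseCasePoint:
    """One invocation, described by one value on each of the five axes."""
    archetype: str
    platform: str
    change_class: str
    workflow: str
    toolchain: str

    def __post_init__(self):
        # Reject any coordinate that is not in its axis vocabulary.
        for f in fields(self):
            value = getattr(self, f.name)
            if value not in AXES[f.name]:
                raise ValueError(f"{f.name}: unknown value {value!r}")


# A plugin library on Linux, checked for binary breaks in a CI PR gate.
point = UseCasePoint("plugin", "elf-linux", "binary-abi-break", "ci-pr-gate", "gcc")
```

Treating each use case as an immutable tuple of axis values makes it straightforward to ask which combinations have evidence and which do not, which is the question the scorecard answers by hand.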

Coverage scorecard

The authoritative, machine-checked status of every use case lives in usecase-registry.yaml, validated by tests/test_usecase_registry.py (it enforces that coverage claims cite evidence paths that actually exist, and that unfinished items carry a tracked gap + next steps). The table below is a human snapshot; statuses use the registry's vocabulary: complete · partial · modeled (code exists, not validated end-to-end) · planned · by_design_excluded.

Use case	Status	Notes
Change taxonomy	`complete`	254 change kinds; 129 cases; parity tests
Release recommendation (semver + SONAME)	`complete`	semver bump + SONAME action emitted in reports
C / C++ archetypes	`complete`	35 C + 52 C++ example pairs
Linux ELF platform	`complete`	the CI-validated baseline
Windows PE/MSVC	`complete`	G1 closed: `cross-platform-e2e` lane runs `compare` on MinGW DLLs; MSVC+PDB lane asserts struct-growth + removed-export verdicts
macOS Mach-O/ARM64	`complete`	G1 closed: `cross-platform-e2e` lane runs `compare` on Apple-clang dylibs; AAPCS64 HFA/HVA + 16-byte boundary modeled + unit-tested
`compare`/release/baseline/Debian/ABICC	`complete`	dedicated CLIs + tests
MCP server	`complete`	unit-tested (mocks, Linux)
Reporting: JSON/SARIF/JUnit	`complete`	versioned schema + 34 SARIF / 55 JUnit tests
Reporting: Markdown/HTML	`complete`	structural coverage across verdict tiers + sections + escaping (G3 done)
Build-config matrix (`probe`)	`complete`	G2 closed: wired into `compare`; both CXX floor and API_DEPENDS proven e2e (`.o` `.symtab` surface capture fixed)
Bundle / multi-library	`complete`	all detectors run via `compare-release`; case84 validated e2e (Linux-only by design; cross-platform → G1)
Plugin (host↔plugin)	`complete`	G5 closed: `plugin-check` CLI + `check_plugin_host_contract` API + plugin_abi policy
Security-hardening drift	`complete`	G12 closed: full checksec surface (RELRO/BIND_NOW/PIE/canary/FORTIFY/W^X) diffed; shipped `--policy-file security` gate
Header-only / inline-only	`planned`	castxml can't emit concept bodies / ctor mangled names (G4; cases 78/105/106/111 dormant)
Kernel / eBPF (BTF/CTF)	`complete`	G6 closed: BTF + CTF struct-change run through `compare`; committed `case121` BTF blobs + bare-blob CLI ingestion + `gcc -gbtf` integration fixture
SYCL / accelerator (PI/UR)	`complete`	G6 closed: PI and UR adapter entrypoint-drop driven through `compare` + reports
Static libraries (`.a`/`.lib`)	`by_design_excluded`	G8 decided (option A): non-goal; CLI rejects archives with guidance
FFI consumers (Rust/Go/Python)	`by_design_excluded`	C ABI covered; other languages a stated non-goal
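
The two registry invariants described above (evidence paths must exist, and unfinished items must carry a gap and next steps) can be sketched as a small checker. This is an illustrative sketch only, not the contents of tests/test_usecase_registry.py: the field names `id`, `status`, `evidence`, `gap`, and `next_steps`, and the choice to treat `modeled` as unfinished, are assumptions about the registry schema.

```python
from pathlib import Path

# Assumption: these statuses count as "unfinished" for the purpose of the check.
UNFINISHED = {"partial", "modeled", "planned"}


def validate_entry(entry: dict, repo_root: Path) -> list:
    """Return a list of problems for one registry entry (empty if clean)."""
    problems = []
    name = entry.get("id", "<unnamed>")

    # Invariant 1: every cited evidence path must exist under the repo root.
    for rel in entry.get("evidence", []):
        if not (repo_root / rel).exists():
            problems.append(f"{name}: missing evidence path {rel}")

    # Invariant 2: unfinished items must carry a tracked gap and next steps.
    if entry.get("status") in UNFINISHED:
        if not entry.get("gap"):
            problems.append(f"{name}: unfinished but no gap")
        if not entry.get("next_steps"):
            problems.append(f"{name}: unfinished but no next_steps")

    return problems
```

A test suite could call `validate_entry` for every entry loaded from the YAML file and fail if any list is non-empty, so that a scorecard row cannot claim coverage that the repository does not back up.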

Gaps that matter — current implementation status

ID	Status	Current state
G1	✅ closed	Native PE/Mach-O `compare` is validated in CI; MSVC+PDB has a dedicated non-blocking lane.
G2	✅ closed	Build matrices fold into `compare`/`compare-release` via `--probe-matrix-old/new`; C++ floor and environment-dependent API findings are end-to-end tested.
G3	✅ closed	Workflow scenarios and Markdown/HTML report coverage are validated beyond single-pair `compare`.
G4	planned	Header-only / inline-only libraries still need a libclang header-AST extractor.
G5	✅ closed	`plugin-check` and `check_plugin_host_contract` cover host↔plugin load contracts.
G6	✅ closed	BTF/CTF and SYCL PI/UR workflows run through `compare` and reports.
G7	✅ closed	Semver bump and SONAME action recommendations are emitted by the report layer.
G8	by-design excluded	Static/import archives are rejected with guidance; archive member API checking is a non-goal.
G9	planned	auditwheel/delocate vendored-library hashed SONAME normalization.
G10	planned	manylinux glibc-floor / platform-baseline checks.
G11	planned	Single-binary ABI audit/lint mode.
G12	✅ closed	Security-hardening drift captures and diffs RELRO, BIND_NOW, PIE, canaries, FORTIFY, and W^X metadata; the security policy is shipped.
G13	planned	Cross-architecture mismatch guardrail.
G14	planned	CPython Limited-API / `abi3` import-contract conformance.
G15	partial	Inline-namespace version-stamp normalization for ICU/Abseil/libstdc++-style churn. Detector landed (advisory `versioned_symbol_scheme_detected`); normalize-and-collapse preset still planned.
G19	complete	PR-tier source intelligence (ADR-035, D1–D10): always-on compiler-free pre-scan + risk-scored escalation, intra-version cross-source validation findings (six checks + FP-rate-gate corpus), single-release hygiene audit, evidence-directed scan focusing, build-emitted source-facts protocol, and a typed `run_scan`/`ScanResult` API + per-level provider protocol with per-project cost estimate.
G20	planned	Source-scan & cross-source example corpus (ADR-035 demonstration): single-release audit cases, cross-source corroboration cases (combination beats any single source), and evidence-directed focusing scenarios. Grows the `examples/` catalog + test suites to demonstrate the G19 engine; no engine change.
G21	partial	One-shot deep compare + CLI usability (oneDAL eval). Shipped (PR #422): the `--depth headers|build|graph|source|full` dial (`--max`=full, reusing the `scan --depth` vocabulary) on `dump`; rich-click option-group `--help` panels (collapse M1); and the strict-mode honesty fix (empty requested L4 → `skipped`). Remaining: the one-shot `compare` orchestrator (dump both sides with `--sources`, then compare) + header/source auto-discovery, a cross-platform list-threaded `--gcc-option`, `compile_commands.json` auto-synthesis, a fail-loud signal on an empty requested layer, and vocab unification (M5).
G22	planned	CLI interface contract, config balance, and extension policy (ADR-037). Follows G21's depth dial with the structural cleanup the flag-divergence audit surfaced: three named tiers with `service.py` as the only compare chokepoint (fixes `compare-release` bypassing it with a different `scope_public` default), typed `CompareRequest` dataclasses, one decorator per shared option family (kills the severity/header/policy/debug copy-paste drift), a single `--depth` vocabulary (drops the "evidence" naming and the user-facing L5-graph rung), folding `compare-release`/`deep-compare` into `compare`, `--header-backend` → `--ast-frontend`, a CLI↔`.abicheck.yml` rebalance, an explicit `--exit-code-scheme`, and a `cli-contract` CI gate. Backward-compat mechanism designed, left advisory until 1.0.
G16	partial	Header-scoped source-mode toolchain robustness. Surfaced by 21 real-world cron records. Shipped: actionable diagnostics for all three host-toolchain signatures (sized-float `_FloatN`, GCC `__assume__`, `--lang c` + `extern "C"`), plus a `castxml --version` probe that recommends the Clang floor (≥ 18) on a version-mismatch failure. A `-D_FloatN` shim was prototyped and rejected (it rewrites glibc's own `typedef float _Float32;` fallback); the durable cure is a newer-Clang castxml or the libclang extractor (G4). Remaining: real-host end-to-end check and a dedicated error type.
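
The `castxml --version` floor probe mentioned in the G16 row can be approximated as follows. This is a hedged sketch, not abicheck's implementation: the function names are hypothetical, and it assumes that the version output contains a line of the form `clang version <major>...`, which may vary between castxml builds and distributions.

```python
import re
import shutil
import subprocess
from typing import Optional

CLANG_FLOOR = 18  # the Clang floor named in the G16 row


def clang_major_from_version_text(text: str) -> Optional[int]:
    """Extract the embedded Clang major version from `castxml --version` text."""
    match = re.search(r"clang version (\d+)", text)
    return int(match.group(1)) if match else None


def floor_advice(text: str, floor: int = CLANG_FLOOR) -> Optional[str]:
    """Return a recommendation string, or None if the floor is met."""
    major = clang_major_from_version_text(text)
    if major is None:
        return "could not determine castxml's Clang version"
    if major < floor:
        return f"castxml embeds Clang {major}; Clang >= {floor} is recommended"
    return None


def probe_castxml() -> Optional[str]:
    """Run `castxml --version` if castxml is on PATH and evaluate the floor."""
    exe = shutil.which("castxml")
    if exe is None:
        return "castxml not found on PATH"
    result = subprocess.run([exe, "--version"], capture_output=True,
                            text=True, check=False)
    # Some builds may print version details to stderr, so inspect both streams.
    return floor_advice(result.stdout + result.stderr)
```

Keeping the text parsing separate from the subprocess call means the version logic can be unit-tested without castxml installed, which matters given that the remaining G16 work includes a real-host end-to-end check.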

Proposed next steps (tracked in the registry)

The authoritative backlog is the set of planned and partial entries in usecase-registry.yaml. Each entry carries a gap, a plan file, and concrete next_steps; tests/test_usecase_registry.py prevents a planned row from drifting away from its plan.

Priority	Gap	Plan
High	G9 — wheel vendored-library pairing	g9
High	G14 — CPython `abi3` import-contract	g14
Medium	G4 — header-only / inline-only analysis	g4
Medium	G11 — single-binary audit/lint	g11
Medium	G15 — inline-namespace version stamp	g15
Small	G10 — glibc-floor check	g10
Small	G13 — cross-architecture guardrail	g13
Medium	G16 — header-scope toolchain robustness	g16
Medium	G17 — real-world validation corpus	g17
Medium	G18 — Bazel build-evidence	g18
Medium	G20 — source-scan & cross-source example corpus	g20
Medium	G21 — one-shot deep compare & CLI usability	g21
Medium	G22 — CLI consolidation & interface-contract enforcement	g22