Use-Case Coverage Evaluation
Date: 2026-06-07
Purpose: Evaluate abicheck against the full landscape of application/library ABI/API change use cases, identify where coverage is deep vs. thin, and record the concrete code / test / example follow-ups.
This is a companion to archive/adr-gap-analysis.md
(which tracks undocumented decisions); this document tracks uncovered
scenarios.
Three related artifacts, kept distinct: the examples catalog (examples/)
demonstrates ABI/API change types; the user-scenario catalog
(User Scenarios & Flows, tests/scenarios/) defines how
users work with abicheck and drives end-to-end tool validation; and the
plans track the capability backlog. This document is the
map across all three.
Headline
abicheck is exceptionally deep on the change-taxonomy axis and comparatively
thin on the breadth axes. The "what changed" dimension (254 ChangeKinds
in a 5-tier policy model, 129 calibrated example cases, ABICC + libabigail
parity) is essentially complete, and further work on it yields diminishing returns.
The remaining gaps are not in detecting more change types. They are the
seven planned breadth/workflow items tracked in usecase-registry.yaml:
header-only/inline-only analysis (G4), auditwheel vendored-library pairing (G9),
manylinux glibc-floor checks (G10), single-binary audit/lint mode (G11),
cross-architecture guardrails (G13), CPython abi3 import-contract checking
(G14), and inline-namespace version-stamp normalization (G15, where an advisory
detector has landed but the normalize-and-collapse preset is still planned).
One further item is newly partial: header-scoped source-mode toolchain
robustness (G16), whose diagnostics and castxml --version floor probe have shipped.
Several formerly broad gaps are now closed and should no longer be treated as open roadmap work: native PE/Mach-O compare validation (G1), build-config matrix integration (G2), workflow/report coverage (G3), plugin host↔plugin checking (G5), BTF/CTF and SYCL workflows (G6), release recommendations (G7), static library stance (G8), and security-hardening drift (G12).
The use-case universe (five axes)
A real invocation is a point in this space:
| Axis | Values |
|---|---|
| Library archetype | pure-C system lib · C++ template/vtable lib · header-only/inline · plugin (dlopen) · static (.a) · kernel/eBPF · GPU/accelerator (SYCL/CUDA) · FFI-consumed C ABI |
| Platform | ELF/Linux · PE+PDB/Windows (MSVC, MinGW) · Mach-O/macOS (x86-64, ARM64) |
| Change class | binary ABI break · source API break · compatible addition · quality/bad-practice · deployment risk |
| Workflow | CI PR gate · release/package compare · baseline pin · app-compat · multi-lib bundle · build-config matrix · stack/sysroot · Debian symbols · ABICC drop-in · MCP/agent |
| Toolchain/standard | GCC/Clang/MSVC/ICX · C++11→23 floor · libstdc++ dual ABI · flag drift · LP64/ILP64 · char8_t/_BitInt/atomic/ABI-tags |
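
The idea that an invocation is "a point in this space" can be made concrete with a small data model. The following is a minimal sketch, not part of abicheck: the names `UseCasePoint`, `Platform`, and `ChangeClass` are hypothetical, and only two of the five axes are modeled as closed enums to keep it short.

```python
from dataclasses import dataclass
from enum import Enum


class Platform(Enum):
    # Values copied from the "Platform" row of the axis table.
    ELF_LINUX = "ELF/Linux"
    PE_WINDOWS = "PE+PDB/Windows"
    MACHO_MACOS = "Mach-O/macOS"


class ChangeClass(Enum):
    # Values copied from the "Change class" row of the axis table.
    BINARY_ABI_BREAK = "binary ABI break"
    SOURCE_API_BREAK = "source API break"
    COMPATIBLE_ADDITION = "compatible addition"
    QUALITY = "quality/bad-practice"
    DEPLOYMENT_RISK = "deployment risk"


@dataclass(frozen=True)
class UseCasePoint:
    """One coordinate per axis (hypothetical model, for illustration only)."""
    archetype: str       # e.g. "pure-C system lib"
    platform: Platform
    change_class: ChangeClass
    workflow: str        # e.g. "CI PR gate"
    toolchain: str       # e.g. "GCC"


# A CI gate catching a binary break in a C system library on Linux.
point = UseCasePoint(
    archetype="pure-C system lib",
    platform=Platform.ELF_LINUX,
    change_class=ChangeClass.BINARY_ABI_BREAK,
    workflow="CI PR gate",
    toolchain="GCC",
)
```

Because the dataclass is frozen, points are hashable, so a set of them can stand in for "the combinations that have test evidence" when reasoning about breadth.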
Coverage scorecard
The authoritative, machine-checked status of every use case lives in
usecase-registry.yaml, validated by tests/test_usecase_registry.py (it enforces that coverage claims cite evidence paths that actually exist, and that unfinished items carry a tracked gap + next steps). The table below is a human snapshot; statuses use the registry's vocabulary: complete · partial · modeled (code exists, not validated end-to-end) · planned · by_design_excluded.
| Use case | Status | Notes |
|---|---|---|
| Change taxonomy | complete | 254 change kinds; 129 cases; parity tests |
| Release recommendation (semver + SONAME) | complete | semver bump + SONAME action emitted in reports |
| C / C++ archetypes | complete | 35 C + 52 C++ example pairs |
| Linux ELF platform | complete | the CI-validated baseline |
| Windows PE/MSVC | complete | G1 closed: cross-platform-e2e lane runs compare on MinGW DLLs; MSVC+PDB lane asserts struct-growth + removed-export verdicts |
| macOS Mach-O/ARM64 | complete | G1 closed: cross-platform-e2e lane runs compare on Apple-clang dylibs; AAPCS64 HFA/HVA + 16-byte boundary modeled + unit-tested |
| compare/release/baseline/Debian/ABICC | complete | dedicated CLIs + tests |
| MCP server | complete | unit-tested (mocks, Linux) |
| Reporting: JSON/SARIF/JUnit | complete | versioned schema + 34 SARIF / 55 JUnit tests |
| Reporting: Markdown/HTML | complete | structural coverage across verdict tiers + sections + escaping (G3 done) |
| Build-config matrix (probe) | complete | G2 closed: wired into compare; both CXX floor and API_DEPENDS proven e2e (.o .symtab surface capture fixed) |
| Bundle / multi-library | complete | all detectors run via compare-release; case84 validated e2e (Linux-only by design; cross-platform → G1) |
| Plugin (host↔plugin) | complete | G5 closed: plugin-check CLI + check_plugin_host_contract API + plugin_abi policy |
| Security-hardening drift | complete | G12 closed: full checksec surface (RELRO/BIND_NOW/PIE/canary/FORTIFY/W^X) diffed; shipped --policy-file security gate |
| Header-only / inline-only | planned | castxml can't emit concept bodies / ctor mangled names (G4; cases 78/105/106/111 dormant) |
| Kernel / eBPF (BTF/CTF) | complete | G6 closed: BTF + CTF struct-change run through compare; committed case121 BTF blobs + bare-blob CLI ingestion + gcc -gbtf integration fixture |
| SYCL / accelerator (PI/UR) | complete | G6 closed: PI and UR adapter entrypoint-drop driven through compare + reports |
| Static libraries (.a/.lib) | by_design_excluded | G8 decided (option A): non-goal; CLI rejects archives with guidance |
| FFI consumers (Rust/Go/Python) | by_design_excluded | C ABI covered; other languages a stated non-goal |
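
The two rules the registry test is described as enforcing (evidence paths must exist, and unfinished items must carry a gap plus next steps) can be sketched as a small validator. This is an illustrative sketch only, not the contents of tests/test_usecase_registry.py: the entry keys `id`, `status`, `evidence`, `gap`, and `next_steps` are an assumed schema, and treating every status other than complete or by_design_excluded as "unfinished" is also an assumption.

```python
from pathlib import Path

# Assumption: these two statuses count as "finished"; everything else
# (partial, modeled, planned) must carry a gap and next steps.
FINISHED = {"complete", "by_design_excluded"}


def validate_entry(entry: dict, repo_root: Path) -> list[str]:
    """Return human-readable problems for one registry entry (hypothetical schema)."""
    errors = []
    name = entry.get("id", "<unnamed>")

    # Rule 1: every cited evidence path must exist on disk.
    for rel in entry.get("evidence", []):
        if not (repo_root / rel).exists():
            errors.append(f"{name}: evidence path does not exist: {rel}")

    # Rule 2: unfinished items need a tracked gap and concrete next steps.
    if entry.get("status") not in FINISHED:
        if not entry.get("gap"):
            errors.append(f"{name}: unfinished entry has no tracked gap")
        if not entry.get("next_steps"):
            errors.append(f"{name}: unfinished entry has no next steps")

    return errors


# Usage: a planned entry citing a missing file and lacking gap/next_steps
# yields three problems; a complete entry citing an existing path yields none.
bad = {"id": "header-only", "status": "planned", "evidence": ["no/such/file.py"]}
problems = validate_entry(bad, Path("."))
```

Returning a list of messages rather than raising on the first failure lets a test report every stale claim in one run.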
Gaps that matter: current implementation status
| ID | Status | Current state |
|---|---|---|
| G1 | ✅ closed | Native PE/Mach-O compare is validated in CI; MSVC+PDB has a dedicated non-blocking lane. |
| G2 | ✅ closed | Build matrices fold into compare/compare-release via --probe-matrix-old/new; C++ floor and environment-dependent API findings are end-to-end tested. |
| G3 | ✅ closed | Workflow scenarios and Markdown/HTML report coverage are validated beyond single-pair compare. |
| G4 | planned | Header-only / inline-only libraries still need a libclang header-AST extractor. |
| G5 | ✅ closed | plugin-check and check_plugin_host_contract cover host↔plugin load contracts. |
| G6 | ✅ closed | BTF/CTF and SYCL PI/UR workflows run through compare and reports. |
| G7 | ✅ closed | Semver bump and SONAME action recommendations are emitted by the report layer. |
| G8 | by-design excluded | Static/import archives are rejected with guidance; archive member API checking is a non-goal. |
| G9 | planned | auditwheel/delocate vendored-library hashed SONAME normalization. |
| G10 | planned | manylinux glibc-floor / platform-baseline checks. |
| G11 | planned | Single-binary ABI audit/lint mode. |
| G12 | ✅ closed | Security-hardening drift captures and diffs RELRO, BIND_NOW, PIE, canaries, FORTIFY, and W^X metadata; the security policy is shipped. |
| G13 | planned | Cross-architecture mismatch guardrail. |
| G14 | planned | CPython Limited-API / abi3 import-contract conformance. |
| G15 | partial | Inline-namespace version-stamp normalization for ICU/Abseil/libstdc++-style churn. Detector landed (advisory versioned_symbol_scheme_detected); normalize-and-collapse preset still planned. |
| G19 | complete | PR-tier source intelligence (ADR-035, D1–D10): always-on compiler-free pre-scan + risk-scored escalation, intra-version cross-source validation findings (six checks + FP-rate-gate corpus), single-release hygiene audit, evidence-directed scan focusing, build-emitted source-facts protocol, and a typed run_scan/ScanResult API + per-level provider protocol with per-project cost estimate. |
| G20 | planned | Source-scan & cross-source example corpus (ADR-035 demonstration): single-release audit cases, cross-source corroboration cases (combination beats any single source), and evidence-directed focusing scenarios. Grows the examples/ catalog + test suites to demonstrate the G19 engine; no engine change. |
| G21 | partial | One-shot deep compare + CLI usability (oneDAL eval). Shipped (PR #422): the --depth headers\|build\|graph\|source\|full dial (--max=full, reusing the scan --depth vocabulary) on dump; rich-click option-group --help panels (collapse M1); and the strict-mode honesty fix (empty requested L4 → skipped). Remaining: the one-shot compare orchestrator (dump both sides with --sources, then compare) + header/source auto-discovery, a cross-platform list-threaded --gcc-option, compile_commands.json auto-synthesis, a fail-loud signal on an empty requested layer, and vocab unification (M5). |
| G22 | planned | CLI interface contract, config balance, and extension policy (ADR-037). Follows G21's depth dial with the structural cleanup the flag-divergence audit surfaced: three named tiers with service.py as the only compare chokepoint (fixes compare-release bypassing it with a different scope_public default), typed CompareRequest dataclasses, one decorator per shared option family (kills the severity/header/policy/debug copy-paste drift), a single --depth vocabulary (drops the "evidence" naming and the user-facing L5-graph rung), folding compare-release/deep-compare into compare, --header-backend → --ast-frontend, a CLI↔.abicheck.yml rebalance, an explicit --exit-code-scheme, and a cli-contract CI gate. Backward-compat mechanism designed, left advisory until 1.0. |
| G16 | partial | Header-scoped source-mode toolchain robustness. Surfaced by 21 real-world cron records. Shipped: actionable diagnostics for all three host-toolchain signatures (sized-float _FloatN, GCC __assume__, --lang c + extern "C"), plus a castxml --version probe that recommends the Clang floor (≥ 18) on a version-mismatch failure. A -D_FloatN shim was prototyped and rejected (it rewrites glibc's own typedef float _Float32; fallback); the durable cure is a newer-Clang castxml or the libclang extractor (G4). Remaining: real-host end-to-end check and a dedicated error type. |
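
The churn that G15 targets can be illustrated with demangled names: ICU places its C++ API in a versioned namespace such as `icu_74`, and Abseil LTS releases use an inline namespace such as `absl::lts_20230802`, so a routine version bump renames every symbol. The sketch below shows one way a normalize-and-collapse step could work. It is not abicheck's planned preset; the function names and the two regular expressions are assumptions chosen for illustration.

```python
import re

# Hypothetical normalization rules: strip the version stamp, keep the namespace.
_STAMPS = [
    (re.compile(r"\bicu_\d+::"), "icu::"),
    (re.compile(r"\babsl::lts_\d+::"), "absl::"),
]


def normalize(name: str) -> str:
    """Remove known inline-namespace version stamps from a demangled name."""
    for pattern, replacement in _STAMPS:
        name = pattern.sub(replacement, name)
    return name


def diff_after_normalization(old: set[str], new: set[str]) -> tuple[set[str], set[str]]:
    """Return (removed, added) symbol names once version stamps are collapsed."""
    old_norm = {normalize(n) for n in old}
    new_norm = {normalize(n) for n in new}
    return old_norm - new_norm, new_norm - old_norm


old_syms = {"icu_73::UnicodeString::length() const", "absl::lts_20230125::StrCat(int)"}
new_syms = {"icu_74::UnicodeString::length() const", "absl::lts_20230802::StrCat(int)"}

# A raw set difference reports every symbol as removed and re-added;
# after normalization both sides collapse to the same names.
removed, added = diff_after_normalization(old_syms, new_syms)
```

Collapsing only hides the rename from the diff. Whether the rename is itself a binary break for existing consumers is a separate policy question that this sketch does not answer.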
Proposed next steps (tracked in the registry)
The authoritative backlog is the set of planned entries in
usecase-registry.yaml. Each entry carries a gap, a
plan file, and concrete next_steps; tests/test_usecase_registry.py prevents a
planned row from drifting away from its plan.
| Priority | Gap | Plan |
|---|---|---|
| High | G9 — wheel vendored-library pairing | g9 |
| High | G14 — CPython abi3 import-contract | g14 |
| Medium | G4 — header-only / inline-only analysis | g4 |
| Medium | G11 — single-binary audit/lint | g11 |
| Medium | G15 — inline-namespace version stamp | g15 |
| Small | G10 — glibc-floor check | g10 |
| Small | G13 — cross-architecture guardrail | g13 |
| Medium | G16 — header-scope toolchain robustness | g16 |
| Medium | G17 — real-world validation corpus | g17 |
| Medium | G18 — Bazel build-evidence | g18 |
| Medium | G20 — source-scan & cross-source example corpus | g20 |
| Medium | G21 — one-shot deep compare & CLI usability | g21 |
| Medium | G22 — CLI consolidation & interface-contract enforcement | g22 |
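
To give a sense of scale for one of the "Small" items, the core of a glibc-floor check (G10) fits in a few lines. The sketch below assumes the versioned-symbol requirement strings (for example `GLIBC_2.17`) have already been read from the binary's version-needs section; how they are extracted is out of scope, and the function names are hypothetical rather than abicheck APIs. The floors used in the usage lines follow the manylinux definitions: manylinux2014 corresponds to glibc 2.17 and manylinux_2_28 to glibc 2.28.

```python
import re

# Matches GLIBC_2.17 and GLIBC_2.2.5, but not GLIBC_PRIVATE or GCC_3.0.
_GLIBC = re.compile(r"^GLIBC_(\d+)\.(\d+)(?:\.\d+)?$")


def required_glibc(version_needs):
    """Return the highest (major, minor) glibc version required, or None."""
    versions = []
    for need in version_needs:
        match = _GLIBC.match(need)
        if match:
            versions.append((int(match.group(1)), int(match.group(2))))
    return max(versions, default=None)


def exceeds_floor(version_needs, floor):
    """True if the binary needs a newer glibc than the platform floor allows."""
    required = required_glibc(version_needs)
    return required is not None and required > floor


needs = ["GLIBC_2.2.5", "GLIBC_2.17", "GLIBC_2.34", "GLIBC_PRIVATE", "GCC_3.0"]
too_new_for_2_28 = exceeds_floor(needs, (2, 28))       # needs 2.34, floor is 2.28
fits_manylinux2014 = not exceeds_floor(["GLIBC_2.17"], (2, 17))
```

Versions are compared as integer tuples, so 2.34 correctly sorts above 2.4, which a string or float comparison would get wrong.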