Part 8. Detecting Breaks: Evidence, Tools, and Why One Method Is Never Enough
Parts 0–7 explained the mechanisms: what the compiler bakes into a binary, and which changes corrupt that contract. This part turns the telescope around and asks the engineering question: how do you actually catch each of those breaks before you ship?
Three things matter, and this page covers all three:
- The general approaches to ABI/API tracking — and the failure mode each one has when used alone.
- What evidence each break family requires — matching every family from the break-families table to the minimum input that makes it visible, with the example cases that prove it.
- Why classic single-method checkers (libabigail's abidiff, ABICC) are not sufficient and, just as honestly, where any static tool stops, including abicheck.
Tool-track companion pages: this page teaches the concepts; the precise per-source capability matrix lives in Evidence & Detectability, measured accuracy numbers in Tool Comparison & Benchmarks, and the boundary of static checking in Limitations.
1. The general approaches to ABI/API tracking
Every team tracks compatibility somehow, even if only by hope. The approaches below are ordered roughly by how much they observe; each catches something the previous ones cannot, and each has a blind spot that motivates the next.
| # | Approach | What it observes | Catches | Blind spot |
|---|---|---|---|---|
| 1 | Process discipline — SemVer policy, review checklists, "don't touch public headers" rules | Human judgement | Anything a reviewer happens to notice | Everything a reviewer doesn't notice — layout shifts from an "internal" change, transitive leaks, toolchain flips. Unverifiable by construction. |
| 2 | Runtime swap testing — build an app against v1, run it against v2 | One consumer's actual usage | Real crashes in the paths the app exercises | Surface the test app doesn't call (usually most of it); silent corruption that doesn't crash; needs a representative app per consumer. |
| 3 | Symbol-table diffing: nm/readelf diff, or any tool run on stripped binaries (L0) | Exported symbol names, versions, SONAME | Removed/renamed symbols, C++ mangled-signature changes, linker metadata drift | Everything that doesn't change a symbol name: struct layout, enum values, vtable order, C parameter types. |
| 4 | Debug-info diffing — DWARF/PDB-based tools (L1) | Type layout as compiled: sizes, offsets, enum values, vtables | The whole layout family from Part 3 and most of Part 4 | Requires -g artifacts (release builds are usually stripped); largely blind to source-level API facts — access control, default arguments, explicit, hidden friends — which DWARF doesn't record or tools don't model. |
| 5 | Header/AST diffing — compiling public headers and comparing the AST (L2) | The declared source contract | Source-only API breaks, plus scoping: knowing which types are actually public | Blind to binary truth: what was actually exported and with which SONAME/versions, and what flags the shipped binary was really built with. |
| 6 | Build- and source-aware overlay (L3/L4) | Compile flags, default-argument values, inline/template bodies, uninstantiated templates | Facts that never reach any shipped artifact — the source-only tail | Highest setup cost; meaningless without the artifact layers underneath it to anchor the shipped-ABI verdict. |
The pattern: each approach is a projection of the library onto one kind of evidence. None of the projections is the library. A checker is only complete to the extent that it overlays several projections and lets the strongest evidence win — which is exactly the five-layer evidence model abicheck implements, and why runtime testing (approach 2) still belongs in your release pipeline next to static checking: it is the only approach that observes behaviour.
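To make the blind spot of approach 3 concrete, here is a minimal sketch of a symbol-table diff. It is not how abicheck or any real tool works internally: `mangle` is a hypothetical helper covering only free functions with a few builtin parameter types under Itanium-style mangling, and the symbol sets are made up for illustration.

```python
from __future__ import annotations


def mangle(name: str, params: list[str]) -> str:
    """Itanium-style mangled name for a free function (builtin parameters only)."""
    codes = {"void": "v", "int": "i", "long": "l", "double": "d"}
    encoded = "".join(codes[p] for p in params) or "v"
    return f"_Z{len(name)}{name}{encoded}"


def diff_exports(old: set[str], new: set[str]) -> dict[str, list[str]]:
    """Compare two exported-symbol sets, the way an nm/readelf diff would."""
    return {"removed": sorted(old - new), "added": sorted(new - old)}


# C++ build: resize(int) becomes resize(long), so the mangled name changes.
cxx_v1 = {mangle("resize", ["int"])}
cxx_v2 = {mangle("resize", ["long"])}
print(diff_exports(cxx_v1, cxx_v2))

# C build: the symbol is just the function name, so the same edit is invisible.
c_v1 = {"resize"}
c_v2 = {"resize"}
print(diff_exports(c_v1, c_v2))
```

The C++ diff reports one removal and one addition; the C diff is empty even though the parameter type changed, which is exactly the gap the next approaches exist to close.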
1a. The hidden prerequisite of header/AST diffing: the compile context
Approach 5 (L2 header/AST) has a subtlety the table glosses over: a header is
not a self-contained fact; it is source code. To turn it into an AST the
frontend must parse it the way your compiler does: with the include roots its
#include directives resolve against, the C++ standard it assumes (-std), and
the -D feature macros that gate which declarations even exist. Get that context
wrong and L2 does not fail loudly; it produces a different, plausible AST. Two
consequences matter for compatibility:
- L2 is what decides "public." The public/internal boundary, and therefore whether a removed symbol is a compatible internal cleanup or a breaking API removal, comes from the header AST. If L2 cannot be built, the scan only has the binary, so it must treat the export table as the surface and (correctly, by that narrower rule) flag internal removals as BREAKING. This "scope divergence" is a missing-context artifact, not a real break: with L2 those demote to COMPATIBLE. A field run of oneTBB / oneDNN / oneDAL hit exactly this: dnnl::impl::* and bundled DGETRF/SGETRF removals were reported as breaking purely because the headers could not be parsed.
- The wrong context manufactures phantom diffs. Parse at -std=c++17 a library built at -std=c++20 and concepts, char8_t, and inline-namespace versions shift (a C++14/C++17 mismatch does the same for noexcept-in-type); L2 shows add/remove churn that no consumer would ever observe. Likewise a mismatched -D (a feature macro, or libstdc++'s _GLIBCXX_USE_CXX11_ABI dual-ABI switch) changes which declarations are visible at all.
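The phantom-diff effect of a mismatched -D can be sketched with a deliberately tiny preprocessor model. Everything here is hypothetical: the header text, the LIB_ENABLE_ASYNC macro, and the `visible_declarations` helper, which handles only #ifdef/#ifndef/#endif and is nowhere near what castxml or clang actually do.

```python
import re

# Hypothetical public header with one feature-gated declaration.
HEADER = """\
int lib_open(const char *path);
#ifdef LIB_ENABLE_ASYNC
int lib_open_async(const char *path);
#endif
"""


def visible_declarations(header, defines):
    """Return the function names left once #ifdef/#ifndef/#endif are applied."""
    active = [True]  # stack: is the current region compiled?
    names = set()
    for raw in header.splitlines():
        line = raw.strip()
        if line.startswith("#ifdef"):
            active.append(active[-1] and line.split()[1] in defines)
        elif line.startswith("#ifndef"):
            active.append(active[-1] and line.split()[1] not in defines)
        elif line.startswith("#endif"):
            active.pop()
        elif active[-1] and not line.startswith("#"):
            match = re.search(r"(\w+)\s*\(", line)  # identifier before "("
            if match:
                names.add(match.group(1))
    return names


as_built = visible_declarations(HEADER, {"LIB_ENABLE_ASYNC"})
as_scanned = visible_declarations(HEADER, set())  # the scan forgot the -D
print(sorted(as_built - as_scanned))
```

The same header yields two different surfaces: the scan without the macro "loses" lib_open_async, a removal that no consumer built with the real flags would ever observe.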
This is why the source of the compile context matters as much as the frontend choice, and why the two frontends are only interchangeable when fed the same context:
| Scan source | What it supplies to L2 | What it cannot supply alone |
|---|---|---|
| castxml (--ast-frontend castxml) | runs your real g++/MSVC, so system includes + predefined macros + the compiler's default dialect come for free | your project's own -I roots, -D, and the exact -std (still pass these) |
| clang (--ast-frontend clang) | the alternative for clang-only hosts; now auto-probes the host GNU compiler for system includes so libstdc++ resolves like castxml | same as above; auto-detection is system-headers only |
| -I / --gcc-options (CLI) | per-run include roots, -std, -D | reproducibility: a human/CI must retype them each run |
| .abicheck.yml compile: block | the project's stable, reviewed include roots / std / defines | per-invocation cross-compile specifics (those stay CLI) |
| compile database (compile_commands.json) | the authoritative per-TU -I/-std/-D the library was actually built with | (threading it into L2 is a planned step; today it feeds L3–L5) |
The practical takeaway for abicheck scan:
auto-detection makes the common case (find the C++ stdlib) work with no flags,
but the project-specific context — include roots, dialect, feature macros —
must come from a compile DB, the config compile: block, or explicit flags, or
L2 (and the public/internal scoping that depends on it) is only as good as the
context it was handed.
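As a rough illustration of what "the context comes from a compile DB" means, the sketch below pulls include roots, defines, and the dialect out of one compile_commands.json entry. The entry itself is invented, `compile_context` is a hypothetical helper, and this says nothing about how abicheck reads the file; it only relies on the standard "command"/"arguments" keys of the JSON compilation database format.

```python
import json
import shlex


def compile_context(entry):
    """Collect -I roots, -D defines, and -std from one compile-database entry."""
    args = entry.get("arguments") or shlex.split(entry["command"])
    ctx = {"includes": [], "defines": [], "std": None}
    it = iter(args)
    for arg in it:
        if arg in ("-I", "-D"):  # value given as a separate argument
            key = "includes" if arg == "-I" else "defines"
            ctx[key].append(next(it, ""))
        elif arg.startswith("-I"):
            ctx["includes"].append(arg[2:])
        elif arg.startswith("-D"):
            ctx["defines"].append(arg[2:])
        elif arg.startswith("-std="):
            ctx["std"] = arg[len("-std="):]
    return ctx


# Invented single-entry database, for illustration only.
sample = json.loads("""
[{"directory": "/build",
  "file": "src/api.cpp",
  "command": "g++ -std=c++20 -Iinclude -I third_party -DLIB_ENABLE_ASYNC=1 -c src/api.cpp"}]
""")
print(compile_context(sample[0]))
```

Feeding the header frontend the same three facts the library was built with is what keeps the L2 view aligned with the shipped binary.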
2. What it takes to find each break family
The table below extends the
break-families table with the
detection dimension: the minimum evidence that makes the family visible
(L0 binary · L1 +debug info · L2 +headers · L3 +build data · L4
+sources), and whether a symbol-level or debug-info-level checker can see it at
all. Per-case minimums are machine-readable in
examples/ground_truth.json
(min_evidence field) and measured in
Benchmarking by evidence tier.
| Break family | Min evidence | Symbol-only (L0) sees it? | DWARF tools (L1) see it? | Why — and representative cases |
|---|---|---|---|---|
| Symbol/function/variable removal | L0 | ✅ | ✅ | The symbol vanishes from .dynsym — every tool's home turf (case01, case12) |
| C++ signature/qualifier changes | L1 | ⚠️ partial | ✅ | Itanium mangling encodes parameters, const, static — so even a stripped binary shows a symbol vanished and a new one appeared. But classifying it as a qualifier change on the same method (rather than an unrelated removal + addition) takes debug info or headers (case21, case22 are measured at L1) |
| C signature changes | L1/L2 | ❌ | ✅ | C symbols are just the function name — foo(int) → foo(long) keeps the identical symbol. Needs DWARF or headers (case02, case10) |
| Struct/class layout, packing, alignment | L1/L2 | ❌ | ✅ | No symbol changes when a field moves; layout lives in debug info and headers (case07, case40, case56) |
| Enum value reassignment | L1/L2 | ❌ | ✅ | Constants are compiled into callers; the library's symbols are untouched (case08, case20) |
| Vtable reordering | L1/L2 | ❌ | ✅ | Every symbol still exists — only the slot indexes moved (case09) |
| Source-only API breaks: access narrowed, explicit added, default argument removed, hidden friends | L2 | ❌ | mostly ❌ | DWARF doesn't reliably model these; they live in the declared AST (case34, case106, case123, case96) |
| ELF/linker metadata: SONAME, visibility, symbol versions, RPATH | L0 | ✅ | ✅ | Binary-only facts — which means header-only checkers (ABICC's XML mode) are the blind ones here (case05, case65) |
| Toolchain/build-flag drift: -std floor, ABI version, flag changes | L1/L3 | ❌ | partly | GCC records its command-line flags in DW_AT_producer by default (other compilers may need an option such as Clang's -grecord-command-line), so a -g build exposes some drift; the rest needs the compile DB (case103). The libstdc++ dual-ABI flip is the notable exception: it renames mangled symbols (std::__cxx11::), so even a stripped binary betrays it at L0 (case104) |
| Header const/constexpr constant values | L2 | ❌ | ❌ | The value lives in the declared AST, not the binary, so header comparison sees it (case124). Plain #define macros are not part of the AST; see the next row |
| Plain #define macro values, inline/template bodies, uninstantiated templates | L4 | ❌ | ❌ | These never reach the shipped binary or the header AST; only source/preprocessor evidence sees them. case122 is deliberately a no-change case: it marks the boundary of what even source analysis can prove about templates that were never instantiated |
| Multi-library release skew (bundle SONAME/dependency drift) | release model | ❌ | ❌ | Not a property of any single binary diff — needs a bundle-level comparison (multi-binary guide, bundle cases 84/90–93 in examples/) |
| Internal-only changes (should be NO_CHANGE) | L2 | FP ⚠️ | FP ⚠️ | The inverse problem: without header scoping, tools flag private detail:: churn as breaking. Evidence here removes false positives (case118–120) |
Two lessons hide in this table:
- Evidence runs in both directions. More input doesn't just find more breaks — it dismisses false alarms. Header scoping is what lets a checker say "that struct changed, but it was never part of the public surface."
- The staircase is real and measurable. Over the example catalog, a stripped binary alone reaches the correct verdict for about a third of cases; adding debug info takes it to ~81%; headers to ~99%; build/source data closes the rest (current numbers in the evidence-tier table).
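The layout rows of the table can be illustrated with a small sketch that uses Python's ctypes to mimic how a C compiler lays out a struct. ConfigV1, ConfigV2, and the config_apply symbol are hypothetical; the point is only that the exported-symbol set stays identical while offsets and size move.

```python
import ctypes


class ConfigV1(ctypes.Structure):
    _fields_ = [("id", ctypes.c_int32), ("timeout", ctypes.c_double)]


class ConfigV2(ctypes.Structure):
    # v2 inserts a 64-bit field in the middle of the struct.
    _fields_ = [
        ("id", ctypes.c_int32),
        ("flags", ctypes.c_int64),
        ("timeout", ctypes.c_double),
    ]


def layout(struct):
    """Return ({field: byte offset}, total size) using the platform's C layout rules."""
    offsets = {name: getattr(struct, name).offset for name, *_ in struct._fields_}
    return offsets, ctypes.sizeof(struct)


def layout_changes(old, new):
    """Fields present in both versions whose offset moved, plus (old, new) sizes."""
    old_off, old_size = layout(old)
    new_off, new_size = layout(new)
    moved = {
        name: (old_off[name], new_off[name])
        for name in old_off
        if name in new_off and old_off[name] != new_off[name]
    }
    return moved, (old_size, new_size)


# The library exports the same (hypothetical) function in both versions.
exports_v1 = {"config_apply"}
exports_v2 = {"config_apply"}

moved, sizes = layout_changes(ConfigV1, ConfigV2)
print("symbol diff:", exports_v1 ^ exports_v2)
print("moved fields:", moved, "sizes:", sizes)
```

On a typical 64-bit platform timeout moves from offset 8 to 16 and the struct grows from 16 to 24 bytes, yet the symbol diff is empty: a caller compiled against v1 would read the wrong bytes with no linker complaint.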
3. Why an abidiff- or ABICC-class checker is not sufficient
This is a structural argument, not tool-bashing — both tools are good at what their evidence lets them see (details and per-case results in the Tool Comparison):
- Each is capped at one rung of the staircase. abidiff is DWARF-first (L0+L1): hand it the stripped release binary you actually ship and it degrades toward symbol-only; the source-only API family (access changes, default arguments, explicit, noexcept semantics) stays invisible even with debug info, because a header directory acts as a symbol filter there, not a full AST. ABICC leans the other way: its header/XML workflow sees the declared contract but not the binary truth (exports, SONAME, symbol versions), and its abi-dumper workflow inherits the DWARF ceiling. Neither overlays all the layers, so each one misses families the other catches, and both miss the L3/L4 tail (flag drift, inline bodies, uninstantiated templates).
- No public-surface scoping. Without resolving what is public, every internal detail:: struct edit shows up as a break. In practice that noise, not missed breaks, is what makes teams turn checkers off. The scoped-internal cases (118–120) exist precisely to test that a checker can stay silent correctly.
- A binary (yes/no) verdict is not a release decision. "Compatible / incompatible" collapses distinctions that Part 0 showed are policy-relevant: a source-level API_BREAK ships fine for prebuilt binaries but breaks rebuilders; a COMPATIBLE_WITH_RISK noexcept change is fine unless a consumer relied on it. The 5-tier verdict and policy profiles exist because real release gates need that resolution, as do bundle-level comparison, application-scoped checks, and suppression workflows.
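Why a tiered verdict gives a release gate more to work with than a yes/no answer can be sketched as follows. The tier names are the ones used in this series; their numeric ordering, the profile names, and the `gate` function are assumptions made for illustration, not abicheck's actual policy engine.

```python
from enum import IntEnum


class Verdict(IntEnum):
    # Tier names from this series; the severity ordering is an assumption.
    NO_CHANGE = 0
    COMPATIBLE = 1
    COMPATIBLE_WITH_RISK = 2
    API_BREAK = 3
    BREAKING = 4


# Hypothetical policy profiles: the worst verdict each release type tolerates.
PROFILES = {
    "patch_release": Verdict.COMPATIBLE,
    "binary_only_consumers": Verdict.API_BREAK,
    "major_release": Verdict.BREAKING,
}


def gate(verdict, profile):
    """Return True when a release with this verdict may ship under the profile."""
    return verdict <= PROFILES[profile]


for profile in PROFILES:
    print(profile, gate(Verdict.API_BREAK, profile))
```

The same API_BREAK finding passes for consumers who only ever load prebuilt binaries and fails a patch release, a distinction a two-valued checker has no way to express.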
And where everything stops: no static tool — abicheck included — can prove behaviour. A function that keeps its signature and layout but starts returning different values is invisible to every approach in §1 except runtime testing. The honest boundary is documented in Limitations and What ABI tools cannot prove; treat static ABI checking as the part of release safety you can automate exhaustively, not as all of it.
4. Using the encyclopedia as a detection atlas
Every capability claim in this series is backed by a runnable fixture, and the
mapping is maintained mechanically — CI checks that every ChangeKind is
produced by a detector, documented, and (for the catalog) carries a verified
verdict and minimum evidence tier:
- Capability → meaning: the Change Kind Reference lists every detectable change kind with its classification.
- Capability → proof: each example page names the change kinds it triggers and its verdict, and includes a Real Failure Demo; the expected results live in ground_truth.json, which the benchmark gates on.
- Capability → required input: the min_evidence field per case, aggregated in the evidence-tier benchmark, tells you exactly which input you must provide before that break becomes visible, which is the practical answer to "what do I need to feed the checker in my CI?"
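The "required input" question can be sketched as a lookup over per-case minimum tiers. The dictionary below is a hypothetical excerpt: the tiers are copied from the break-families table on this page, and the real ground_truth.json schema is not shown here and may differ.

```python
# Evidence tiers in increasing order, as defined in section 2.
TIERS = ["L0", "L1", "L2", "L3", "L4"]

# Hypothetical excerpt; tiers taken from the break-families table above.
CASES = {
    "case01": "L0",   # symbol removal
    "case21": "L1",   # C++ qualifier change
    "case34": "L2",   # source-only API break
    "case122": "L4",  # uninstantiated-template boundary case
}


def visible_cases(cases, available):
    """Cases whose minimum evidence is covered by the tier you can provide."""
    limit = TIERS.index(available)
    return sorted(c for c, need in cases.items() if TIERS.index(need) <= limit)


def required_tier(cases, wanted):
    """Lowest tier that makes every wanted case visible."""
    return max((cases[c] for c in wanted), key=TIERS.index)


print(visible_cases(CASES, "L1"))
print(required_tier(CASES, ["case01", "case34"]))
```

Read in one direction it tells you what a stripped-binary CI job can and cannot catch; read in the other it tells you the cheapest artifact set that covers the break families you care about.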
Where to go next
- Back to the series hub for the other parts.
- Evidence & Detectability — the full per-source capability matrix this page summarizes.
- Choose Your Workflow — turn the evidence you have into the right command for your CI.