Evidence & Detectability: What Each Method Can and Cannot See

One idea drives this whole page: different methods observe different evidence, and no single method detects every compatibility issue. A tool can only report what its inputs let it see. Feed it symbols only and it sees symbol changes; feed it debug info and it sees layout; feed it headers and it sees source-level API. Some changes (#define macros, inline/template bodies, uninstantiated templates) are invisible to any artifact comparison.

This page is the conceptual companion to the practical Limitations and Tool Comparison pages; for the teaching-track version — which break families need which evidence, with worked example cases — see Part 8 of the learning series. It answers the question users ask most often:

"Why did tool A catch this and tool B didn't?"

Almost always, the answer is evidence: the two tools were looking at different inputs.

0. The five sources of information

A release engineer can hand a compatibility checker up to five different sources of information about a library, ordered here from least to most informative. Each one adds facts the previous ones cannot see; none of them is complete on its own. abicheck names them with the layer codes L0–L4. A sixth layer, L5, is not something you hand over: it is a source/build graph that abicheck derives from L3 (and any L4 surface) to localize and explain findings. The full model is therefore six evidence layers, L0–L5 (matching Build Info & Sources), of which L0–L4 are inputs you provide and L5 is derived. This section covers the five you provide; the derived L5 layer is described below and in Build Info & Sources. You can see which artifact layers (L0–L2) a given input exposes with abicheck dump --show-data-sources. The build/source layers (L3/L4) are not reported there; they surface in the pack-aware compare layer_coverage table once you supply a build/source pack:

#	Source you provide	Layer	abicheck input	What it newly reveals
1	Just the binary	L0	a stripped `.so`/`.dll`/`.dylib`	Exported symbols, SONAME/install-name, symbol versions, visibility, binding, `DT_NEEDED`/`LC_LOAD_DYLIB` dependencies
2	+ Debug symbols	L1	a `-g` build (DWARF/PDB) or sidecar debug file	Type layout: struct/class sizes, field offsets, enum values, vtable slots, calling convention, packing/alignment
3	+ Public headers	L2	`-H include/` (parsed by castxml or clang — `--ast-frontend`)	Source-level API: signatures, overloads, access (`public`/`private`), `final`/`explicit`/`noexcept`, templates, declared default args, public/internal scoping
4	+ Build system data & options	L3	`-p build/` (compile DB, CMake/Ninja/Bazel/Make)	The flags the library was actually built with: `-std`, `_GLIBCXX_USE_CXX11_ABI`, `-fvisibility`, `-fabi-version`, toolchain/sysroot, target graph, export maps
5	+ Sources	L4	a build/source pack (per-TU source ABI replay, ADR-030)	Facts that never reach the binary: macro constants, `constexpr` values, default-argument values, inline/template bodies, uninstantiated templates

Read this as a staircase: each step up the table can both find breaks the step below is blind to and prevent false positives the step below would raise. A struct-field insertion is invisible at L0 but obvious at L1 (case07); an internal-struct change that looks like a break at L1 is correctly dismissed once L2 headers reveal the struct is non-public (case118).
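The struct-field case can be sketched with Python's ctypes module, which computes C-compatible layouts. This is a minimal illustration, not abicheck code: the PointV1/PointV2 structs and the point_norm symbol are hypothetical stand-ins for two releases of a library.

```python
import ctypes

# Hypothetical struct as declared in release 1 of a library.
class PointV1(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]

# Release 2 inserts a field in the middle of the struct.
class PointV2(ctypes.Structure):
    _fields_ = [
        ("x", ctypes.c_int32),
        ("flags", ctypes.c_int32),
        ("y", ctypes.c_int32),
    ]

# L0 view: the (hypothetical) exported symbol set is identical in both releases.
exports_v1 = {"point_norm"}
exports_v2 = {"point_norm"}
symbol_changes = exports_v1 ^ exports_v2

# L1 view: layout facts of the kind debug info exposes.
size_change = (ctypes.sizeof(PointV1), ctypes.sizeof(PointV2))
y_offset_change = (PointV1.y.offset, PointV2.y.offset)

print("symbol changes:", symbol_changes)   # set()
print("sizeof:", size_change)              # (8, 12)
print("offset of y:", y_offset_change)     # (4, 8)
```

A symbol-only comparison reports nothing here, while an already-built consumer that reads y at offset 4 would now read flags instead.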

The derived sixth layer, L5. Beyond the five sources above, abicheck derives an L5 source/build graph (include/type/call reachability, ADR-031) from L3 (and any L4 surface) to localize and explain findings and prioritize cross-symbol impact. It is covered with the other build/source layers in Build Info & Sources.
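As a rough mental model of the provided-versus-derived split, the mapping from inputs to layer codes can be sketched as below. The function and its parameter names are hypothetical and are not part of abicheck; the only facts taken from this page are the layer codes and that L5 is derived from L3.

```python
def evidence_layers(debug_info=False, headers=False,
                    build_dir=False, source_pack=False):
    """Return the layer codes a set of inputs would expose (illustrative model)."""
    layers = ["L0"]              # the binary itself is always present
    if debug_info:
        layers.append("L1")      # DWARF/PDB or a sidecar debug file
    if headers:
        layers.append("L2")      # public headers
    if build_dir:
        layers.append("L3")      # compile DB / build-system data
    if source_pack:
        layers.append("L4")      # per-TU source replay
    if build_dir:
        layers.append("L5")      # derived from L3, never handed over directly
    return layers

print(evidence_layers())
print(evidence_layers(debug_info=True, headers=True, build_dir=True))
```

The layers are modeled as independent switches on purpose: supplying headers without debug info yields L0 and L2, not a fallback to a lower rung.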

Layers (L) vs. scan levels (S). The L0–L5 codes name evidence layers — what abicheck sees and how much that evidence is trusted. The abicheck scan command has a separate s0–s6 axis naming the method used to gather the L3–L5 evidence — a different meaning of the word "level". Scan Levels (S vs L) explains both axes and how they map onto each other.

How they combine

The layers are independent and additive, not a fallback chain — abicheck overlays every source you give it and lets the strongest evidence win, under one rule (the authority rule, see Build & Source Packs):

Artifact-backed evidence (L0/L1/L2) is authoritative for the shipped-ABI verdict. Build/source evidence (L3/L4) explains, localizes, scopes, or adds confidence to a finding, and can raise source-/API-level findings of its own — but it never silently deletes an artifact-proven break.

Concretely: L0 says a symbol changed; L1 adds that its layout changed by N bytes; L2 adds that the public declaration naming it changed too; L3 adds that it was built with a different -std, so expect churn; L4 adds that the macro it expands actually changed value. The verdict is computed worst-wins across all of them. The design of how the layers are collected and reconciled is in Architecture; the per-case evidence each example needs is benchmarked in Tool Comparison §Benchmarking by evidence tier.
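The worst-wins computation and the authority rule can be sketched as follows. This is a simplified model under stated assumptions, not abicheck's implementation: the severity ordering, the NO_CHANGE and COMPATIBLE names, and the choice to cap build/source findings at API_BREAK are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed severity ordering; only BREAKING and API_BREAK are named on this page.
SEVERITY = {"NO_CHANGE": 0, "COMPATIBLE": 1, "API_BREAK": 2, "BREAKING": 3}
ARTIFACT_LAYERS = {"L0", "L1", "L2"}

@dataclass(frozen=True)
class Finding:
    layer: str    # "L0" .. "L5"
    verdict: str  # a key of SEVERITY
    note: str

def overall_verdict(findings):
    """Worst-wins across layers; only artifact layers may yield BREAKING."""
    effective = []
    for f in findings:
        verdict = f.verdict
        # Assumption: build/source evidence raises at most a source/API-level finding.
        if f.layer not in ARTIFACT_LAYERS and SEVERITY[verdict] > SEVERITY["API_BREAK"]:
            verdict = "API_BREAK"
        effective.append(verdict)
    # A milder finding from another layer never lowers the maximum.
    return max(effective, key=SEVERITY.__getitem__, default="NO_CHANGE")

findings = [
    Finding("L1", "BREAKING", "struct layout changed"),
    Finding("L3", "COMPATIBLE", "built with a different -std, expect churn"),
]
print(overall_verdict(findings))
```

Because the result is a maximum over all findings, adding an L3 or L4 explanation can only keep or raise the verdict; it cannot delete the L1-proven break.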

Best input you can give abicheck: old + new library, matching public headers, debug info, and the build's compile database — L0+L1+L2+L3 together. With less, abicheck degrades down the staircase and tells you exactly which layers it had via the --show-data-sources / layer_coverage report.

Why call it "evidence"?

First, concretely: "evidence" is just the umbrella term for the sources of information in the table above. The artifact sources are the binary (L0), its debug info (L1), and its public headers (L2); the additional sources are the project's build-system data (L3 — compile flags, toolchain, target graph), its source tree (L4 — per-TU source ABI replay), and a source/build graph (L5 — include/type/call reachability). When the docs say "build/source evidence (L3/L4/L5)", that is exactly what they mean.

The umbrella word is a deliberate forensic metaphor, not decoration: abicheck treats "is this compatible?" as something it must prove from facts, the way a case is built from evidence, rather than as a single computation over one data source. Three properties of evidence are exactly the properties abicheck needs, and "tier" or "level" would imply the wrong ones:

Independent and partial. Each source contributes some facts and none is complete on its own — a binary shows symbols but not layout, headers show API but not what was actually built. Evidence is additive and overlaid, not a ranked ladder you fall back down. (Call them "tiers" and readers assume a fallback chain; they aren't one.)
Different authority. Just like physical vs. circumstantial evidence in a courtroom, not all of it carries equal weight. Artifact evidence (L0–L2) is what was actually built and shipped, so it is authoritative — only it can declare a binary BREAKING. Build/source evidence (L3/L4/L5) is corroborating — it explains, localizes, scopes, adds confidence, removes false positives, and can raise its own source-/API-level findings, but it can never overturn or silently delete an artifact-proven break. This is the authority rule (ADR-028).
Honest about what it had. Because the verdict is only as strong as the evidence behind it, every run reports the evidence it actually collected (the layer_coverage table and the "checks enabled… and why others are not" capability report). The output literally says "here is the evidence I had, so here is what I could and couldn't check."

So "evidence" + the authority rule is the mental model that lets abicheck keep adding sources for more accuracy without ever letting a weaker source override a proven break.

1. The detectability matrix

The most important table on this page. Read it as: given only this evidence, what can a checker conclude — and what is it structurally blind to?

Evidence available	Detects well	Cannot detect well
Exported symbol table only (stripped binary, no headers)	Removed/added exported symbols, symbol versions, visibility, SONAME/install-name, dependency (`DT_NEEDED`) changes	Struct layout, enum values, calling convention, source-only API changes, macro changes, inline/template body changes
Debug info (DWARF / PDB / BTF)	Type layout, field offsets, enum values, class sizes, vtables, calling convention, packing/alignment	Source-only API intent, macros, default arguments, some template/header-only changes
Headers / AST (CastXML / Clang)	Source signatures, overloads, default args, access/`final`/`explicit`/`noexcept`, templates visible in headers	Inline body semantics, macro expansion policy (unless modeled), runtime behavior
Source diff / compiler-based API extraction	Macros, inline function bodies, `constexpr` bodies, uninstantiated templates, source-level API	The binary layout actually emitted into a shipped library (unless paired with the binary/debug info)
Runtime app swap / integration test	Real loader/linker behavior and tested execution paths	Untested public API, future consumers, silent layout corruption (unless a test happens to expose it)
Bundle scan (multi-library)	Cross-DSO dependency / provider / entry-point problems	Pure source compatibility and semantic behavior not represented in artifacts or manifests

The first four rows correspond to the artifact and source layers of §0 (L0, L1, L2, and L4); L3 build context is a separate corroborating layer and is intentionally not a row here. The last two rows, runtime app swap and bundle scan, are orthogonal evidence axes, not extra rungs on the staircase.
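One way to read the matrix is as a lookup from evidence to the change kinds it exposes. The sketch below encodes a few cells of the first four rows; the evidence keys and change-kind names are hypothetical labels chosen for this example, not abicheck identifiers.

```python
# A few cells of the detectability matrix, encoded as sets (illustrative labels).
DETECTS = {
    "symbols": {"symbol_removed", "symbol_version_changed", "soname_changed"},
    "debug_info": {"struct_layout_changed", "enum_value_changed"},
    "headers": {"default_arg_changed", "noexcept_changed", "access_changed"},
    "source": {"macro_value_changed", "inline_body_changed"},
}

def blind_spots(available, changes):
    """Return the changes that none of the available evidence can expose."""
    visible = set().union(*(DETECTS[e] for e in available))
    return sorted(set(changes) - visible)

release_changes = ["symbol_removed", "struct_layout_changed", "macro_value_changed"]

print(blind_spots(["symbols"], release_changes))
print(blind_spots(["symbols", "debug_info", "headers"], release_changes))
```

With symbols only, the layout and macro changes both go unseen; adding debug info and headers still leaves the macro change, which needs source-level evidence.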

Why abicheck combines layers

abicheck's advantage is that it does not rely on a single row of the matrix. It overlays the five independent, additive sources of §0, plus the derived L5 graph, for six evidence layers in all (see Architecture and ADR-003 / ADR-028):

Layer	Source	Evidence it contributes
L0	Binary metadata	ELF symbols, SONAME, versioning, visibility, dependencies (and PE/COFF + Mach-O equivalents)
L1	Debug info (DWARF/PDB)	Layout, offsets, enum values, calling convention, vtable slots, type cross-checks
L2	Header AST (castxml or clang)	Function signatures, classes, structs, enums, typedefs, templates, `noexcept`, access, public/internal scoping (castxml also resolves vtables/layout; the clang backend is syntactic — pair it with L1/DWARF for layout)
L3	Build context	ABI-relevant flags, toolchain/sysroot, target graph, export-policy changes
L4	Source ABI replay	Macro/`constexpr` values, default-argument values, inline/template bodies, uninstantiated templates
L5	Source/build graph (derived)	Include/type/call reachability — localizes and explains findings, prioritizes cross-symbol impact (folded from L3, plus any L4 surface)

The best input you can give it is therefore:

old library + new library + matching public headers + debug info + build context — L0+L1+L2+L3 together.

With less, abicheck degrades gracefully down the staircase — a stripped binary with no headers collapses toward symbol-only checking, where layout and source-only breaks are invisible. See Recommendation: feed .so + debug info + headers.

2. Methods compared, by the evidence they use

Each method is good at what its evidence exposes and blind to the rest. None is a complete contract check on its own.

a. Build an app and swap the library

The most realistic consumer-level test — but not a complete contract check. It only exercises what one app imports and runs.

Strength	Example
Loader/linker failures	App fails because a required symbol is missing
Real runtime behavior	App crashes when it calls into changed ABI
Consumer-specific risk	App doesn't use the removed function, so this app still works
End-to-end deployment validation	RPATH/RUNPATH, search path, symbol versions all exercised

It misses	Why
Unused public APIs	The app only tests what it imports/executes
Silent data corruption	Tests may pass while layout is subtly wrong
Source compatibility	Binary may run, but recompiling may fail
Future consumers	One app is not the whole public contract
Header-only / source-only breaks	Existing binary doesn't exercise changed source

This maps to abicheck's appcompat command. See §4 for its exact scope.

b. libabigail (`abidiff`)

Primarily DWARF-based: abidw extracts ABI XML, abidiff compares it. Falls back to CTF/BTF or ELF symbol names; with no debug info it degrades toward ELF-only.

Good for: emitted binary ABI from debug builds (struct/class layout, type changes, symbols); no headers required in the common DWARF workflow; a mature ABI diff model.
Limits: stripped binaries degrade to symbol-only; a header directory is mostly a public-symbol filter, not a full source-AST analysis, so source-only changes (default args, access changes, noexcept) stay hard; not product/bundle/app-policy oriented by default.

c. ABI Compliance Checker (ABICC)

Two workflows:

abi-dumper workflow — DWARF-based dump from a debug .so, optional public-header filter. Lacks a full AST, so it misses many source-only API breaks.
XML / header workflow — GCC-compiled AST from headers. GCC-only, with known slowness/reliability issues, path sensitivity, and timeouts on complex C++. Lacks ELF binary metadata, so it's weaker on exported-symbol/platform linker facts.

ABICC has a coarser verdict vocabulary than abicheck compare (no API_BREAK modeling). abicheck's compat mode is a drop-in replacement for ABICC-style flags; new integrations should prefer compare.

d. abicheck

abicheck uses the combined-evidence model described above (§1). It is strongest with library + headers + debug info + build context. See Tool Comparison for the benchmark showing why combining ELF + CastXML + DWARF beats single-source tools.

e. Methods beyond ABI diff tools

ABI diffing is one tool in a release-engineering kit. Complementary methods:

Method	What it adds
Downstream rebuilds	Detect source API breaks by recompiling real consumers
Runtime smoke / probe tests	Detect loader errors and common runtime failures
ABI/API snapshot baselines	Treat release snapshots as immutable contract records
Symbol-version script / export-map linting	Enforce the intended public/private boundary
Header/source API extraction	Catch macros, inline definitions, template surface
Fuzz / integration tests	Catch behavioral changes behind a stable ABI
Reverse-dependency CI	Ecosystem/distribution-wide validation
Security-hardening scanners	Catch non-ABI deployment regressions (RELRO/PIE/canary/FORTIFY)

The security-hardening check is the clean example of "not ABI, but still a release-compatibility risk": an ABI-compatible upgrade can weaken hardening while a normal ABI gate stays green. abicheck reports that as deployment risk, not an ABI break.

3. Traditional shared libraries vs header-only libraries

This distinction trips people up constantly, so it gets its own section.

Traditional `.so` / `.dll` / `.dylib`

There is a real binary contract to compare — exported symbols, symbol versions, dependency metadata, layout in debug info, public declarations in headers. abicheck's model is strongest here:

For compiled shared libraries, ABI compatibility is mainly about whether existing, already-built consumers can keep linking, loading, and calling into the new binary using the old contract.

Header-only libraries

A header-only library often has no exported library ABI — the code is compiled into each consumer. Compatibility is therefore mostly:

Compatibility type	Meaning
Source API compatibility	Will existing users recompile?
Generated ABI compatibility	Will rebuilt objects stay compatible with other objects?
Semantic compatibility	Does inline/`constexpr`/template behavior still mean the same thing?
Configuration compatibility	Do macros/features/flags produce the same public surface?

abicheck can still help in some cases:

Case	How abicheck helps
Header-only API also gates a shared-library boundary	Header-AST comparison catches some API changes
Explicit template instantiations shipped in a `.so`	The emitted instantiations can be checked
Header constants / default args / source signatures in the AST	Some source-level API breaks are found
App links a runtime helper library	App mode checks the app's imported symbols

But it cannot fully validate a pure header-only library: implicit header-only template instantiations are not in any shipped artifact (the documented mitigation is explicit instantiation of public templates that form part of the ABI — see Template Instantiation).

Header-only compatibility strategy

Use source API extraction, compile tests across supported compilers/standards, downstream rebuilds, and behavioral tests. Use abicheck for emitted artifacts, explicit template instantiations, or companion runtime libraries — not as the sole gate for header-only code.

4. App mode is consumer-scoped; library compare is contract-scoped

appcompat answers a deliberately narrow question: will this application still work with the new library? It parses the app's required symbols, compares old/new libraries in full mode, checks new-symbol availability, and filters findings to changes that matter to that app.

That scope cuts both ways:

App mode can say	App mode cannot say
"This app doesn't import the removed symbol."	"The library is generally ABI-compatible."
"This app needs symbol version X and the new lib lacks it."	"All future consumers are safe."
"This app is unaffected by this library-wide break."	"Header-only source users can recompile."
"This deployment path is OK for this app."	"No semantic behavior changed."

App mode is consumer-scoped compatibility. Library compare is product-contract compatibility. Use both: compare protects the library contract; appcompat protects a specific consumer deployment.
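The difference in scope can be sketched as a filter over the same set of library-wide changes. This is an illustrative model only: the Change type, the symbol names, and the filtering function are hypothetical and do not reflect how appcompat is implemented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Change:
    symbol: str
    kind: str  # e.g. "removed" or "layout_changed"

def changes_affecting_app(library_changes, app_imports):
    """Consumer-scoped view: keep only changes to symbols this app imports."""
    return [c for c in library_changes if c.symbol in app_imports]

# Library-wide findings from comparing old and new releases (hypothetical).
library_changes = [
    Change("legacy_init", "removed"),
    Change("parse_config", "layout_changed"),
]

app_a_imports = {"open_stream", "read_frame"}   # uses neither changed symbol
app_b_imports = {"legacy_init", "open_stream"}  # still calls the removed one

print(changes_affecting_app(library_changes, app_a_imports))
print(changes_affecting_app(library_changes, app_b_imports))
```

App A is unaffected and app B is broken, yet the library contract is broken in both cases: the unfiltered list is what a library compare gates on.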

For header-only libraries, app mode is less central unless there's a companion runtime library — an existing app binary already contains the header-only code it compiled earlier, so swapping a library may not exercise the changed header-only implementation at all.

5. What ABI tools cannot prove

Even with perfect evidence, artifact comparison has hard boundaries. Closing these gaps is not abicheck's job; they need tests, specs, or source-AST tooling. Treat this as a guard against over-trusting any ABI tool (see Limitations for the authoritative list):

Case	Why it's invisible / out of scope
Macro-only changes	Macros are preprocessor behavior; not in the artifact
Inline function body changed, same signature	No exported ABI change; body is compiled into the consumer
`constexpr` behavior changed	Source/semantic compatibility, no symbol change
Template body changed but not instantiated	No emitted artifact to compare
Uninstantiated template signature change	Not in the shipped `.so` unless instantiated (`case122`)
Header-only change not affecting exports	There may be no shared-library ABI surface
Stripped binary, no headers/debug	Mostly symbol-level comparison only
Header/binary mismatch	The tool may analyze a contract the binary wasn't built with — false results
Static archives (`.a` / `.lib`) as archive containers	abicheck analyzes linkable images/shared libraries/objects, not archive containers (details)
Pure behavioral / semantic changes	Same ABI/API, different meaning — needs tests/spec review
Ownership / lifetime / thread-safety guarantee changes	A signature can be byte-identical while the contract it implements flips

The takeaway is the same one Part 0 opens with: a stable ABI is necessary but not sufficient for a compatible release. ABI tools prove the binary contract held; behavioral compatibility still needs your tests and your specification.