
Evidence & Detectability: What Each Method Can and Cannot See

One idea drives this whole page: different methods observe different evidence, and no single method detects every compatibility issue. A tool can only report what its inputs let it see. Feed it symbols only and it sees symbol changes; feed it debug info and it sees layout; feed it headers and it sees source-level API. Some changes (#define macros, inline/template bodies, uninstantiated templates) are invisible to any artifact comparison.

This page is the conceptual companion to the practical Limitations and Tool Comparison pages; for the teaching-track version — which break families need which evidence, with worked example cases — see Part 8 of the learning series. It answers the question users ask most often:

"Why did tool A catch this and tool B didn't?"

Almost always, the answer is evidence: the two tools were looking at different inputs.


0. The five sources of information

A release engineer can hand a compatibility checker up to five different sources of information about a library, ordered from least to most revealing. Each one adds facts the previous ones cannot see; none is complete on its own. abicheck names them with the layer codes L0–L4. A sixth layer, L5, is not something you hand over: it is a source/build graph that abicheck derives from L3 (and any L4 surface) to localize and explain findings. The full model is therefore six evidence layers, L0–L5 (matching Build Info & Sources), of which L0–L4 are inputs you provide and L5 is derived. This section covers the five you provide; the derived L5 layer is described below and in Build Info & Sources.

You can see which artifact layers (L0–L2) a given input exposes with abicheck dump --show-data-sources. The build/source layers (L3/L4) are not reported there; they surface in the pack-aware compare layer_coverage table once you supply a build/source pack. The five sources:

| # | Source you provide | Layer | abicheck input | What it newly reveals |
|---|---|---|---|---|
| 1 | Just the binary | L0 | a stripped .so/.dll/.dylib | Exported symbols, SONAME/install-name, symbol versions, visibility, binding, DT_NEEDED/LC_LOAD_DYLIB dependencies |
| 2 | + Debug symbols | L1 | a -g build (DWARF/PDB) or sidecar debug file | Type layout: struct/class sizes, field offsets, enum values, vtable slots, calling convention, packing/alignment |
| 3 | + Public headers | L2 | -H include/ (parsed by castxml or clang; see --ast-frontend) | Source-level API: signatures, overloads, access (public/private), final/explicit/noexcept, templates, declared default args, public/internal scoping |
| 4 | + Build system data & options | L3 | -p build/ (compile DB, CMake/Ninja/Bazel/Make) | The flags the library was actually built with: -std, _GLIBCXX_USE_CXX11_ABI, -fvisibility, -fabi-version, toolchain/sysroot, target graph, export maps |
| 5 | + Sources | L4 | a build/source pack (per-TU source ABI replay, ADR-030) | Facts that never reach the binary: macro constants, constexpr values, default-argument values, inline/template bodies, uninstantiated templates |

Read this as a staircase: each step up the table can both find breaks the step below is blind to and prevent false positives the step below would raise. A struct-field insertion is invisible at L0 but obvious at L1 (case07); an internal-struct change that looks like a break at L1 is correctly dismissed once L2 headers reveal the struct is non-public (case118).
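The struct-field insertion case can be illustrated with a small, self-contained sketch. It is not abicheck code: it uses Python's ctypes to mimic C struct layout, and the struct PointV1/PointV2, the field names, and the export names are hypothetical. It shows that a symbol-only comparison reports nothing while a layout comparison sees the size and offset change.

```python
import ctypes

# Hypothetical v1 of a public struct (names invented for this sketch).
class PointV1(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]

# Hypothetical v2: a field is inserted in the middle.
class PointV2(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int32),
                ("flags", ctypes.c_int32),
                ("y", ctypes.c_int32)]

# Symbol-level view (what L0 alone offers): exported names are unchanged.
exports_v1 = {"point_new", "point_norm"}
exports_v2 = {"point_new", "point_norm"}

def symbol_diff(old, new):
    """Compare two sets of exported symbol names."""
    return {"removed": sorted(old - new), "added": sorted(new - old)}

def layout_diff(old, new):
    """Compare total size and the offsets of fields present in both structs."""
    changes = []
    if ctypes.sizeof(old) != ctypes.sizeof(new):
        changes.append(("size", ctypes.sizeof(old), ctypes.sizeof(new)))
    new_names = {name for name, _ in new._fields_}
    for name, _ in old._fields_:
        if name in new_names:
            old_off = getattr(old, name).offset
            new_off = getattr(new, name).offset
            if old_off != new_off:
                changes.append((name, old_off, new_off))
    return changes

print(symbol_diff(exports_v1, exports_v2))  # nothing removed or added
print(layout_diff(PointV1, PointV2))        # size and the offset of y moved
```

A consumer compiled against v1 would still read y at offset 4, which is why the layout view matters even when the symbol view is clean.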

The derived sixth layer, L5. Beyond the five sources above, abicheck derives an L5 source/build graph (include/type/call reachability, ADR-031) from L3 (and any L4 surface) to localize and explain findings and prioritize cross-symbol impact. It is covered with the other build/source layers in Build Info & Sources.

Layers (L) vs. scan levels (S). The L0–L5 codes name evidence layers: what abicheck sees and how much that evidence is trusted. The abicheck scan command has a separate s0–s6 axis naming the method used to gather the L3–L5 evidence, a different meaning of the word "level". Scan Levels (S vs L) explains both axes and how they map onto each other.

How they combine

The layers are independent and additive, not a fallback chain — abicheck overlays every source you give it and lets the strongest evidence win, under one rule (the authority rule, see Build & Source Packs):

Artifact-backed evidence (L0/L1/L2) is authoritative for the shipped-ABI verdict. Build/source evidence (L3/L4) explains, localizes, scopes, or adds confidence to a finding, and can raise source-/API-level findings of its own — but it never silently deletes an artifact-proven break.

Concretely: L0 says "a symbol changed"; L1 says "its layout changed by N bytes"; L2 says "and the public declaration that names it changed too"; L3 says "and it was built with a different -std, so expect churn"; L4 says "and the macro it expands actually changed value". The verdict is computed worst-wins across all of them. The design of how the layers are collected and reconciled is in Architecture; the per-case evidence each example needs is benchmarked in Tool Comparison §Benchmarking by evidence tier.
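The worst-wins combination and the authority rule can be sketched in a few lines. This is an illustration under stated assumptions, not abicheck's implementation: the Finding type, the severity ordering, the COMPATIBLE label, and the choice to cap build/source findings at API_BREAK are all invented for the sketch. Only the names BREAKING and API_BREAK come from the docs.

```python
from dataclasses import dataclass

# Assumed severity order for this sketch (lowest to highest).
SEVERITY = {"COMPATIBLE": 0, "API_BREAK": 1, "BREAKING": 2}

# Artifact-backed layers are authoritative for the shipped-ABI verdict.
ARTIFACT_LAYERS = {"L0", "L1", "L2"}

@dataclass(frozen=True)
class Finding:
    layer: str    # "L0" .. "L4"
    verdict: str  # a key of SEVERITY
    note: str

def overall_verdict(findings):
    """Worst-wins across all layers; findings can only raise the verdict."""
    worst = "COMPATIBLE"
    for f in findings:
        verdict = f.verdict
        # Sketch assumption: build/source evidence may raise source-/API-level
        # findings, but only artifact evidence may declare BREAKING.
        if f.layer not in ARTIFACT_LAYERS and verdict == "BREAKING":
            verdict = "API_BREAK"
        if SEVERITY[verdict] > SEVERITY[worst]:
            worst = verdict
    return worst

findings = [
    Finding("L1", "BREAKING", "layout changed by 8 bytes"),
    Finding("L3", "COMPATIBLE", "built with a different -std, expect churn"),
]
print(overall_verdict(findings))
```

Because the loop only ever raises the running verdict, a benign note from L3 or L4 cannot lower or delete the artifact-proven break, which is the property the authority rule describes.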

Best input you can give abicheck: old + new library, matching public headers, debug info, and the build's compile database — L0+L1+L2+L3 together. With less, abicheck degrades down the staircase and tells you exactly which layers it had via the --show-data-sources / layer_coverage report.

Why call it "evidence"?

First, concretely: "evidence" is just the umbrella term for the sources of information in the table above. The artifact sources are the binary (L0), its debug info (L1), and its public headers (L2); the additional sources are the project's build-system data (L3 — compile flags, toolchain, target graph), its source tree (L4 — per-TU source ABI replay), and a source/build graph (L5 — include/type/call reachability). When the docs say "build/source evidence (L3/L4/L5)", that is exactly what they mean.

The umbrella word is a deliberate forensic metaphor, not decoration: abicheck treats "is this compatible?" as something it must prove from facts, the way a case is built from evidence, rather than as a single computation over one data source. Three properties of evidence are exactly the properties abicheck needs, and "tier" or "level" would imply the wrong ones:

  • Independent and partial. Each source contributes some facts and none is complete on its own — a binary shows symbols but not layout, headers show API but not what was actually built. Evidence is additive and overlaid, not a ranked ladder you fall back down. (Call them "tiers" and readers assume a fallback chain; they aren't one.)
  • Different authority. Just like physical vs. circumstantial evidence in a courtroom, not all of it carries equal weight. Artifact evidence (L0–L2) is what was actually built and shipped, so it is authoritative — only it can declare a binary BREAKING. Build/source evidence (L3/L4/L5) is corroborating — it explains, localizes, scopes, adds confidence, removes false positives, and can raise its own source-/API-level findings, but it can never overturn or silently delete an artifact-proven break. This is the authority rule (ADR-028).
  • Honest about what it had. Because the verdict is only as strong as the evidence behind it, every run reports the evidence it actually collected (the layer_coverage table and the "checks enabled… and why others are not" capability report). The output literally says "here is the evidence I had, so here is what I could and couldn't check."

So "evidence" + the authority rule is the mental model that lets abicheck keep adding sources for more accuracy without ever letting a weaker source override a proven break.


1. The detectability matrix

The most important table on this page. Read it as: given only this evidence, what can a checker conclude — and what is it structurally blind to?

| Evidence available | Detects well | Cannot detect well |
|---|---|---|
| Exported symbol table only (stripped binary, no headers) | Removed/added exported symbols, symbol versions, visibility, SONAME/install-name, dependency (DT_NEEDED) changes | Struct layout, enum values, calling convention, source-only API changes, macro changes, inline/template body changes |
| Debug info (DWARF / PDB / BTF) | Type layout, field offsets, enum values, class sizes, vtables, calling convention, packing/alignment | Source-only API intent, macros, default arguments, some template/header-only changes |
| Headers / AST (CastXML / Clang) | Source signatures, overloads, default args, access/final/explicit/noexcept, templates visible in headers | Inline body semantics, macro expansion policy (unless modeled), runtime behavior |
| Source diff / compiler-based API extraction | Macros, inline function bodies, constexpr bodies, uninstantiated templates, source-level API | The binary layout actually emitted into a shipped library (unless paired with the binary/debug info) |
| Runtime app swap / integration test | Real loader/linker behavior and tested execution paths | Untested public API, future consumers, silent layout corruption (unless a test happens to expose it) |
| Bundle scan (multi-library) | Cross-DSO dependency / provider / entry-point problems | Pure source compatibility and semantic behavior not represented in artifacts or manifests |

The first four rows correspond to the artifact and source inputs of §0: L0, L1, L2, and the L4 source row. L3 build context is a separate corroborating layer and is intentionally not a row here. The last two rows, runtime app swap and bundle scan, are orthogonal evidence axes, not extra rungs on the staircase.
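The first four rows of the matrix can be read as a lookup table, which also answers "why did tool A catch this and tool B didn't?". The sketch below is a simplified model, not abicheck's data: the keys (symbols, debug_info, headers, source) and the change labels are hypothetical shorthand for the table entries.

```python
# Simplified, hypothetical encoding of the "Detects well" column.
DETECTS = {
    "symbols": {"removed_symbol", "added_symbol", "symbol_version",
                "visibility", "soname", "dependency"},
    "debug_info": {"type_layout", "field_offset", "enum_value",
                   "class_size", "vtable", "calling_convention", "packing"},
    "headers": {"signature", "overload", "default_arg", "access",
                "noexcept", "header_template"},
    "source": {"macro", "inline_body", "constexpr_body",
               "uninstantiated_template"},
}

def who_can_see(change):
    """Return the evidence kinds that expose a given change."""
    return sorted(kind for kind, changes in DETECTS.items() if change in changes)

def can_detect(change, inputs):
    """True if any of the supplied evidence kinds exposes the change."""
    return any(change in DETECTS[kind] for kind in inputs)

# Tool A was given debug info; tool B only saw a stripped binary.
tool_a_inputs = {"symbols", "debug_info"}
tool_b_inputs = {"symbols"}

print(who_can_see("enum_value"))                # ['debug_info']
print(can_detect("enum_value", tool_a_inputs))  # True
print(can_detect("enum_value", tool_b_inputs))  # False
```

Under this model the two tools disagree not because one is wrong, but because only one of them was handed evidence in which the change is visible.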

Why abicheck combines layers

abicheck is strongest because it does not rely on a single row. It overlays the five independent, additive sources of §0 — plus the derived L5 graph — for six evidence layers in all (see Architecture and ADR-003 / ADR-028):

| Layer | Source | Evidence it contributes |
|---|---|---|
| L0 | Binary metadata | ELF symbols, SONAME, versioning, visibility, dependencies (and PE/COFF + Mach-O equivalents) |
| L1 | Debug info (DWARF/PDB) | Layout, offsets, enum values, calling convention, vtable slots, type cross-checks |
| L2 | Header AST (castxml or clang) | Function signatures, classes, structs, enums, typedefs, templates, noexcept, access, public/internal scoping (castxml also resolves vtables/layout; the clang backend is syntactic, so pair it with L1/DWARF for layout) |
| L3 | Build context | ABI-relevant flags, toolchain/sysroot, target graph, export-policy changes |
| L4 | Source ABI replay | Macro/constexpr values, default-argument values, inline/template bodies, uninstantiated templates |
| L5 | Source/build graph (derived) | Include/type/call reachability: localizes and explains findings, prioritizes cross-symbol impact (folded from L3, plus any L4 surface) |

The best input you can give it is therefore:

old library + new library + matching public headers + debug info + build context — L0+L1+L2+L3 together.

With less, abicheck degrades gracefully down the staircase — a stripped binary with no headers collapses toward symbol-only checking, where layout and source-only breaks are invisible. See Recommendation: feed .so + debug info + headers.


2. Methods compared, by the evidence they use

Each method is good at what its evidence exposes and blind to the rest. None is a complete contract check on its own.

a. Build an app and swap the library

The most realistic consumer-level test — but not a complete contract check. It only exercises what one app imports and runs.

| Strength | Example |
|---|---|
| Loader/linker failures | App fails because a required symbol is missing |
| Real runtime behavior | App crashes when it calls into changed ABI |
| Consumer-specific risk | App doesn't use the removed function, so this app still works |
| End-to-end deployment validation | RPATH/RUNPATH, search path, symbol versions all exercised |

| It misses | Why |
|---|---|
| Unused public APIs | The app only tests what it imports/executes |
| Silent data corruption | Tests may pass while layout is subtly wrong |
| Source compatibility | Binary may run, but recompiling may fail |
| Future consumers | One app is not the whole public contract |
| Header-only / source-only breaks | Existing binary doesn't exercise changed source |

This maps to abicheck's appcompat command. See §4 for its exact scope.

b. libabigail (abidiff)

Primarily DWARF-based: abidw extracts ABI XML, abidiff compares it. Falls back to CTF/BTF or ELF symbol names; with no debug info it degrades toward ELF-only.

  • Good for: emitted binary ABI from debug builds (struct/class layout, type changes, symbols); no headers required in the common DWARF workflow; a mature ABI diff model.
  • Limits: stripped binaries degrade to symbol-only; a header directory is mostly a public-symbol filter, not a full source-AST analysis, so source-only changes (default args, access changes, noexcept) stay hard; not product/bundle/app-policy oriented by default.

c. ABI Compliance Checker (ABICC)

Two workflows:

  • abi-dumper workflow — DWARF-based dump from a debug .so, optional public-header filter. Lacks a full AST, so it misses many source-only API breaks.
  • XML / header workflow — GCC-compiled AST from headers. GCC-only, with known slowness/reliability issues, path sensitivity, and timeouts on complex C++. Lacks ELF binary metadata, so it's weaker on exported-symbol/platform linker facts.

Coarser verdict vocabulary than abicheck compare (no API_BREAK modeling). abicheck's compat mode is a drop-in replacement for ABICC-style flags; new integrations should prefer compare.

d. abicheck

The combined-evidence model above (§1). Strongest with library + headers + debug info + build context. See Tool Comparison for the benchmark showing why combining ELF + CastXML + DWARF beats single-source tools.

e. Methods beyond ABI diff tools

ABI diffing is one tool in a release-engineering kit. Complementary methods:

| Method | What it adds |
|---|---|
| Downstream rebuilds | Detect source API breaks by recompiling real consumers |
| Runtime smoke / probe tests | Detect loader errors and common runtime failures |
| ABI/API snapshot baselines | Treat release snapshots as immutable contract records |
| Symbol-version script / export-map linting | Enforce the intended public/private boundary |
| Header/source API extraction | Catch macros, inline definitions, template surface |
| Fuzz / integration tests | Catch behavioral changes behind a stable ABI |
| Reverse-dependency CI | Ecosystem/distribution-wide validation |
| Security-hardening scanners | Catch non-ABI deployment regressions (RELRO/PIE/canary/FORTIFY) |

The security-hardening check is the clean example of "not ABI, but still a release-compatibility risk": an ABI-compatible upgrade can weaken hardening while a normal ABI gate stays green. abicheck reports that as deployment risk, not an ABI break.


3. Traditional shared libraries vs header-only libraries

This distinction trips people up constantly, so it gets its own section.

Traditional .so / .dll / .dylib

There is a real binary contract to compare — exported symbols, symbol versions, dependency metadata, layout in debug info, public declarations in headers. abicheck's model is strongest here:

For compiled shared libraries, ABI compatibility is mainly about whether existing, already-built consumers can keep linking, loading, and calling into the new binary using the old contract.

Header-only libraries

A header-only library often has no exported library ABI — the code is compiled into each consumer. Compatibility is therefore mostly:

| Compatibility type | Meaning |
|---|---|
| Source API compatibility | Will existing users recompile? |
| Generated ABI compatibility | Will rebuilt objects stay compatible with other objects? |
| Semantic compatibility | Does inline/constexpr/template behavior still mean the same thing? |
| Configuration compatibility | Do macros/features/flags produce the same public surface? |

abicheck can still help in some cases:

| Case | How abicheck helps |
|---|---|
| Header-only API also gates a shared-library boundary | Header-AST comparison catches some API changes |
| Explicit template instantiations shipped in a .so | The emitted instantiations can be checked |
| Header constants / default args / source signatures in the AST | Some source-level API breaks are found |
| App links a runtime helper library | App mode checks the app's imported symbols |

But it cannot fully validate a pure header-only library: implicit header-only template instantiations are not in any shipped artifact (the documented mitigation is explicit instantiation of public templates that form part of the ABI — see Template Instantiation).

Header-only compatibility strategy

Use source API extraction, compile tests across supported compilers/standards, downstream rebuilds, and behavioral tests. Use abicheck for emitted artifacts, explicit template instantiations, or companion runtime libraries — not as the sole gate for header-only code.


4. App mode (consumer-scoped) vs library compare (contract-scoped)

appcompat answers a deliberately narrow question: will this application still work with the new library? It parses the app's required symbols, compares old/new libraries in full mode, checks new-symbol availability, and filters findings to changes that matter to that app.

That scope cuts both ways:

| App mode can say | App mode cannot say |
|---|---|
| "This app doesn't import the removed symbol." | "The library is generally ABI-compatible." |
| "This app needs symbol version X and the new lib lacks it." | "All future consumers are safe." |
| "This app is unaffected by this library-wide break." | "Header-only source users can recompile." |
| "This deployment path is OK for this app." | "No semantic behavior changed." |

App mode is consumer-scoped compatibility. Library compare is product-contract compatibility. Use both: compare protects the library contract; appcompat protects a specific consumer deployment.
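The difference in scope can be sketched with plain sets of symbol names. This is a deliberately reduced model, not the appcompat implementation: it ignores symbol versions, layout findings, and everything else appcompat considers, and the function name app_impact and all symbol names are hypothetical.

```python
def app_impact(app_imports, old_exports, new_exports):
    """Split library-wide symbol removals into those that touch one app
    and those that do not, and list app imports the new library lacks."""
    removed = old_exports - new_exports
    return {
        # Removals this particular app actually imports.
        "breaks_this_app": sorted(app_imports & removed),
        # Removals that break the library contract but not this app.
        "library_wide_only": sorted(removed - app_imports),
        # Anything the app needs that the new library does not export.
        "missing_in_new": sorted(app_imports - new_exports),
    }

old_exports = {"foo_open", "foo_read", "foo_legacy"}
new_exports = {"foo_open", "foo_read", "foo_stat"}

# An app that never used the removed function is unaffected...
print(app_impact({"foo_open", "foo_read"}, old_exports, new_exports))
# ...while an app that imports it is broken by the same release.
print(app_impact({"foo_open", "foo_legacy"}, old_exports, new_exports))
```

The first app passing says nothing about the second, which is why a green consumer-scoped result does not establish that the library contract held.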

For header-only libraries, app mode is less central unless there's a companion runtime library — an existing app binary already contains the header-only code it compiled earlier, so swapping a library may not exercise the changed header-only implementation at all.


5. What ABI tools cannot prove

Even with perfect evidence, artifact comparison has hard boundaries. These are not abicheck's job — they need tests, specs, or source-AST tooling. Treat this as a guard against over-trusting any ABI tool (see Limitations for the authoritative list):

| Case | Why it's invisible / out of scope |
|---|---|
| Macro-only changes | Macros are preprocessor behavior; not in the artifact |
| Inline function body changed, same signature | No exported ABI change; body is compiled into the consumer |
| constexpr behavior changed | Source/semantic compatibility, no symbol change |
| Template body changed but not instantiated | No emitted artifact to compare |
| Uninstantiated template signature change | Not in the shipped .so unless instantiated (case122) |
| Header-only change not affecting exports | There may be no shared-library ABI surface |
| Stripped binary, no headers/debug | Mostly symbol-level comparison only |
| Header/binary mismatch | The tool may analyze a contract the binary wasn't built with, producing false results |
| Static archives (.a / .lib) as archive containers | abicheck analyzes linkable images/shared libraries/objects, not archive containers (details) |
| Pure behavioral / semantic changes | Same ABI/API, different meaning; needs tests/spec review |
| Ownership / lifetime / thread-safety guarantee changes | A signature can be byte-identical while the contract it implements flips |

The takeaway is the same one Part 0 opens with: a stable ABI is necessary but not sufficient for a compatible release. ABI tools prove the binary contract held; behavioral compatibility still needs your tests and your specification.
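A pure behavioral change is easy to demonstrate by analogy. The sketch below is in Python rather than C or C++, and the functions clamp_v1 and clamp_v2 are hypothetical; it only shows the general shape of the problem: an interface comparison reports no difference, and only a behavioral test notices that the meaning changed.

```python
import inspect

def clamp_v1(value: int, low: int, high: int) -> int:
    """v1: the upper bound is inclusive."""
    return max(low, min(value, high))

def clamp_v2(value: int, low: int, high: int) -> int:
    """v2: same signature, but the upper bound is now exclusive."""
    return max(low, min(value, high - 1))

# Interface-level comparison: parameters and return annotation are identical.
same_signature = inspect.signature(clamp_v1) == inspect.signature(clamp_v2)
print(same_signature)

# Behavior-level comparison: the two versions disagree at the boundary.
print(clamp_v1(10, 0, 10), clamp_v2(10, 0, 10))
```

An ABI or API diff plays the role of the signature comparison here; catching the boundary change requires a test or a specification review.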


See also: Part 0 — Compatibility as a Product Contract · Limitations · Tool Comparison · Application Compatibility · Multi-Binary Releases.