Part 1 — Foundations: From Source Code to a Running Process¶
Series navigation: 0. Product Contract · 1. Foundations · 2. Symbol Contracts · 3. Type Layout · 4. C++ ABI · 5. Linker & ELF · 6. Transitive Breaks · 7. Designing for Stability · 8. Detecting Breaks
What you'll learn on this page
- The journey a
.c/.cppfile takes to become a running process, and where compatibility is decided at each step. - What a symbol really is, and the difference between a symbol that is defined and one that is undefined (imported).
- The difference between static and dynamic linking, and why dynamic linking is where ABI compatibility matters.
- The two contracts every library publishes: the API (source-level) and the ABI (binary-level) — and why one break makes your compiler shout and the other one corrupts memory in silence.
- A mental model you can carry through the rest of the series: the compiler bakes the library's promises into the caller, and never re-checks them.
This page assumes no prior knowledge of linkers or loaders. If you already
know what .dynsym, COPY relocations, and the dynamic loader are, skip ahead to
Part 2 — Symbol Contracts.
Mental model: ELF/Linux unless stated
This series teaches with the ELF/Linux model (symbols resolved by name, SONAME, version scripts) because it's the cleanest to reason about. Windows PE/COFF and macOS Mach-O add their own mechanisms — ordinal exports, import libraries, install names, compatibility versions, two-level namespaces, and platform-specific C++ ABIs. Where it matters the text calls it out; the consolidated map is in Part 5 §PE/COFF and Mach-O parallels and the Platform Support reference.
1. The build pipeline: where does a library come from?¶
When you run gcc foo.c -o foo, a single command hides four distinct stages.
Each stage hands a different representation of your program to the next:
flowchart LR
A["foo.c<br/>(source text)"] -->|preprocess| B["foo.i<br/>(expanded source)"]
B -->|compile| C["foo.s<br/>(assembly)"]
C -->|assemble| D["foo.o<br/>(object file<br/>+ symbol table)"]
D -->|link| E["a.out / libfoo.so<br/>(executable / shared object)"]
E -->|load + run| F["running process"]
- Preprocess.
#includedirectives are pasted in, macros expanded. This is the only stage that sees your headers — the textual interface a library publishes. Headers define the API. - Compile. The preprocessed source becomes assembly for one specific
target (e.g. x86-64). Crucially, this is where the compiler turns the
library's promises into machine code. If a header says
struct Point { int x; int y; }, the compiler now knowssizeof(Point)is 8 and the fieldylives at byte offset 4 — and it bakes those numbers directly into every instruction that touches aPoint. - Assemble. Assembly becomes an object file (
.o): machine code plus a symbol table listing the names this file provides and the names it needs. - Link. The linker stitches object files (and libraries) together, resolving each "needed" name to a "provided" one, producing an executable or a shared library.
The single most important idea in this entire series lives in stage 2:
The compiler bakes the library's ABI facts into the caller and never re-checks them. Offsets, sizes, register choices, and vtable slot numbers become immediate constants inside the caller's machine code. When the library later changes one of those facts, the caller keeps using the old number. Nobody re-validates it. That is why an ABI break is silent.
2. What is a symbol?¶
A symbol is a name with an address attached — the unit the linker and loader trade in. Functions and global variables become symbols; local variables inside a function do not.
Compile this file:
// math.c
int add(int a, int b) { return a + b; } // defined here
int counter; // defined here (global)
extern int log_event(int code); // declared, NOT defined here
int tally(int x) {
counter += x;
return log_event(add(x, 1)); // needs add (local) + log_event (external)
}
Inspect the resulting symbol table:
$ gcc -c math.c -o math.o
$ nm math.o
0000000000000000 T add # T = defined, in .text (code)
0000000000000004 C counter # C = common/global data symbol
U log_event # U = UNDEFINED — this file needs it from elsewhere
0000000000000014 T tally
Two categories matter for the rest of the series:
| Kind | nm letter |
Meaning |
|---|---|---|
| Defined (exported) | T, D, B, W… |
"I provide this name; here is its code/data." |
| Undefined (imported) | U |
"I use this name; somebody else must provide it." |
Linking is, at its heart, matching every U to a T/D somewhere. If a
single U has no match, the link (or the load) fails. The name is the only
key used for matching — not the function's parameter types, not the variable's
size. Hold onto that: it is the root cause of an entire family of breaks in
Part 2.
Names in C vs C++ — mangling¶
In C, the symbol name is the source name: add stays add. In C++, the
compiler mangles the name to encode the namespace, class, and parameter
types, so that int add(int,int) and double add(double) can coexist:
$ echo 'namespace geo { int add(int,int){return 0;} }' | g++ -x c++ -c - -o t.o
$ nm t.o | c++filt
0000000000000000 T geo::add(int, int) # mangled symbol: _ZN3geo3addEii
Mangling means that in C++ a parameter type, a const qualifier, or a
namespace change rewrites the symbol name — turning a source-level edit into a
linker-level removal. We return to this repeatedly in Part 4.
3. Static vs dynamic linking¶
There are two ways the linker can satisfy your program's undefined symbols.
Static linking copies the needed code into your executable at build time.
The library's bytes become part of a.out. Once built, there is no further
dependency — and no further compatibility question, because nothing can change
underneath you.
Dynamic linking leaves the undefined symbols unresolved in your
executable and records a note that says "find these in libfoo.so at startup."
The resolution happens every time the program runs.
flowchart TB
subgraph Static
app1["app (contains a copy of libfoo's code)"]
end
subgraph Dynamic
app2["app<br/>NEEDED: libfoo.so.1<br/>U: foo, bar"] -.->|resolved at load time| lib["libfoo.so.1<br/>T: foo, bar"]
end
Dynamic linking is the default on Linux, macOS, and Windows because it saves
memory (one copy of libc shared by every process), allows security fixes
without rebuilding every consumer, and enables plugins. But it creates a
contract that outlives the build: the executable was compiled against
today's libfoo.so, yet it will be resolved against whatever libfoo.so is
installed when it runs — possibly years later, possibly a different version.
ABI compatibility is the promise that a future
libfoo.socan still satisfy a binary built against an older one — without recompiling the binary.
The rest of this series is a catalog of the ways that promise gets broken.
4. The dynamic loader: what happens at startup¶
When you launch a dynamically-linked program, a special component — the
dynamic loader (ld.so on Linux, dyld on macOS, the PE loader on
Windows) — runs before your main(). Its job:
- Read the executable's list of needed libraries (
DT_NEEDEDentries on ELF). - Find and
mmapeach library into the process's address space. - Walk the executable's relocations — the list of "patch this address once
you know where the symbol landed" notes — and resolve each undefined symbol
by looking up its name in each library's dynamic symbol table
(
.dynsym). - If any name can't be found:
symbol lookup errorand the process dies beforemain().
This lookup is by name only. The loader does not know or check that helper
used to take two ints and now takes a double — it only checks that a
symbol called helper exists. That is both the strength of dynamic linking
(loose coupling) and the source of its most dangerous failure mode (silent
mismatch). A symbol that resolves successfully but means something different now
is the textbook silent ABI break.
Lazy vs immediate binding. By default, function symbols are resolved lazily — on first call, via a trampoline (the PLT). With
LD_BIND_NOW=1or-Wl,-z,now, everything resolves at startup. This only changes when a missing-symbol error surfaces, not whether it does.
5. Two contracts: API vs ABI¶
We can now state the central distinction precisely.
An API (Application Programming Interface) is the source-level contract: the declarations a consumer's source code compiles against — function signatures, type definitions, macros, templates, and the semantic guarantees that go with them. The API lives in the headers. You experience an API break at compile time: the build fails, the error points at a line, you fix it.
An ABI (Application Binary Interface) is the binary-level contract
between already-compiled artifacts: the exact byte layout of types, the symbol
names and their mangling, the calling convention (which register holds which
argument), vtable shapes, exception-unwinding metadata, and the relocation rules
the loader relies on. The ABI lives in the compiled .so/.dll/.dylib and
the binaries built against it. You experience an ABI break at run time —
or worse, you don't experience it, because it corrupts memory quietly.
| API break | ABI break | |
|---|---|---|
| Contract level | Source (headers) | Binary (compiled code) |
| Who notices | The compiler | Nobody — until it crashes or corrupts |
| When | Build time | Run time (or silently, never) |
| Fix | Edit + recompile consumer | Often impossible without rebuilding every consumer |
| Example | Renaming an enum member | Inserting a struct field |
The asymmetry is the whole point:
An API break forces downstream code to be edited; an ABI break does not — but it silently corrupts memory, misroutes calls, or fails to resolve symbols at load time, because the consumer binary was produced under assumptions the new library no longer satisfies.
A change can be one, the other, both, or neither:
- API break only — rename a function in the header but keep the old exported symbol via an alias. Old binaries still link; new source won't compile against the old name.
- ABI break only — add a field to a struct. Source that uses the struct
still compiles fine; old binaries that baked in the old
sizeofcorrupt memory. - Both — remove a function entirely.
- Neither — change a comment, or an implementation detail behind an opaque pointer.
6. Why ABI breaks are expensive¶
The cost of an ABI break compounds with the size of the ecosystem depending on
the library. When libfoo.so.1 breaks ABI without changing its identity, the
damage radiates:
- Linux distributions must rebuild — and re-test, re-sign, and re-ship — every reverse-dependency in the archive. Debian and Fedora each track hundreds of such "library transitions" per release; a single unannounced ABI break can stall an entire distribution's release.
- Embedded / firmware: an ABI break shipped in an over-the-air update can brick devices in the field, when a pre-linked application loads a new system library whose struct offsets have shifted.
- Plugin ecosystems — audio hosts loading VST modules, game engines loading mods, browsers loading components, IDEs loading extensions — fracture entirely when the host's ABI changes. Third-party binaries shipped years earlier fault on first call, and the plugin author may no longer exist to rebuild them.
This is why mature libraries treat ABI as a versioned, gated, deliberately managed surface — and why a tool that can tell you whether a change is ABI-breaking before you ship belongs in your CI pipeline.
7. Where abicheck fits¶
abicheck operationalizes everything above. It does not need your source at
review time; it reads the compiled artifacts (plus debug info / headers when
available) and reasons about the binary contract directly.
flowchart LR
old["libfoo.so.1<br/>(old release)"] -->|dump| s1["snapshot v1"]
new["libfoo.so.2-candidate"] -->|dump| s2["snapshot v2"]
s1 --> diff{{"diff + classify"}}
s2 --> diff
diff --> verdict["verdict:<br/>NO_CHANGE / COMPATIBLE /<br/>COMPATIBLE_WITH_RISK /<br/>API_BREAK / BREAKING"]
- Snapshot. It extracts an
AbiSnapshotfrom each binary — the exported symbols, type layouts, vtables, calling conventions, ELF metadata, and so on. - Diff. It compares the two snapshots structurally (walking pointer chains, struct members, vtable slots — not just symbol names).
- Classify. Each detected difference is one of 254 change kinds, and each change kind is partitioned into exactly one verdict tier.
The five verdicts, from safest to most severe, are:
| Verdict | Meaning | Typical CI action |
|---|---|---|
✅ NO_CHANGE |
Identical ABI/API | pass |
🟢 COMPATIBLE |
Backward-compatible (additions / quality signals) | pass |
🟡 COMPATIBLE_WITH_RISK |
Links fine, but a deployment/behavioral hazard | review |
🟠 API_BREAK |
Source won't recompile; binaries still link | review / bump minor |
🔴 BREAKING |
Existing binaries crash or corrupt at runtime | block / bump SONAME |
Each verdict maps to a CI exit code so a release gate can distinguish a harmless symbol addition from a silent memory-corruption hazard. The full semantics and exit codes are in Verdicts; a one-screen summary is the ABI Cheat Sheet.
Throughout the rest of the series, look for callouts like this:
How abicheck sees it
Inline boxes like this name the specific change kind and verdict abicheck reports for the scenario being discussed, and how it detects it (ELF symbol table, DWARF debug info, or headers). They tie the general mechanism back to something you can run in CI.
8. Glossary¶
| Term | One-line definition |
|---|---|
| Symbol | A name (function or global) the linker/loader resolves to an address. |
| Defined / Undefined symbol | A name this object provides vs needs (T vs U in nm). |
.dynsym |
A shared object's table of dynamically-visible symbols, searched at load time. |
| Relocation | A "patch this address once the symbol's location is known" note in a binary. |
| Dynamic loader | ld.so / dyld / PE loader — resolves symbols and maps libraries at startup. |
| Mangling | Encoding of C++ name + signature into a flat symbol string. |
| SONAME | A shared object's declared identity (libfoo.so.1); the major number is the ABI epoch. |
| Calling convention | The register/stack contract for passing arguments and returning values. |
| Vtable | A per-class array of function pointers backing C++ virtual dispatch. |
| API | Source-level contract (headers). Breaks at compile time. |
| ABI | Binary-level contract (compiled artifacts). Breaks at run time, often silently. |
Next¶
Now that you know what a symbol is and how it's resolved, the most direct way to break a library is to make a symbol the loader needs disappear or change meaning. That is the subject of the next page.
➡️ Part 2 — Symbol Contract Breaks
See also: ABI Cheat Sheet · Verdicts · Examples Encyclopedia