G22 — CLI consolidation & interface-contract enforcement¶
Type: Refactor / UX plan. Implements ADR-037.
Tracked by usecase-registry.yaml entry UC-WF-cli-contract (gap G22).
Effort: L (touches every cli_*.py, service.py, mcp_server.py,
cli_options.py, plus a new CI gate) · Risk: medium — behaviour-preserving
for the common path, but folds commands and moves settings to config.
Builds on: G21 (depth dial, one-shot deep-compare), ADR-035 (typed
requests, .abicheck.yml), ADR-036 (report view-model).
Problem¶
The CLI exposes the internal pipeline (~394 options / 31 commands). Five
verdict-emitting commands differ only by operand yet re-declare option families
inline; compare-release bypasses service.run_compare and classifies with a
different scope_public default; the evidence dial has three vocabularies;
exit-code meaning is inferred from flag presence. See ADR-037 §Context for the
full audit.
Goal & acceptance criteria¶
- Every verdict-emitting front-end command routes through a
service.run_*verb. AC: newcli-contractCI check (ADR-037 D10.1) passes; nocli*.pycallschecker.compare/diff_*orchestration directly (type imports for annotations/rendering remain legal). - Shared option families exist once as decorators. AC: D10.2 coverage check
passes;
INTENTIONAL_SUBSETallowlist holds every deliberate exception with a reason. - One depth vocabulary. AC:
--collect-mode/--mode/--source-methodresolve as deprecated aliases of--depth;--helpshows one dial;graphis no longer a user-facing depth (D6). compare-releaseanddeep-compareare deprecated aliases;compareaccepts dir/package inputs and--max. AC: existingcompare-release/deep-comparetests pass against the alias; new tests covercompare <dir> <dir>.--header-backend→--ast-frontend(header AST + source-ABI), old name aliased. AC: alias test;--ast-frontend androidwithout--sourceserrors.- Per-category severity, FP-tuning, suppression hygiene, and precise S-axis
move to
.abicheck.yml; CLI keeps the coarse overrides only. AC: config round-trips; CLI flag override beats config; per-command flag count under budget (D10.5). - One explicit exit-code scheme (
--exit-code-scheme, defaultauto). AC: passing--severity-*no longer silently changes the scheme. - Validation is fail-fast and lives in Tier 2. AC: identical error text from CLI and MCP for the same bad request.
- MCP params and CLI flags share one name map. AC: D10.3 completeness test.
Non-goals: changing detection/classification logic; the ABICC compat dialect
(it intentionally mirrors abi-compliance-checker spelling — it routes through
Tier 2 but keeps its own flags).
Files & surfaces¶
What each module gains/loses. "T2" = Tier 2 (service), "FE" = front-end.
| Module | Change | Phase |
|---|---|---|
abicheck/api_types.py (new) |
InputSpec, CompareRequest (+validate()), OutputSpec, re-export AnalysisDepth |
1 |
abicheck/service.py (T2) |
run_compare(req: CompareRequest); old-kwargs shim; resolve_input already exists |
1 |
abicheck/cli_options.py |
the 7 decorators (D3); INTENTIONAL_SUBSET, DEPRECATED_FLAGS, MCP_CLI_NAME_MAP tables |
2,3,6,7 |
abicheck/cli.py (FE) |
compare recomposed onto decorators; input-type dispatch; --exit-code-scheme |
2,4,5 |
abicheck/cli_compare_release.py (FE) |
_run_compare_pair → service.run_compare; command → deprecated alias |
1,4 |
abicheck/cli_max.py (FE) |
deep-compare → deprecated alias of compare --max |
4 |
abicheck/cli_appcompat.py (FE) |
route through service.run_compare; adopt shared decorators |
1,2 |
abicheck/cli_scan.py (FE) |
--mode/--source-method → aliases of --depth |
3 |
abicheck/mcp_server.py (FE) |
params aligned to MCP_CLI_NAME_MAP; build CompareRequest |
6 |
abicheck/dumper.py / dump cmd |
--header-backend → --ast-frontend; @evidence_options |
6,3 |
abicheck/buildsource/inline.py (load_build_config/BuildConfig) |
new .abicheck.yml blocks: severity, scope, suppression, source, exit_code_scheme, version — wired into policy_file.py/severity.py/suppression.py loaders |
5,7 |
scripts/check_ai_readiness.py |
cli-contract check (D10.1–5) |
1,2,5 |
tests/test_cli_contract.py (new) |
the gate's unit-test mirror + option-count snapshot | 1,2,5 |
Phases¶
Each phase = one PR. Sub-structure: Work · Tests · Risk & rollback · Done-when.
Phase 1 — Typed requests + single chokepoint (D1, D2)¶
Status: landed. abicheck/api_types.py ships InputSpec/CompareRequest
(validate())/OutputSpec; service.run_compare is a kwargs shim over
run_compare_request(CompareRequest), and the new Tier-2 service.compare_snapshots
wraps checker.compare so every front-end (compare, compare-release,
scan, appcompat, and the MCP server abi_compare) routes through Tier 2 — no
front-end calls checker.compare directly. Enforced by the cli-contract
AI-readiness check (D10.1), which now scans mcp_server.py alongside every
cli*.py and appcompat.py, and mirrored in tests/test_cli_contract.py;
tests/test_api_types.py covers the request struct. The remaining phases (2–7)
are still open.
Work. Add api_types.py (InputSpec, CompareRequest+validate(),
OutputSpec); struct fields via field(default_factory=...) (a frozen
dataclass still shares one import-time default otherwise). Refactor
service.run_compare to take a CompareRequest; keep a thin **kwargs shim
that builds the request internally so existing callers compile during the phase.
Re-point cli_compare_release._run_compare_pair and cli_appcompat at
service.run_compare (this alone kills the scope_public default drift). Add
the cli-contract D10.1 check (no Tier-1 call sites in cli*.py; type
imports stay legal).
Tests. tests/test_cli_contract.py::test_no_tier_skip; a parity test
asserting compare and compare-release now yield the same DiffResult for
one pair (the drift regression); CompareRequest round-trip + frozen/default
unit tests.
Risk & rollback. Low — internal refactor, no user-visible flag change. The kwargs shim means a partial landing still runs; rollback = revert the FE re-pointing, keep the dataclass.
Done-when. Drift parity test green; cli-contract D10.1 active; no behaviour
change observable from the CLI.
Phase 2 — Decorator-ize shared families (D3)¶
Status: landed. cli_options.py now defines six shared families as
decorators — two_sided_input_options, policy_options, severity_options,
scope_options, debug_resolution_options, and the output_options factory
(per-command --format choices vary legitimately) — plus the contract tables
(FAMILY_FLAGS, FAMILY_DECORATOR, REQUIRED_FAMILIES,
VERDICT_EMITTING_COMMANDS, INTENTIONAL_SUBSET, DEFERRED_MULTI_DEFAULT).
compare, compare-release, appcompat, and deep-compare are recomposed onto
them with their inline duplicates deleted (behaviour-preserving; help text is now
uniform, and the divergences are closed — deep-compare's -H no longer needs
the file to exist at parse, matching its siblings). deep-compare keeps only the
coarse --severity-preset and is the sole INTENTIONAL_SUBSET entry. The
cli-contract gate gained D10.2 (decorator coverage) and D10.4
(one-default-per-flag), mirrored in tests/test_cli_contract.py along with the
per-command option-set snapshot and the gate↔cli_options table-mirror test.
The seventh decorator, @evidence_options (--depth/--max/per-side
--old/new-sources/--old/new-build-info, plus the hidden --collect-mode
alias), now exists as the canonical two-sided evidence family (the pre-rename
build_source_compare_options is kept as a back-compat alias). It is registered
in the contract tables as a non-required family (FAMILY_FLAGS["evidence"] /
FAMILY_DECORATOR["evidence"]) because only commands that take source depth
(compare) compose it. dump stays single-sided on the sibling
build_source_dump_options (it adds the build-query knobs and has no per-side
packs), so the two are deliberately not merged into one decorator.
Original plan. Define the 7 decorators in cli_options.py (ADR-037 D3 table).
Recompose compare, compare-release, appcompat, deep-compare, dump,
scan onto them, deleting inline duplicates. Seed INTENTIONAL_SUBSET with any
deliberate omission (each with a reason string). Add D10.2 (decorator coverage)
+ D10.4 (one-default-per-flag) checks.
Tests. test_decorator_coverage (every verdict-emitting command carries the
required decorators or is allowlisted); test_one_default_per_flag; a snapshot
of each command's resolved option set so an accidental drop is caught in review.
Risk & rollback. Low/medium — mechanical, but Click decorator order affects
--help grouping; pin order in the decorator definitions. This phase fixes the
§Context divergences and is independently shippable even if 3–7 slip.
Done-when. Divergence matrix (ADR §Context #2) is empty; appcompat now has
--strict-suppressions, compare-release has --ast-frontend/--demangle,
debug-resolution is uniform — all via the shared decorator, not copies.
Phase 3 — Depth vocabulary + L5-internal (D5, D6)¶
Status: landed. One user-facing dial --depth {symbols,headers,build,source,full}
now spans compare/deep-compare/dump/scan via a shared DepthParam
(cli_params.py): it adds symbols (L0/L1, suppresses the L2 AST), drops graph
as a user rung, and resolves the deprecated --depth graph → source with a
stderr note. compare gained --depth/--max (folded into the collect mode the
same way dump/deep-compare do). --collect-mode is now a hidden deprecated
alias that warns when used. The L5 graph stays internal — built automatically at
--depth source — and EvidenceDepth.GRAPH survives only for scan's pr-deep
preset (determinism). A new .abicheck.yml sources.graph: summary|full knob
(default summary, BuildConfig.graph_detail) caps/deepens the graph replay
scope (effective_graph_scope). The deprecated spellings are catalogued in
cli_options.DEPRECATED_FLAGS (the window-enforcing resolver lands in Phase 7).
Covered by tests/test_depth_vocabulary.py (alias resolution, monotone ladder,
graph-at-source, symbols-only, config knob). --source-method/--mode on scan
are unchanged — their move to config is Phase 5 (D4), not D5.
Work. --depth {symbols,headers,build,source,full} + --max on
@evidence_options (incl. per-side --old/new-sources, --old/new-build-info).
Map deprecated --collect-mode/--mode/standalone --source-method and the
G21 --depth graph value into DEPRECATED_FLAGS (graph → source). Make
the L5 graph an internal consequence of --depth source; delete the graph-*
rungs; add the source.graph: summary|full config knob (default summary).
Tests. test_depth_alias_resolution (every old spelling → the right
AnalysisDepth, incl. graph→source); test_depth_monotone (each rung is a
superset of the one below); test_graph_built_at_source_depth (no user mode
needed).
Risk & rollback. Medium — user-visible vocabulary change, mitigated by aliases + stderr deprecation notes. Rollback = keep aliases as the primary spelling.
Done-when. --help shows one dial; all three legacy vocabularies resolve as
aliases; graph no longer appears as a user-facing depth.
Phase 4 — Command consolidation (D7)¶
Status: landed. compare now classifies each operand
(classify_compare_operand in cli_resolve.py) as file / directory / package /
app — snapshots ride the file path — and dispatches: a directory or package operand fans out to the
per-library release comparison (cli._dispatch_release_compare → the existing
compare-release engine through the single Tier-2 service.run_compare
chokepoint, so a library gets the identical verdict from compare and
compare-release); an application/PIE operand (_looks_like_application, a
positive ET_EXEC / PIE-with-PT_INTERP-and-non-.so-name test — never a guess) is
rejected with a hint at appcompat. The set-input fan-out flags (-j/--jobs,
--dso-only, --output-dir) ride a shared set_input_options decorator and
no-op-with-warning on single files. compare-release and deep-compare keep
working as thin deprecated aliases that emit a stderr note (suppressed for
machine formats and when compare-release runs as the fan-out backend). Covered
by tests/test_compare_dispatch.py (classifier, file/dir dispatch, app
rejection, compare <dir> <dir> == compare-release <dir> <dir> JSON parity,
--output-dir fan-out, alias smoke).
Work. compare input-type dispatch: file / snapshot / directory / package /
(app → actionable hint to appcompat). Disambiguate ET_DYN PIE executables
from .so (ELF type alone is insufficient — fall back to DT_SONAME presence
and require an explicit operand kind when still ambiguous, never guess). Move
set-only flags (-j/--jobs, --dso-only, --output-dir, bundle opts) under
the dispatch, no-op-with-warning on single-file inputs. Preserve the
two-level output for set inputs (summary on stdout/-o, per-library reports
under --output-dir). Turn compare-release and deep-compare into thin
deprecated aliases. appcompat/plugin-check stay distinct verbs.
Tests. test_compare_dispatch_* (file, snapshot, dir, package, ambiguous
PIE → error); test_release_fanout (summary + per-lib reports match the old
compare-release output); alias smoke tests for the two folded commands.
Risk & rollback. Medium/high — the dispatch is the most behaviour-bearing
change. Rollback = keep compare-release/deep-compare as real commands (the
aliases already point at the same code, so reverting is cosmetic).
Done-when. compare <dir> <dir> reproduces a known compare-release run
byte-for-byte on the summary; ambiguous-binary inputs error with guidance.
Phase 5 — CLI↔config rebalance (D4)¶
Status: landed. BuildConfig (buildsource/inline.py) gained the
project-contract blocks severity: (preset + per-category), scope:
(public/collapse_versioned_symbols/public_symbols), suppression:
(strict/require_justification), source: (method, the precise S-axis),
plus the top-level exit_code_scheme: and version: — all validated, with a
to_dict() that round-trips through from_dict. compare auto-discovers the
nearest .abicheck.yml (discover_project_config, overridable with --config)
and merges CLI flags over it through one pure resolver
(resolve_compare_config → ResolvedCompareConfig) with precedence CLI >
config > built-in default. The demoted families (per-category severity, scope
FP-tuning, suppression hygiene) stay on the CLI as hidden overrides — still
functional for a one-off run, off the visible surface. The exit-code scheme is
now explicit (--exit-code-scheme {auto,legacy,severity}, D12): auto resolves
to severity when a severity setting is in effect (CLI or config) else legacy,
so passing --severity-* no longer silently flips an explicitly-pinned scheme.
The D10.5 budget is a COMPARE_FLAG_BUDGET constant + count_visible_options
(WARN nudge). Covered by tests/test_config_rebalance.py (per-key precedence,
dataclass/YAML round-trip, flag budget + hidden/visible split, explicit
exit-scheme, config-driven exit scheme and severity).
Work. Extend .abicheck.yml — the loader is buildsource/inline.py
(load_build_config/BuildConfig; risk_rules/crosschecks already live in
risk.py/crosscheck.py). Add severity: (per-category), scope: (FP-tuning,
public-surface list), suppression: (strict/justification policy), source:
(precise S-axis, graph detail), exit_code_scheme:, and wire each into the
existing policy_file.py/severity.py/suppression.py loaders. Demote the
corresponding flags to config; CLI keeps coarse
overrides (--severity-preset, --show-filtered, --depth,
--exit-code-scheme). Precedence: CLI > config > built-in default — one
resolver, tested. Add --exit-code-scheme (D12) and the D10.5 flag-count budget
(WARN).
Tests. test_config_precedence (CLI beats config beats default, per key);
test_config_roundtrip (load→dump→load stable); test_flag_budget (compare ≤
budget); test_exit_scheme_explicit (--severity-* no longer flips the scheme).
Risk & rollback. High — biggest UX shift; a project without config must keep working on built-in defaults. Land after 1–4 so the structure is stable. Rollback = re-expose the demoted flags (they still map to the same request fields).
Done-when. A project runs abicheck compare old new with everything else in
.abicheck.yml; flag-count budget passes; precedence test green.
Phase 6 — --ast-frontend, MCP name-map, validation, docs (D8, D9, D10.3)¶
Status: landed. --header-backend → --ast-frontend (D8) on
compare/dump, with --old/new-ast-frontend per-side; the env knob is now
ABICHECK_AST_FRONTEND. The legacy --header-backend flag spellings and the
ABICHECK_HEADER_BACKEND env alias were removed outright (clean removal — the
project is pre-1.0 and not under compatibility versioning). The single
MCP_CLI_NAME_MAP
(D10.3) reconciles every abi_compare MCP param with its compare flag and is
enforced by a new cli-contract sub-check (_check_mcp_cli_name_map) plus the
live tests/test_cli_contract.py::test_mcp_cli_name_map_complete. CompareRequest
gained a frontend field and validate() now rejects an out-of-enum
--ast-frontend (with the allowed set) and an android frontend without source
inputs (D9), threaded through service.run_compare. Covered by
tests/test_api_types.py (frontend enum + android-needs-sources) and the new
tests/test_cli_contract.py D8/D10.3 cases.
L4-frontend unification (now landed). --ast-frontend flows through the
dump inline-collection chain (dump/deep-compare → _write_snapshot_output
→ embed_build_source → collect_inline_pack(extractor=…)), so one frontend
choice drives both the L2 header AST and the L4 source-ABI replay (auto/
castxml/clang). The android value stays on collect's explicit
--source-abi-extractor — it has no header-AST path, so it is not exposed on
the header-AST commands (the CompareRequest.validate() android-needs-sources
rule guards the API path). CompareRequest.validate() also pre-flights a
missing --policy-file path (D9). The only non-blocking follow-up is cosmetic
doc regen (a standalone cli-flags.md page); per the docs convention we prefer
--help over a hand-rolled flag table, and the .abicheck.yml blocks are
documented in concepts/build-source-data.md.
Work. Rename --header-backend → --ast-frontend (+ ABICHECK_AST_FRONTEND
env, + old aliases); wire it to the L4 extractor selection too. Introduce the
single MCP_CLI_NAME_MAP and align mcp_server.py params to it (D10.3 check).
Flesh out CompareRequest.validate(): mutually-exclusive flags, enum values,
depth feasibility, --ast-frontend android without --sources (D9). Regenerate
docs/user-guide/cli-flags.md from --help, update
docs/reference/exit-codes.md, add the .abicheck.yml schema reference page.
Tests. test_ast_frontend_alias; test_mcp_cli_name_map_complete (no param
or flag missing from the map); test_validate_* (each rule, asserting identical
CLI/MCP error text per goal AC 8).
Risk & rollback. Low/medium — additive plus a rename behind an alias. Doc
regen is mechanical (mkdocs --strict is the guard).
Done-when. MCP↔CLI name-map test green; validation errors identical across
front-ends; docs build --strict.
Phase 7 — Backward-compat scaffolding (future-enabled)¶
Status: landed. cli_options now exposes the single deprecation-window
resolver over the DEPRECATED_FLAGS registry: resolve_deprecated_flag(spelling)
→ (replacement, reason), deprecated_flags_in_argv(argv) (scans for every
deprecated spelling — both flag-level renames and value-level deprecations like
--depth=graph, matched as --depth graph too), and note_deprecated_flags(argv)
(the combined one-line note the 1.0 switch-on will route all front-ends through;
the live per-flag sites stay until then). .abicheck.yml is forward-compatible:
BuildConfig carries version: and from_dict warns (never errors) on any
unknown top-level or in-block key (_KNOWN_TOP_KEYS/_KNOWN_BLOCK_KEYS; sibling
risk_rules/crosschecks are recognized so they don't trip it). The
deprecation-window test stays advisory (a pytest table-test, not an
AI-readiness ERROR gate) until 1.0 per ADR-037 §Backward compatibility. Covered
by tests/test_cli_contract.py (test_every_deprecated_flag_resolves,
test_deprecated_flags_in_argv, test_note_deprecated_flags_combines) and
tests/test_config_rebalance.py::TestConfigForwardCompat.
Work. Build the DEPRECATED_FLAGS resolver + stderr deprecation notes;
test that every alias in the table still resolves. Add version: to
.abicheck.yml (unknown keys warn, not error). Keep the deprecation-window test
advisory (not ERROR) until 1.0 per ADR-037 §Backward compatibility.
Tests. test_deprecated_flags_resolve (table-driven); test_config_version_forward_compat
(unknown key warns, load succeeds).
Risk & rollback. Low — pure scaffolding, no enforcement yet.
Done-when. Every deprecated spelling from phases 3–6 resolves with a note; the 1.0 switch-on is a one-line severity change, documented.
Sequencing & PR map¶
- PR-1 (Phase 1) and PR-2 (Phase 2) are highest-value / lowest-risk and land first: they kill the classification drift and the copy-paste without user-visible change. Either can merge independently.
- PR-3 (Phase 3) and PR-4 (Phase 4) are user-visible; both ship behind deprecation aliases so no invocation breaks hard in one release. PR-4 depends on PR-1 (needs the single chokepoint to fold cleanly).
- PR-5 (Phase 5) is the biggest UX shift (flags → config); it depends on PR-2 (decorators) + PR-3 (depth) being in so the demoted surface is stable.
- PR-6 and PR-7 are additive and can trail; PR-6 depends on PR-1 (the request type) and PR-2 (decorators).
Dependency sketch: 1 → {2, 4}, {2,3} → 5, {1,2} → 6, {3..6} → 7.
Measurement (proves the headline claims)¶
The "~62 → ~20 flags" and "no divergence" claims are testable, not aspirational:
- A snapshot test records each command's option count; CI diffs it so a regression (a flag sneaking back inline) is visible in review.
- The D10.5 budget gives
comparea hard ceiling once Phase 5 lands. test_no_tier_skip+ the drift parity test (Phase 1) make "one classifier" a runtime guarantee, not a doc promise.
Definition of done (when implementation lands)¶
All seven phases have landed. Typed CompareRequest + single service
chokepoint; the shared option-family decorators; the unified --depth
vocabulary; compare input-type dispatch folding compare-release/
deep-compare; the .abicheck.yml config rebalance with explicit
--exit-code-scheme; --header-backend → --ast-frontend (one frontend across
L2 header AST and L4 source-ABI replay) + MCP_CLI_NAME_MAP +
CompareRequest.validate(); and the deprecation-window resolver + config
forward-compat — all enforced by the cli-contract gate (D10.1–D10.4) and its
test mirror. Registry UC-WF-cli-contract is now complete (evidence:
api_types.py, the cli-contract gate, tests/test_cli_contract.py, and the
alias/round-trip/forward-compat tests) and ADR-037 is Accepted — implemented.
The sole residual is the --ast-frontend android value, which remains on
collect's --source-abi-extractor because it has no header-AST path.