Data Source Remeasurement Status - 2026-06-11¶
Task: remeasure-all
Scope: First local proof pass for the L0-L5 data-source/process remediation work.
Environment¶
- Repo:
/home/openclaw/.openclaw/workspace-abicheck/repo - Command prefix needed locally:
PYTHONPATH=. - Available tools:
castxml:/home/openclaw/.local/bin/castxmlgcc:/usr/bin/gccg++:/usr/bin/g++cmake:/usr/bin/cmakepytest:9.0.2
Full Synthetic Example Remeasurement¶
Command:
Result:
- Exit code:
1 - Total cases:
129 - PASS:
104 - XFAIL:
5 - SKIP:
7 - FAIL:
10 - ERROR:
3
Unexpected FAIL cases:
case01_symbol_removal: expectedBREAKING, gotNO_CHANGEcase02_param_type_change: expectedBREAKING, gotNO_CHANGEcase03_compat_addition: expectedCOMPATIBLE, gotNO_CHANGEcase102_frozen_runtime_signature_changed: expectedBREAKING, gotNO_CHANGEcase10_return_type: expectedBREAKING, gotNO_CHANGEcase12_function_removed: expectedBREAKING, gotNO_CHANGEcase33_pointer_level: expectedBREAKING, gotNO_CHANGEcase46_pointer_chain_type_change: expectedBREAKING, gotNO_CHANGEcase59_func_became_inline: expectedBREAKING, gotNO_CHANGEcase66_language_linkage_changed: expectedBREAKING, gotNO_CHANGE
Unexpected ERROR cases:
case126_sycl_device_impl_ptr:dump v1 failed; CastXML failed while processingv1.hcase80_pimpl_shared_to_unique:dump v1 failed; CastXML failed while processingv1.hcase89_inline_accessor_renamed_pimpl_member:dump v1 failed; CastXML failed while processingv1.h
Known expected non-pass buckets:
- XFAIL: 5 known gaps from
examples/ground_truth.json - SKIP: 7 platform/tooling/layout skips, including bundle cases delegated to bundle tests and BTF fixture delegated to kernel workflow tests
Real-World Matrix Status¶
Command planned:
Status:
- Blocked locally:
validation/libs/exis not present. - This matches
validation/README.md: real binaries are intentionally not committed and must be fetched/extracted fromvalidation/data/manifest.json.
Current checked-in real-world inventory:
- Manifest pairs: 11
- Current result records: 33 shared-library comparisons
- Current evidence modes in
validation/data/results.json: sym->sym: 25dwarf->sym: 5dwarf->dwarf: 3
Dirty Worktree Context¶
Pre-existing modified files not touched by this task:
abicheck/elf_symbol_filter.pytests/test_elf_symbol_filters.pyvalidation/data/results.json
validation/data/results.json was intentionally restored to the main version
in PR #351 after review. The refreshed local real-world output had not been
paired with a regenerated validation/REPORT.md, so keeping it in the PR would
make the checked-in report and result artifact contradict each other.
Files added/updated by this task:
docs/development/data-source-process-remediation-plan.mddocs/development/data-source-remeasurement-status-2026-06-11.md
Next Required Steps¶
- Investigate the 10
NO_CHANGEfalse negatives from full example validation. - Investigate the 2 CastXML errors in pimpl-related examples.
- Run component suites listed in
data-source-process-remediation-plan.md. - Fetch/extract the real-world conda packages from
validation/data/manifest.jsonand rerunvalidation/scripts/run_matrix.py. - Extend report/validation outputs beyond current
sym/dwarflabels to explicit L0-L5 coverage. First runtime diagnostic slices are now implemented for--show-data-sourceswith L3/L4/L5 build-source pack reporting, plus compare-side old/new L0-L5 coverage summaries. - Run the newly enabled artifact-variant matrix across all examples:
python tests/validate_examples.py --artifact-variant all --json. The runner now supports debug+headers, release+headers, stripped+headers, and build/source-evidence variants; the full all-example matrix still needs to be measured and triaged.
Example Artifact Variant Enablement¶
Implemented in PR #351:
debug-headers: existing default; builds Debug/-gwhere applicable and passes discovered public headers.release-headers: builds without debug flags / with Release CMake profile and still passes headers.stripped-headers: builds with debug info, strips debug sections, and still passes headers.build-source: builds with headers, collects L3/L4/L5 evidence throughcollect --compile-db --source-abi --source-abi-extractor castxml --source-graph summary, and compares with old/new build-source packs.
Smoke proof:
Result: PASS=4, one pass for each variant.
Runtime Data-Source Diagnostic Implementation¶
Implemented in PR #351:
abicheck dump --show-data-sourcesnow reports L0-L5 availability instead of only L0/L1/L2.abicheck dump --show-data-sources --build-info <pack>loads the build-source pack and reports L3 build context, L4 source ABI, and L5 source graph status.- Historical fixed detector-count claims were removed from live diagnostic output; the CLI now describes the active evidence mode and missing evidence boundaries.
abicheck comparenow prints old/new L0-L5 evidence coverage side by side when build/source coverage is in play. Asymmetric rows are visibly marked in stderr while the existing JSONlayer_coveragefield remains target/new-side focused for backward compatibility.
Targeted proof:
PYTHONPATH=. pytest -q tests/test_dwarf_snapshot.py -k show_data_sources
PYTHONPATH=. pytest -q tests/test_dwarf_coverage_gaps.py -k show_data_sources
ruff check abicheck/dwarf_snapshot.py abicheck/cli.py tests/test_dwarf_snapshot.py
Result: all passed.
Additional implementation proof after compare-side summary:
PYTHONPATH=. pytest -q tests/test_build_source_cli.py::test_compare_with_evidence_emits_coverage_and_findings tests/test_build_source_cli.py::test_compare_asymmetric_old_only_reports_target_not_collected tests/test_layer_coverage.py::test_compare_cli_reports_coverage_asymmetry tests/test_dwarf_snapshot.py::TestShowDataSources tests/test_dwarf_snapshot.py::TestPrintDataSourcesDirect tests/test_validate_examples_unit.py
PYTHONPATH=. python tests/validate_examples.py case04 --artifact-variant all --json
ruff check --no-cache abicheck tests
mypy abicheck
python scripts/check_ai_readiness.py
Result: targeted pytest 29 passed; case04 artifact variants PASS=4; ruff
clean; mypy clean; AI-readiness 0 errors and 13 warnings.
Additional PR-review fix proof:
PYTHONPATH=. python tests/validate_examples.py case04 --artifact-variant all --json
PYTHONPATH=. python tests/validate_examples.py case103 --artifact-variant build-source --json
PYTHONPATH=. python tests/validate_examples.py case104 --artifact-variant build-source --json
PYTHONPATH=. python tests/validate_examples.py case104 --artifact-variant release-headers --json
Result: case04 PASS=4; case103 build-source PASS=1 with expected
COMPATIBLE_WITH_RISK from L3 toolchain-flag evidence; case104 build-source
PASS=1; case104 release-headers PASS=1.
Remeasurement Artifact Readiness¶
Implemented in PR #351:
tests/validate_examples.py --jsonnow emits schemavalidate_examples.v2.- Top-level metadata records runner, command, platform, selected cases,
examples/ground_truth.jsoncorpus size, and selected artifact variants. - Each result records component, case id, platform, mode, source layers, evidence asymmetry, runtime seconds, expected verdict, actual verdict, status, and whether manual review is acceptable.
validation/scripts/run_matrix.pynow emits real-world matrix records with schemarun_matrix.v2.- Real-world records include component, case id, platform, logical library, mode, old/new source layers, evidence asymmetry, runtime, expected verdict, actual verdict, normalized compatibility-axis verdicts, comparison status, exit code, stderr, summary counts, release recommendation, and optional layer coverage.
- Real-world run metadata is written to
validation/data/results.meta.jsonwith runner, command, platform, manifest pair count, comparison count, and observed evidence modes and comparison-status counts. validation/scripts/run_component_suites.pynow emits component-suite records with schemacomponent_suites.v1.- Component-suite records include suite/case id, platform, supported platforms, source layers, pytest command, runtime, status, pytest summary counts, and explicit blocked reasons for missing files or optional dependencies.
validation/scripts/summarize_remeasurement.pynow consumesvalidate_examples.v2,component_suites.v1, andrun_matrix.v2artifacts and emitsremeasurement_summary.v1.- The combined summary records total records, blocking failures, status/verdict/mode/source-layer counts, real-world expectation mismatches, real-world run errors, and component-suite blocked reasons.
- Real-world summaries no longer count expected non-zero compare exit codes as
blocking failures; expected
BREAKING/API_BREAKoutcomes are scored by expected-vs-actual verdict status instead.
Smoke proof:
PYTHONPATH=. python tests/validate_examples.py case04 --artifact-variant all --json
PYTHONPATH=. pytest -q tests/test_validate_examples_unit.py
PYTHONPATH=. pytest -q tests/test_validation_run_matrix.py
PYTHONPATH=. pytest -q tests/test_validation_component_suites.py
PYTHONPATH=. pytest -q tests/test_validation_remeasurement_summary.py
PYTHONPATH=. python validation/scripts/run_component_suites.py --suite report-policy --dry-run --output /tmp/component_suites.json
Result: case04 PASS=4; validate-example unit tests 20 passed;
run-matrix unit tests 5 passed; component-suite unit tests 4 passed;
remeasurement-summary unit tests 4 passed; component-suite dry-run wrote one
planned suite record.
Plan Refresh For Current Build/Source Capabilities¶
Updated in PR #351 after the first implementation slice:
- The remediation plan now uses the current L0-L5 BuildSourcePack model instead of the older L0-L4 draft.
- L3 is explicitly build/toolchain/package context: compile DB, CMake, Ninja, Bazel, Make, compiler-record recovery, and external extractor manifests.
- L4 is explicitly source ABI replay: clang, CastXML, or Android header-ABI dumps, replay scopes, cache, changed-path filtering, and partial-coverage degradation.
- L5 is explicitly source graph: summary graph plus optional include/call, Kythe, and CodeQL augmentation for graph diff and localization.
- Consumer/appcompat/bundle/stack/policy inputs are recorded as impact/report context that consumes L0-L5 findings, not as a separate canonical evidence layer.
Component Suite Status¶
Initial component-suite command:
PYTHONPATH=. pytest -q tests/test_elf_metadata_unit.py tests/test_elf_parse_integration.py tests/test_elf_symbol_filters.py tests/test_elf_version_policy.py tests/test_surface.py tests/test_surface_scope_parity.py tests/test_confidence_evidence.py tests/test_stripped_degradation.py tests/test_dwarf_snapshot.py tests/test_dwarf_metadata_coverage.py tests/test_dwarf_unified.py tests/test_debug_resolver.py tests/test_btf_metadata.py tests/test_btf_integration.py tests/test_ctf_metadata.py tests/test_pdb_metadata.py tests/test_pdb_parser.py tests/test_pe_metadata_unit.py tests/test_macho_metadata_unit.py tests/test_build_context.py tests/test_package.py tests/test_package_extractor_matrix.py tests/test_bundle.py tests/test_stack_checker.py tests/test_appcompat.py tests/test_appcompat_examples.py tests/test_report_schema.py tests/test_reporter.py tests/test_sarif.py tests/test_junit_report.py tests/test_policy_changekind_matrix.py tests/test_policy_file.py tests/test_baseline.py tests/test_suppression_matrix.py
Result:
- Exit code:
2 - Blocked during test collection:
tests/test_pe_metadata_unit.py - Blocker: missing Python dependency
pefile
Second component-suite command excluded tests/test_pe_metadata_unit.py but kept the rest of the local/Linux component list.
Result:
- Exit code:
1 - Passed:
1641 - Skipped:
5 - Failed:
10 - Warnings:
12
Likely real failures:
tests/test_surface_scope_parity.py::test_internal_type_change_scoped_out_by_both- expected non-breaking scoped result, got
BREAKING tests/test_surface_scope_parity.py::test_public_type_change_breaking_for_both- expected breaking public type change, got
NO_CHANGE
Tooling/dependency failures from missing pefile:
tests/test_macho_metadata_unit.py::TestCliIntegration::test_dump_native_binary_petests/test_macho_metadata_unit.py::TestCliIntegration::test_dump_native_binary_pe_empty_exports_raisestests/test_appcompat.py::TestParsePeAppRequirements::test_named_importstests/test_appcompat.py::TestParsePeAppRequirements::test_ordinal_only_importstests/test_appcompat.py::TestParsePeAppRequirements::test_filter_by_dll_nametests/test_appcompat.py::TestParsePeAppRequirements::test_pe_parse_errortests/test_appcompat.py::TestParsePeAppRequirements::test_no_import_directorytests/test_appcompat.py::TestGetNewLibExports::test_pe_exports
Known false-positive / real-world scan gate:
PYTHONPATH=. pytest -q tests/test_real_world_false_positives.py tests/test_realworld_scan.py tests/test_fp_rate_gate.py
Result:
- Exit code:
1 - Passed:
67 - Failed:
2
Failures:
tests/test_realworld_scan.py::TestRealWorldCompatibleRelease::test_compatible_release- expected
FUNC_ADDEDforcompress_reset; observed onlyENUM_MEMBER_ADDED tests/test_realworld_scan.py::TestRealWorldBreakingRelease::test_breaking_release- expected
FUNC_REMOVEDforcompress_bound; observed type layout changes but no function removal