[python] Add ReadBuilder.explain() for scan-plan visibility #7869
Open
TheR1sing3un wants to merge 6 commits into
Introduce ReadBuilder.explain() returning a structured ExplainResult that summarises the target snapshot, the pushed-down predicate / projection / limit, the partition / bucket / file-stats pruning funnel, and split-level execution signals (raw-convertible ratio, deletion-vector ratio, level histogram, files-per-split and split-size distribution).
A new opt-in ScanStats counter set is wired through FileScanner via TableScan.scan_with_stats(). The regular read hot path is unaffected when scan_stats is None.
To produce accurate before/after counters, explain() suppresses the manifest reader's early bucket filter and forces single-threaded manifest decoding for the one pass that drives it. The order of partition and bucket checks in _filter_manifest_entry is rearranged so each pruning stage maps cleanly to one counter; both filters remain pure AND tests and the final survivor set is identical.
Predicate rendering lives in a standalone helper so Predicate itself stays rendering-agnostic.
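For readers who want a picture of the opt-in counter idea, here is a minimal, self-contained sketch. It is not the code in this PR: `Entry`, the `ScanStats` fields, and `scan()` are hypothetical stand-ins, and the bucket is passed in directly rather than derived from a hash. It only illustrates how one counter per pruning stage can be bumped when a stats object is supplied, while the survivor set stays the same when it is None.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for a manifest entry (not a pypaimon class)."""
    partition: str
    bucket: int
    min_id: int
    max_id: int


@dataclass
class ScanStats:
    """Hypothetical counter set; the field names are illustrative only."""
    total: int = 0
    after_partition: int = 0
    after_bucket: int = 0
    after_file_stats: int = 0


def scan(entries: List[Entry], partition: str, bucket: int, id_value: int,
         stats: Optional[ScanStats] = None) -> List[Entry]:
    """Apply partition, bucket, then min/max pruning, in that order.

    Each stage is a pure AND test, so counting does not change the result.
    """
    survivors = []
    for entry in entries:
        if stats is not None:
            stats.total += 1
        if entry.partition != partition:
            continue
        if stats is not None:
            stats.after_partition += 1
        if entry.bucket != bucket:
            continue
        if stats is not None:
            stats.after_bucket += 1
        if not (entry.min_id <= id_value <= entry.max_id):
            continue
        if stats is not None:
            stats.after_file_stats += 1
        survivors.append(entry)
    return survivors


demo_entries = [
    Entry("2026-05-12", 0, 1, 5),
    Entry("2026-05-12", 1, 1, 5),
    Entry("2026-05-12", 1, 6, 9),
    Entry("2026-05-11", 1, 6, 9),
]
demo_stats = ScanStats()
demo_survivors = scan(demo_entries, "2026-05-12", 1, 7, stats=demo_stats)
print(demo_stats, len(demo_survivors))
```

Because the counters are only touched behind `stats is not None`, a caller that never asks for an explanation pays nothing beyond the branch checks.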
Cover the seven scenarios called out in the design: append-only baseline, PK partitioned + HASH_FIXED with predicate that triggers both partition and bucket pruning, predicate rendering shapes (equal, in, between, isNull, and/or), verbose split detail alignment with plan().splits(), empty snapshot path, split-level signals (raw-convertible / DV / L0) across append-only and DV-on PK tables, and pretty-print smoke for the compact layout anchors.
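The rendering shapes listed above (equal, in, between, isNull, and/or) can be sketched with a standalone helper that sits outside the predicate type. This is an illustration under assumptions, not the helper added by this PR: `Leaf`, `Compound`, `render()`, and the SQL-like output format are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Tuple, Union


@dataclass(frozen=True)
class Leaf:
    """Hypothetical leaf predicate: a method name, a field, and literals."""
    method: str  # one of 'equal', 'in', 'between', 'isNull'
    field: str
    literals: Tuple[Any, ...] = ()


@dataclass(frozen=True)
class Compound:
    """Hypothetical compound predicate joining children with 'and' / 'or'."""
    method: str
    children: Tuple[Union["Compound", Leaf], ...]


def render(predicate: Union[Compound, Leaf]) -> str:
    """Render a predicate tree as text; the predicate classes stay unaware."""
    if isinstance(predicate, Compound):
        if predicate.method not in ("and", "or"):
            raise ValueError(f"unsupported compound: {predicate.method}")
        joiner = f" {predicate.method.upper()} "
        return "(" + joiner.join(render(c) for c in predicate.children) + ")"
    if predicate.method == "equal":
        return f"{predicate.field} = {predicate.literals[0]!r}"
    if predicate.method == "in":
        values = ", ".join(repr(v) for v in predicate.literals)
        return f"{predicate.field} IN ({values})"
    if predicate.method == "between":
        low, high = predicate.literals
        return f"{predicate.field} BETWEEN {low!r} AND {high!r}"
    if predicate.method == "isNull":
        return f"{predicate.field} IS NULL"
    raise ValueError(f"unsupported leaf: {predicate.method}")


demo_predicate = Compound("and", (
    Leaf("equal", "dt", ("2026-05-12",)),
    Leaf("equal", "id", (7,)),
))
print(render(demo_predicate))
```

Keeping the renderer as a free function means new output styles can be added without touching the predicate classes.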
JingsongLi
reviewed
May 16, 2026
What
Add ReadBuilder.explain(), returning a structured ExplainResult so users
can see what a PyPaimon read will actually do: target snapshot, pushed-down
predicate / projection / limit, partition / bucket / file-stats pruning
funnel, and split-level execution signals (raw-convertible ratio,
deletion-vector ratio, level histogram, split-size skew).
The default __str__ is a compact debug layout; verbose=True lists every
split. Reads manifest list + manifests only; data files are never opened.
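As a rough illustration of the split-level signals, the sketch below derives them from plain metadata without opening any data file. Everything here is assumed for the example: `FileInfo`, `SplitInfo`, and `summarize_splits()` are hypothetical names, and skew is taken to be the largest split size divided by the mean, which may differ from the definition this PR uses.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical per-file metadata as it might appear in a manifest."""
    level: int
    size_bytes: int


@dataclass(frozen=True)
class SplitInfo:
    """Hypothetical split summary; not a pypaimon class."""
    files: Tuple[FileInfo, ...]
    raw_convertible: bool
    has_deletion_vectors: bool


def summarize_splits(splits: List[SplitInfo]) -> Dict[str, object]:
    """Compute ratios, a level histogram, and a simple size-skew figure."""
    count = len(splits)
    if count == 0:
        # Empty snapshot path: nothing to divide by, so report zeros.
        return {"raw_convertible_ratio": 0.0, "dv_ratio": 0.0,
                "level_histogram": {}, "files_per_split": 0.0,
                "size_skew": 0.0}
    sizes = [sum(f.size_bytes for f in s.files) for s in splits]
    levels = Counter(f.level for s in splits for f in s.files)
    mean_size = sum(sizes) / count
    return {
        "raw_convertible_ratio": sum(s.raw_convertible for s in splits) / count,
        "dv_ratio": sum(s.has_deletion_vectors for s in splits) / count,
        "level_histogram": dict(sorted(levels.items())),
        "files_per_split": sum(len(s.files) for s in splits) / count,
        # Assumed definition of skew: max split size over mean split size.
        "size_skew": max(sizes) / mean_size if mean_size else 0.0,
    }


demo_splits = [
    SplitInfo((FileInfo(0, 100), FileInfo(5, 300)), False, True),
    SplitInfo((FileInfo(5, 400),), True, False),
]
print(summarize_splits(demo_splits))
```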
Why
Plan exposes only splits and snapshot_id today; FileScanner already
does partition / bucket / file-stats pruning but none of that is visible to
users. The only way to inspect cost is reading INFO logs or walking
plan().splits() by hand. Apache Paimon Java has no SQL EXPLAIN of its own
either (that comes from Flink / Spark); this PR is scoped to scan-plan
visibility, not query planning.
Sample output
PK + partition + HASH_FIXED bucket, predicate dt = '2026-05-12' AND id = 7:
Tests
pypaimon/tests/read_builder_explain_test.py covers 7 scenarios:
append-only baseline, PK/partition/bucket pruning funnel, predicate
rendering, verbose splits, empty snapshot, split-level signals, pretty-print
smoke. Full read regression is clean.
API / format impact
New API only: ReadBuilder.explain(verbose=False) -> ExplainResult. Hot
read path untouched: ScanStats is opt-in and only enabled by explain().
No data / wire format change. No Java-side change.
Follow-up
A follow-up patch will surface explain through the pypaimon CLI
(alongside cli_sql / cli_table) so users can inspect a query plan from
the command line without writing any Python. A # TODO next to
ReadBuilder.explain marks the entry point.
Generative AI usage
Drafted with the help of Claude Code; reviewed and tested locally by the
author.