fix(trino/analysis): expand SELECT * over derived relations to the exact projection #296
Merged
…act projection

A star over a derived table or CTE was recorded as a single opaque "*" result; the consumer (Bytebase) expanded it against the BASE tables' catalog metadata, producing the wrong column count and order for the executed query. Its positional masker then slid, returning a sensitive column under a non-sensitive column's masker (BYT-9678).

Expand a star inline when its projection is provably width- and order-correct: every covered relation is a derived relation (subquery in FROM, CTE reference, aliased UNNEST, aliased parenthesized join) whose projection is fully resolved, and no USING/NATURAL join coalesces columns in the scope. The expanded columns carry their select-item ordinal, and the top-level pass splices them into the walk's Results in place of the "*" entry; ordinary items keep the additive union, and an unexpandable star stays byte-for-byte opaque (including a qualified star's single relation-ref shape consumers key on), so the consumer's metadata-based expansion applies exactly as before.

Also resolved by the same mechanism:
- TABLE <cte> expands to the CTE's projection;
- a top-level VALUES, which previously produced no Results at all (zero positional maskers), synthesizes its exact projection;
- a set operation whose star arm is resolved merges the other arms' lineage at the true expanded width;
- a qualified outer reference through a derived relation whose body is a resolved star (d.phone over (SELECT * FROM (SELECT phone …)) d) resolves.

Semantics verified against live Trino 481:
- ROW(a, b) in VALUES unpacks to two columns exactly like (a, b);
- with a(k, p), `SELECT a.* FROM a JOIN b USING (k)` returns only [p]: USING strips the join columns from a QUALIFIED star too, so coalescing conservatively blocks all star expansion in the scope;
- an alias on a parenthesized join hides the inner relation aliases.
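The expansion rule and the splice step described above can be sketched as follows. This is a minimal illustration, not the code from this change: the `Relation` and `Scope` types and the `expandStar` and `splice` functions are hypothetical names invented for the example, and results are modeled as plain strings rather than the analyzer's real lineage structures.

```go
package main

import "fmt"

// Relation is a hypothetical model of one relation in a FROM scope.
type Relation struct {
	Name     string
	Derived  bool     // subquery in FROM, CTE reference, aliased UNNEST, aliased parenthesized join
	Resolved bool     // the relation's projection is fully resolved
	Columns  []string // projection in output order (meaningful only when Resolved)
}

// Scope is a hypothetical model of the relations a star can cover.
type Scope struct {
	Relations []Relation
	Coalesces bool // a USING or NATURAL join appears in the scope
}

// expandStar returns the exact projection of a star, or ok=false when the
// expansion cannot be proven and the star must stay opaque.
// An empty qualifier means an unqualified star.
func expandStar(s Scope, qualifier string) (cols []string, ok bool) {
	if s.Coalesces {
		// Coalescing blocks qualified and unqualified stars alike.
		return nil, false
	}
	covered := 0
	for _, r := range s.Relations {
		if qualifier != "" && r.Name != qualifier {
			continue
		}
		if !r.Derived || !r.Resolved {
			return nil, false
		}
		cols = append(cols, r.Columns...)
		covered++
	}
	if covered == 0 {
		return nil, false
	}
	return cols, true
}

// splice replaces the single entry at ordinal with the expanded columns,
// leaving the entries before and after it in place.
func splice(results []string, ordinal int, expanded []string) []string {
	out := make([]string, 0, len(results)-1+len(expanded))
	out = append(out, results[:ordinal]...)
	out = append(out, expanded...)
	out = append(out, results[ordinal+1:]...)
	return out
}

func main() {
	d := Relation{Name: "d", Derived: true, Resolved: true, Columns: []string{"phone", "name"}}
	if cols, ok := expandStar(Scope{Relations: []Relation{d}}, ""); ok {
		fmt.Println(splice([]string{"id", "*", "ts"}, 1, cols)) // [id phone name ts]
	}

	// A star over a base table is not provable here, so it stays opaque.
	_, ok := expandStar(Scope{Relations: []Relation{{Name: "customer"}}}, "")
	fmt.Println(ok) // false
}
```

Returning `ok=false` rather than a best-effort column list mirrors the conservative stance in the text: when the width or order cannot be proven, the caller leaves the star untouched.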
Cross-reviewed (Codex): 6 findings; 3 fixed here (aliased-join binding, TABLE-over-CTE, top-level VALUES), 3 refuted/accepted-as-safe with oracle evidence recorded in the regression tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Fixes BYT-9678, the last of the 7 audited Trino masking data-leak vectors. Replaces #293, which GitHub auto-closed (and refused to reopen after rebase) when its stacked base branch was deleted by the #286 merge — now rebased onto main, single commit.
A star over a derived table or CTE was recorded as a single opaque `*`; Bytebase expanded it against the base tables' catalog metadata, producing the wrong column count/order for the executed query. Its positional masker then slid, returning a sensitive column under a non-sensitive column's masker.

Approach
Expand a star inline only when provably width- and order-correct: every covered relation is derived (subquery in FROM / CTE / aliased UNNEST / aliased parenthesized join) with a fully-resolved projection, and no USING/NATURAL join coalesces columns in the scope. Expanded columns carry their select-item ordinal; the top-level pass splices them into the walk's Results in place of the `*`. Anything unprovable stays byte-for-byte opaque (including the qualified star's single-relation-ref shape Bytebase keys on), so the consumer's metadata expansion applies exactly as before.

Also fixed by the same mechanism: `TABLE <cte>`, top-level `VALUES` (previously produced zero Results, hence zero positional maskers), set-op arms merging at expanded width, and qualified refs through inner resolved stars.

Verification
- `go test ./trino/...` green (the `trino/parser` oracle-differential backtick failure is the pre-existing tracked `lexer_oracle` follow-up, reproduced without this change).
- `ROW(a,b)` in VALUES unpacks to 2 columns; `a.*` in a USING join excludes the join columns (so coalescing must block qualified stars too; an expansion there would itself misalign); an alias on a parenthesized join hides inner aliases.
- … `go.mod` replace): the repro above now yields exactly `[phone → customer.phone, name → customer.name]` through the real extractor, and the full bytebase Trino package passes against this omni. No consumer change is needed; the fix activates on the version bump.

Cross-review
Codex adversarial pass: 6 findings, of which 3 are fixed here (aliased parenthesized join, `TABLE` over CTE, top-level VALUES) and 3 are refuted or accepted-as-safe with oracle evidence locked into the regression tests (including one suggested "fix" that would itself have introduced a width-wrong expansion). A consolidated closing review confirmed the fix areas sound.

🤖 Generated with Claude Code
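To make the failure mode from the Summary concrete, here is a small sketch of how a positional masker slides when it is built from the wrong column list. Everything in it is illustrative: the query in the comment, the base-table column order `id, name, phone`, the sample row, and the helper names `maskRow` and `maskersFor` are assumptions made for the example, not the original repro or the consumer's real code. Only the exact projection `[phone, name]` is taken from the verification result above.

```go
package main

import "fmt"

// maskRow applies one masker per output position: the value at position i
// is hidden when sensitive[i] is true.
func maskRow(row []string, sensitive []bool) []string {
	out := make([]string, len(row))
	for i, v := range row {
		if i < len(sensitive) && sensitive[i] {
			out[i] = "******"
		} else {
			out[i] = v
		}
	}
	return out
}

// maskersFor builds the positional masker list for an assumed column order.
func maskersFor(cols []string, sensitiveCols map[string]bool) []bool {
	m := make([]bool, len(cols))
	for i, c := range cols {
		m[i] = sensitiveCols[c]
	}
	return m
}

func main() {
	sensitiveCols := map[string]bool{"phone": true}

	// Hypothetical row returned by a query shaped like:
	//   SELECT * FROM (SELECT phone, name FROM customer) d
	row := []string{"555-0100", "Ada"}

	// Opaque star expanded against the base table's catalog order
	// (assumed here to be id, name, phone): the maskers are misaligned.
	catalogOrder := []string{"id", "name", "phone"}
	fmt.Println(maskRow(row, maskersFor(catalogOrder, sensitiveCols))) // [555-0100 Ada]

	// Star expanded to the derived relation's exact projection:
	// the masker lines up with the phone column.
	exact := []string{"phone", "name"}
	fmt.Println(maskRow(row, maskersFor(exact, sensitiveCols))) // [****** Ada]
}
```

The first call leaks the phone number because position 0 inherits the masker of `id`; the second masks it because the column list matches what the query actually returns.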