fix(trino/analysis): expand SELECT * over derived relations to the exact projection#296

Merged
h3n4l merged 1 commit into main from feat/trino-star-projection on Jun 10, 2026
@h3n4l h3n4l commented Jun 10, 2026

Member

Summary

Fixes BYT-9678, the last of the 7 audited Trino masking data-leak vectors. Replaces #293, which GitHub auto-closed (and refused to reopen after rebase) when its stacked base branch was deleted by the #286 merge. This PR carries the same change, rebased onto main as a single commit.

A star over a derived table or CTE was recorded as a single opaque *. Bytebase expanded it against the base tables' catalog metadata, producing the wrong column count and order for the executed query, so its positional maskers slid out of alignment with the returned columns:

-- masking policy on customer.phone
SELECT * FROM (SELECT phone, name FROM customer) d;
-- span said 4 columns [id, phone, name, phones]; Trino returns 2 [phone, name]
-- → output col 0 (phone) got col 0's (id → None) masker → LEAK
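
The slide in the repro above can be reproduced with a minimal Go sketch. It is an illustration only: `maskersFor` and `applyPositional` are hypothetical helpers, not Bytebase or omni APIs, and the "MASK"/"NONE" labels stand in for real maskers. It assumes maskers are paired with returned columns purely by index, as the description states.

```go
package main

import "fmt"

// maskersFor builds one masker label per span column, in span order.
// Hypothetical helper: "MASK" for columns under a policy, "NONE" otherwise.
func maskersFor(span []string, sensitive map[string]bool) []string {
	out := make([]string, len(span))
	for i, col := range span {
		if sensitive[col] {
			out[i] = "MASK"
		} else {
			out[i] = "NONE"
		}
	}
	return out
}

// applyPositional pairs each returned column with the masker at the same index.
func applyPositional(returned, maskers []string) map[string]string {
	applied := make(map[string]string, len(returned))
	for i, col := range returned {
		if i < len(maskers) {
			applied[col] = maskers[i]
		} else {
			applied[col] = "NONE"
		}
	}
	return applied
}

func main() {
	sensitive := map[string]bool{"phone": true}
	returned := []string{"phone", "name"} // what Trino returns for the repro

	// Span expanded from the base table's catalog metadata (the bug).
	wrong := maskersFor([]string{"id", "phone", "name", "phones"}, sensitive)
	// Span expanded from the derived table's own projection (the fix).
	right := maskersFor([]string{"phone", "name"}, sensitive)

	fmt.Println("base-table span:", applyPositional(returned, wrong)) // phone is paired with NONE
	fmt.Println("exact projection:", applyPositional(returned, right)) // phone is paired with MASK
}
```

With the four-column span, `phone` at output index 0 inherits the masker computed for `id`, which is the leak described above.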

Approach

Expand a star inline only when provably width- and order-correct: every covered relation is derived (subquery in FROM / CTE / aliased UNNEST / aliased parenthesized join) with a fully-resolved projection, and no USING/NATURAL join coalesces columns in the scope. Expanded columns carry their select-item ordinal; the top-level pass splices them into the walk's Results in place of the *. Anything unprovable stays byte-for-byte opaque (including the qualified star's single-relation-ref shape Bytebase keys on) → the consumer's metadata expansion applies exactly as before. Also fixed by the same mechanism: TABLE <cte>, top-level VALUES (previously produced zero Results → zero positional maskers), set-op arms merging at expanded width, and qualified refs through inner resolved stars.
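
The expand-or-stay-opaque rule can be sketched roughly as follows. This is a simplified model under stated assumptions, not the code in this PR: the `relation` and `scope` types and the `expandStar` function are hypothetical names, and the real analysis tracks lineage rather than bare column names.

```go
package main

import "fmt"

// relation is a hypothetical, simplified model of one relation a star covers.
type relation struct {
	name     string
	derived  bool     // subquery in FROM, CTE, aliased UNNEST, aliased parenthesized join
	resolved bool     // projection is fully known
	columns  []string // projection in output order (meaningful only when resolved)
}

// scope is a hypothetical model of the FROM scope a star is evaluated in.
type scope struct {
	relations  []relation
	coalescing bool // a USING or NATURAL join is present in the scope
}

// expandStar returns the exact projection and true only when every covered
// relation is derived and resolved and nothing coalesces columns. Otherwise it
// returns nil and false, and the caller leaves the "*" opaque.
func expandStar(s scope) ([]string, bool) {
	if s.coalescing || len(s.relations) == 0 {
		return nil, false
	}
	var cols []string
	for _, r := range s.relations {
		if !r.derived || !r.resolved {
			return nil, false
		}
		cols = append(cols, r.columns...)
	}
	return cols, true
}

func main() {
	derived := relation{name: "d", derived: true, resolved: true, columns: []string{"phone", "name"}}
	base := relation{name: "customer"} // base table: left to the consumer's metadata expansion

	fmt.Println(expandStar(scope{relations: []relation{derived}}))
	fmt.Println(expandStar(scope{relations: []relation{derived, base}}))
	fmt.Println(expandStar(scope{relations: []relation{derived}, coalescing: true}))
}
```

Returning a boolean instead of a best-effort partial list mirrors the design choice described above: a star that cannot be proven correct is left untouched rather than approximated.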

Verification

  • 20 star regression tests; full go test ./trino/... green (the trino/parser oracle-differential backtick failure is the pre-existing tracked lexer_oracle follow-up, reproduced without this change).
  • Live-oracle semantics (Trino 481): ROW(a,b) in VALUES unpacks to 2 columns; a.* in a USING join excludes the join columns (so coalescing must block qualified stars too — an expansion there would itself misalign); an alias on a parenthesized join hides inner aliases.
  • End-to-end against Bytebase (temporary go.mod replace): the repro above now yields exactly [phone → customer.phone, name → customer.name] through the real extractor, and the full bytebase Trino package passes against this omni — no consumer change needed; the fix activates on the version bump.

Cross-review

Codex adversarial pass: 6 findings — 3 fixed here (aliased parenthesized join, TABLE over CTE, top-level VALUES), 3 refuted or accepted-as-safe with oracle evidence locked into the regression tests (including one suggested "fix" that would itself have introduced a width-wrong expansion). A consolidated closing review confirmed the fix areas sound.

🤖 Generated with Claude Code

…act projection

A star over a derived table or CTE was recorded as a single opaque "*" result;
the consumer (Bytebase) expanded it against the BASE tables' catalog metadata,
producing the wrong column count and order for the executed query — its
positional masker then slid, returning a sensitive column under a non-sensitive
column's masker (BYT-9678).

Expand a star inline when its projection is provably width- and order-correct:
every covered relation is a derived relation (subquery in FROM, CTE reference,
aliased UNNEST, aliased parenthesized join) whose projection is fully resolved,
and no USING/NATURAL join coalesces columns in the scope. The expanded columns
carry their select-item ordinal, and the top-level pass splices them into the
walk's Results in place of the "*" entry; ordinary items keep the additive
union, and an unexpandable star stays byte-for-byte opaque (including a
qualified star's single relation-ref shape consumers key on), so the consumer's
metadata-based expansion applies exactly as before.
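
As an illustration of the splice step, the sketch below replaces a "*" entry with its expanded columns by select-item ordinal and leaves an unresolved star as it was. The `result` type and `spliceStars` function are hypothetical simplifications, and the sketch does not model the additive union applied to ordinary items.

```go
package main

import "fmt"

// result is a hypothetical stand-in for one entry of the walk's Results.
type result struct {
	name    string
	ordinal int // position of the select item this entry came from
}

// spliceStars replaces each "*" entry that has a resolved expansion for its
// ordinal with the expanded columns, in order. Stars without an expansion and
// all ordinary entries pass through unchanged.
func spliceStars(results []result, expansions map[int][]string) []result {
	out := make([]result, 0, len(results))
	for _, r := range results {
		cols, ok := expansions[r.ordinal]
		if r.name != "*" || !ok {
			out = append(out, r)
			continue
		}
		for _, c := range cols {
			out = append(out, result{name: c, ordinal: r.ordinal})
		}
	}
	return out
}

func main() {
	results := []result{{"id", 0}, {"*", 1}, {"*", 2}}
	// Only the star at ordinal 1 was provably expandable.
	expansions := map[int][]string{1: {"phone", "name"}}

	for _, r := range spliceStars(results, expansions) {
		fmt.Printf("%d:%s\n", r.ordinal, r.name)
	}
}
```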

Also resolved by the same mechanism:
- TABLE <cte> expands to the CTE's projection;
- a top-level VALUES, which previously produced no Results at all (zero
  positional maskers), synthesizes its exact projection;
- a set operation whose star arm is resolved merges the other arms' lineage at
  the true expanded width;
- a qualified outer reference through a derived relation whose body is a
  resolved star (d.phone over (SELECT * FROM (SELECT phone …)) d) resolves.

Semantics verified against live Trino 481: ROW(a, b) in VALUES unpacks to two
columns exactly like (a, b); with a(k, p), `SELECT a.* FROM a JOIN b USING (k)`
returns only [p] — USING strips the join columns from a QUALIFIED star too, so
coalescing conservatively blocks all star expansion in the scope; an alias on a
parenthesized join hides the inner relation aliases.

Cross-reviewed (Codex): 6 findings — 3 fixed here (aliased-join binding,
TABLE-over-CTE, top-level VALUES), 3 refuted/accepted-as-safe with oracle
evidence recorded in the regression tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@h3n4l h3n4l merged commit 921d57a into main Jun 10, 2026
2 checks passed
@h3n4l h3n4l deleted the feat/trino-star-projection branch June 10, 2026 06:19