feat(trino/analysis): resolve view lineage via catalog metadata#295

Merged
h3n4l merged 1 commit into main from feat/trino-view-lineage
Jun 10, 2026
@h3n4l h3n4l commented Jun 10, 2026

Summary

Fixes BYT-9679 (view columns bypass masking) in omni — supersedes the consumer-side bytebase#20560 (closed), per the decision that view lineage belongs in omni's metadata-aware layer rather than each consumer. Stacked on #293 (which stacks on #286) — review the single commit dcee0f1; retarget/rebase as the stack merges.

A column selected through a view had lineage pointing at the view column; masking config can only attach to table columns, so the value came back unmasked:

-- masking policy on customer.phone; customer_v AS SELECT id, phone, name FROM customer
SELECT phone FROM customer_v;   -- returned in clear

Approach

Adds analysis.GetQuerySpanWithCatalog(stmt, *catalog.Catalog), which takes the same trino/catalog type that completion already consumes; catalog.View gains a Definition field. A FROM reference that the catalog resolves to a view binds as a derived relation carrying its definition's resolved projection. That projection is computed recursively in the view's own catalog/schema context, cycle-guarded and memoized, so named refs, SELECT * over views, views-over-views, and CTE-based definitions all resolve through the existing #286/#293 resolver machinery. The definition's relations join AccessTables (qualified). A catalog-known base table binds with its catalog columns: stars over base tables and mixed base+derived joins expand to the exact projection, and relation column aliases (FROM customer AS c(i, p, …)) resolve.
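
The recursive, cycle-guarded, memoized resolution described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: Resolver, Source, ColumnRef and the pre-parsed Views map are hypothetical stand-ins (the real path works from catalog.View.Definition and reuses the existing resolver), and each view column is assumed to have exactly one source.

```go
package main

import "fmt"

// ColumnRef is a hypothetical stand-in for a base-table column in a lineage result.
type ColumnRef struct {
	Table, Column string
}

// Source is a hypothetical, pre-parsed projection item: the relation and
// column a view output selects from.
type Source struct {
	Relation, Column string
}

// Resolver holds view definitions plus the two pieces of state the PR
// description names: a memo of fully resolved views and a cycle guard.
type Resolver struct {
	Views    map[string]map[string]Source      // view -> output column -> source
	memo     map[string]map[string][]ColumnRef // only complete projections are stored
	visiting map[string]bool                   // views currently being resolved
}

func NewResolver(views map[string]map[string]Source) *Resolver {
	return &Resolver{
		Views:    views,
		memo:     map[string]map[string][]ColumnRef{},
		visiting: map[string]bool{},
	}
}

// Resolve returns the base-table columns behind relation.column.
// A relation that is not a known view is treated as a base table.
// ok == false means the lineage could not be resolved (cycle or unknown
// column) and the caller should treat the view as opaque.
func (r *Resolver) Resolve(relation, column string) ([]ColumnRef, bool) {
	def, isView := r.Views[relation]
	if !isView {
		return []ColumnRef{{Table: relation, Column: column}}, true
	}
	proj, ok := r.resolveView(relation, def)
	if !ok {
		return nil, false
	}
	refs, found := proj[column]
	return refs, found
}

func (r *Resolver) resolveView(name string, def map[string]Source) (map[string][]ColumnRef, bool) {
	if proj, done := r.memo[name]; done {
		return proj, true
	}
	if r.visiting[name] {
		return nil, false // cycle: stop instead of recursing forever
	}
	r.visiting[name] = true
	defer delete(r.visiting, name)

	proj := make(map[string][]ColumnRef, len(def))
	for out, src := range def {
		refs, ok := r.Resolve(src.Relation, src.Column)
		if !ok {
			return nil, false // never memoize a partial projection
		}
		proj[out] = refs
	}
	r.memo[name] = proj
	return proj, true
}

func main() {
	r := NewResolver(map[string]map[string]Source{
		"customer_v":  {"id": {"customer", "id"}, "phone": {"customer", "phone"}},
		"customer_vv": {"phone": {"customer_v", "phone"}},
		"a":           {"x": {"b", "x"}},
		"b":           {"x": {"a", "x"}},
	})
	fmt.Println(r.Resolve("customer_vv", "phone")) // view over view reaches customer.phone
	fmt.Println(r.Resolve("a", "x"))               // cyclic definitions come back unresolved
}
```

Storing a projection in the memo only after every output resolved mirrors the "partial-projection memoization under cycles" finding noted under Cross-review: a view abandoned mid-cycle is never cached in a half-built state.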

Safety: resolution stays additive (written refs always retained); a star expands only when provably width/order-correct; stale metadata (column-count mismatch with the definition) makes the view opaque instead of risking a width-wrong expansion; a nil catalog is byte-identical to GetQuerySpan (entire existing suite passes unchanged).
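
The stale-metadata guard can be illustrated with a minimal sketch. It assumes, as the commit message states, that the view's metadata column names apply positionally over the definition's outputs; applyViewColumns and its string-based lineage representation are hypothetical and exist only for this example.

```go
package main

import "fmt"

// applyViewColumns names a view definition's resolved outputs using the
// catalog's column names, position by position. If the counts differ, the
// metadata is treated as stale and ok is false, so the caller keeps the view
// opaque rather than expanding it with the wrong width.
func applyViewColumns(metaCols []string, outputs [][]string) (map[string][]string, bool) {
	if len(metaCols) != len(outputs) {
		return nil, false
	}
	named := make(map[string][]string, len(metaCols))
	for i, name := range metaCols {
		named[name] = outputs[i]
	}
	return named, true
}

func main() {
	// Definition: SELECT id, phone FROM customer (two outputs).
	outputs := [][]string{{"customer.id"}, {"customer.phone"}}

	named, ok := applyViewColumns([]string{"id", "phone"}, outputs)
	fmt.Println(named["phone"], ok)

	// The catalog lists three columns but the definition yields two: opaque.
	_, ok = applyViewColumns([]string{"id", "phone", "name"}, outputs)
	fmt.Println(ok)
}
```

Failing closed on a count mismatch keeps the written reference (resolution is additive) without attributing columns to the wrong positions.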

Verification

19 view-lineage regression tests (incl. cross-schema context, cycle termination + star opacity, table-shadows-view precedence, stale-metadata opacity, relation column aliases); full trino/analysis + trino/catalog suites green. (The trino/parser oracle-differential backtick failure is the pre-existing tracked lexer_oracle follow-up, reproduced with this change stashed.)

Cross-review

Codex adversarial pass: 4 findings — relation-column-alias lineage (P0), stale-metadata count mismatch (P0), partial-projection memoization under cycles (P2) all fixed; intermediate views in AccessTables kept intentionally and documented (reading through a view accesses it). A focused re-verification pass confirmed the fixes sound.

Consumer note (for the bytebase bump PR)

Bytebase wiring becomes: fill catalog.View.Definition when building the catalog it already builds, set the session context, and call GetQuerySpanWithCatalog. The base tables surfaced through views appear in AccessTables — base-table access checks then apply to queries through views; flagging for product review at bump time.

🤖 Generated with Claude Code

@h3n4l h3n4l force-pushed the feat/trino-star-projection branch from bbe8753 to e7c9217 Compare June 10, 2026 06:11
@h3n4l h3n4l force-pushed the feat/trino-view-lineage branch from dcee0f1 to bfd1d88 Compare June 10, 2026 06:15
@h3n4l h3n4l changed the base branch from feat/trino-star-projection to main June 10, 2026 06:15
A column selected through a view had lineage pointing at the view column.
Masking configuration can only attach to table columns, so the consumer found
no masker and returned the (possibly sensitive) value unmasked (BYT-9679).
The consumer-side fix (bytebase#20560) is superseded by resolving views here,
where it serves every consumer and unifies with the derived-relation resolver.

Add GetQuerySpanWithCatalog: GetQuerySpan plus a *catalog.Catalog (the same
type completion consumes; catalog.View gains a Definition field). A FROM
reference the catalog resolves to a view binds as a DERIVED relation carrying
its definition's resolved projection — computed recursively in the view's own
catalog/schema context, cycle-guarded and memoized — so named references and
stars through views, views over views, and definitions using CTEs/derived
tables all reach the underlying base columns through the existing resolver
machinery. The definition's relations join AccessTables (qualified with the
view's context), and the view's metadata column names apply positionally over
the definition's outputs (a count mismatch — stale metadata — makes the view
opaque rather than risking a width-wrong expansion).

A catalog-known base TABLE binds with its catalog columns: star expansion uses
them (SELECT * over a base table or a mixed base+derived join expands to the
exact projection), and relation column aliases over it (FROM customer AS
c(i, p, …)) resolve to the renamed base columns. Resolution stays additive —
the written ref is always retained — and a nil catalog leaves behaviour
byte-identical to GetQuerySpan.

Cross-reviewed (Codex): 4 findings — relation-column-alias lineage, stale-
metadata count mismatch, and partial-projection memoization under cycles fixed;
intermediate views in AccessTables kept intentionally (reading through a view
accesses it). A focused re-verification pass confirmed the fixes sound.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@h3n4l h3n4l force-pushed the feat/trino-view-lineage branch from bfd1d88 to 1a6bd06 Compare June 10, 2026 06:16
@h3n4l h3n4l merged commit 05728e8 into main Jun 10, 2026
2 checks passed
@h3n4l h3n4l deleted the feat/trino-view-lineage branch June 10, 2026 06:19