feat(trino/analysis): resolve view lineage via catalog metadata#295
Merged
A column selected through a view had lineage pointing at the view column. Masking configuration can only attach to table columns, so the consumer found no masker and returned the (possibly sensitive) value unmasked (BYT-9679). The consumer-side fix (bytebase#20560) is superseded by resolving views here, where it serves every consumer and unifies with the derived-relation resolver.

Add GetQuerySpanWithCatalog: GetQuerySpan plus a *catalog.Catalog (the same type completion consumes; catalog.View gains a Definition field). A FROM reference the catalog resolves to a view binds as a DERIVED relation carrying its definition's resolved projection (computed recursively in the view's own catalog/schema context, cycle-guarded and memoized), so named references and stars through views, views over views, and definitions using CTEs/derived tables all reach the underlying base columns through the existing resolver machinery. The definition's relations join AccessTables (qualified with the view's context), and the view's metadata column names apply positionally over the definition's outputs (a count mismatch, i.e. stale metadata, makes the view opaque rather than risking a width-wrong expansion).

A catalog-known base TABLE binds with its catalog columns: star expansion uses them (SELECT * over a base table or a mixed base+derived join expands to the exact projection), and relation column aliases over it (FROM customer AS c(i, p, …)) resolve to the renamed base columns.

Resolution stays additive (the written ref is always retained), and a nil catalog leaves behaviour byte-identical to GetQuerySpan.

Cross-reviewed (Codex): 4 findings. Relation-column-alias lineage, stale-metadata count mismatch, and partial-projection memoization under cycles are fixed; intermediate views in AccessTables are kept intentionally (reading through a view accesses it). A focused re-verification pass confirmed the fixes sound.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
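The recursive, cycle-guarded, memoized resolution described in the commit message can be sketched roughly as follows. This is a simplified illustration, not omni's implementation: `Column`, `Output`, `View`, `Catalog`, `resolver`, and `newResolver` are hypothetical stand-ins, each view definition is reduced to "select these columns from one relation", and checking tables before views is an assumption. Retaining the written ref and collecting `AccessTables` are left out for brevity.

```go
package main

import "fmt"

// Column is a base-table column (hypothetical type for this sketch).
type Column struct{ Table, Name string }

// Output is one projected column of a relation plus the base columns it reaches.
type Output struct {
	Name    string
	Sources []Column
}

// View is a simplified stand-in for a catalog view.
type View struct {
	Columns []string // metadata column names, applied positionally
	From    string   // the single relation the definition reads
	Select  []string // columns the definition projects from From
}

// Catalog maps relation names to base-table columns or view definitions.
type Catalog struct {
	Tables map[string][]string
	Views  map[string]View
}

type resolver struct {
	cat      Catalog
	memo     map[string][]Output // complete projections only
	visiting map[string]bool     // views currently being resolved
}

func newResolver(cat Catalog) *resolver {
	return &resolver{cat: cat, memo: map[string][]Output{}, visiting: map[string]bool{}}
}

// resolve returns the projection of a relation, or ok=false when it is opaque.
func (r *resolver) resolve(name string) ([]Output, bool) {
	// Assumption: a base table takes precedence over a view of the same name.
	if cols, ok := r.cat.Tables[name]; ok {
		out := make([]Output, len(cols))
		for i, c := range cols {
			out[i] = Output{Name: c, Sources: []Column{{name, c}}}
		}
		return out, true
	}
	v, ok := r.cat.Views[name]
	if !ok {
		return nil, false
	}
	if out, ok := r.memo[name]; ok {
		return out, true
	}
	if r.visiting[name] {
		return nil, false // cycle: treat the view as opaque
	}
	r.visiting[name] = true
	defer delete(r.visiting, name)

	inner, ok := r.resolve(v.From)
	if !ok {
		return nil, false
	}
	// Stale metadata: the column count disagrees with the definition's outputs.
	if len(v.Columns) != len(v.Select) {
		return nil, false
	}
	byName := make(map[string]Output, len(inner))
	for _, o := range inner {
		byName[o.Name] = o
	}
	out := make([]Output, len(v.Select))
	for i, sel := range v.Select {
		src, ok := byName[sel]
		if !ok {
			return nil, false
		}
		// The view's metadata name renames the definition's i-th output.
		out[i] = Output{Name: v.Columns[i], Sources: src.Sources}
	}
	r.memo[name] = out // never memoize a partial or failed projection
	return out, true
}

func main() {
	r := newResolver(Catalog{
		Tables: map[string][]string{"customer": {"id", "phone"}},
		Views: map[string]View{
			"v1": {Columns: []string{"i", "ph"}, From: "customer", Select: []string{"id", "phone"}},
			"v2": {Columns: []string{"p"}, From: "v1", Select: []string{"ph"}},
		},
	})
	out, ok := r.resolve("v2")
	fmt.Println(out, ok)
}
```

Memoizing only complete projections mirrors the cycle-related finding mentioned above: an opaque result produced while a cycle guard is active is not cached, so it cannot leak into later, unrelated resolutions.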
Summary
Fixes BYT-9679 (view columns bypass masking) in omni. Supersedes the consumer-side bytebase#20560 (closed), per the decision that view lineage belongs in omni's metadata-aware layer rather than in each consumer. Stacked on #293 (which stacks on #286): review the single commit dcee0f1; retarget/rebase as the stack merges.

A column selected through a view had lineage pointing at the view column; masking config can only attach to table columns, so the value came back unmasked.
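To make the failure mode concrete, here is a minimal sketch with hypothetical types (`ColumnRef`, `maskers`, `maskerFor`, and the `customer_v` view are illustrative names, not Bytebase's or omni's API). A masker lookup keyed by table columns misses when the lineage names only the view column, and hits once the base column is added alongside it.

```go
package main

import "fmt"

// ColumnRef names a column by relation and column name (hypothetical type).
type ColumnRef struct {
	Relation string
	Column   string
}

// maskers is keyed by base-table columns only, mirroring the constraint that
// masking configuration can only attach to table columns.
var maskers = map[ColumnRef]string{
	{Relation: "customer", Column: "phone"}: "partial-mask",
}

// maskerFor returns the first masker attached to any column in the lineage.
func maskerFor(lineage []ColumnRef) (string, bool) {
	for _, ref := range lineage {
		if m, ok := maskers[ref]; ok {
			return m, true
		}
	}
	return "", false
}

func main() {
	// Query: SELECT phone FROM customer_v, where customer_v is a view over customer.

	// Before: lineage stops at the view column, so no masker is found.
	before := []ColumnRef{{"customer_v", "phone"}}

	// After: the written ref is retained and the base column is added.
	after := []ColumnRef{{"customer_v", "phone"}, {"customer", "phone"}}

	m1, ok1 := maskerFor(before)
	m2, ok2 := maskerFor(after)
	fmt.Printf("before: masker=%q found=%v\n", m1, ok1)
	fmt.Printf("after:  masker=%q found=%v\n", m2, ok2)
}
```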
Approach
`analysis.GetQuerySpanWithCatalog(stmt, *catalog.Catalog)`: the same `trino/catalog` type completion already consumes; `catalog.View` gains a `Definition` field. A FROM reference the catalog resolves to a view binds as a derived relation carrying its definition's resolved projection (computed recursively in the view's own catalog/schema context, cycle-guarded, memoized), so named refs, `SELECT *` over views, views-over-views, and CTE-based definitions all resolve through the existing #286/#293 resolver machinery. The definition's relations join `AccessTables` (qualified). A catalog-known base table binds with its catalog columns: stars over base tables and mixed base+derived joins expand to the exact projection, and relation column aliases (`FROM customer AS c(i, p, …)`) resolve.

Safety: resolution stays additive (written refs always retained); a star expands only when provably width/order-correct; stale metadata (column-count mismatch with the definition) makes the view opaque instead of risking a width-wrong expansion; a nil catalog is byte-identical to `GetQuerySpan` (entire existing suite passes unchanged).

Verification
19 view-lineage regression tests (incl. cross-schema context, cycle termination + star opacity, table-shadows-view precedence, stale-metadata opacity, relation column aliases); full `trino/analysis` + `trino/catalog` suites green. (The `trino/parser` oracle-differential backtick failure is the pre-existing tracked `lexer_oracle` follow-up, reproduced with this change stashed.)

Cross-review
Codex adversarial pass: 4 findings. Relation-column-alias lineage (P0), stale-metadata count mismatch (P0), and partial-projection memoization under cycles (P2) are all fixed; intermediate views in `AccessTables` are kept intentionally and documented (reading through a view accesses it). A focused re-verification pass confirmed the fixes sound.

Consumer note (for the bytebase bump PR)
Bytebase wiring becomes: fill `catalog.View.Definition` when building the catalog it already builds, set the session context, and call `GetQuerySpanWithCatalog`. The base tables surfaced through views appear in `AccessTables`, so base-table access checks then apply to queries through views; flagging for product review at bump time.

🤖 Generated with Claude Code