…ource configs

Previously, `generate_missing_materialization_configs` delegated resource config generation to the generic `stub_config` path, which always derived x-schema-name from the 2nd-to-last collection name component regardless of the materialization's configured strategy. Now resource stubs are created via `update_materialization_resource_spec`, which populates x-schema-name and x-collection-name according to the materialization's `target_naming` and `source` settings. This means `flowctl generate` produces resource configs that match what the runtime and auto-discover would produce for the same materialization.
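The sketch below illustrates what deriving names "according to the materialization's `target_naming`" can mean for a single binding. It is a minimal Rust model, not the real code. The `TargetNamingStrategy` enum is a simplified, hypothetical mirror of the real type. `derive_resource_names` is a made-up helper and does not stand in for `update_materialization_resource_spec`. The `_` separator and the `public`/`dbo` list of common default schemas are assumptions for the example.

```rust
/// Simplified, hypothetical mirror of the strategy type named in this PR.
#[derive(Debug, Clone, PartialEq)]
pub enum TargetNamingStrategy {
    MatchSourceStructure,
    SingleSchema { schema: String },
    PrefixTableNames { schema: String, skip_common_defaults: bool },
}

/// Assumed "common default" source schemas; illustrative only.
const COMMON_DEFAULT_SCHEMAS: &[&str] = &["public", "dbo"];

/// Hypothetical helper: derive (x-schema-name, x-collection-name) for a
/// collection such as "acmeCo/pg/public/orders". Returns None when the name
/// has fewer than two components.
pub fn derive_resource_names(
    collection: &str,
    strategy: &TargetNamingStrategy,
) -> Option<(String, String)> {
    let parts: Vec<&str> = collection.split('/').collect();
    if parts.len() < 2 {
        return None;
    }
    let table = parts[parts.len() - 1];
    // The 2nd-to-last component, which the old stub path always used as the schema.
    let source_schema = parts[parts.len() - 2];

    match strategy {
        TargetNamingStrategy::MatchSourceStructure => {
            Some((source_schema.to_string(), table.to_string()))
        }
        TargetNamingStrategy::SingleSchema { schema } => {
            Some((schema.clone(), table.to_string()))
        }
        TargetNamingStrategy::PrefixTableNames { schema, skip_common_defaults } => {
            // Assumed rule: optionally skip the prefix for common default schemas.
            let skip_prefix =
                *skip_common_defaults && COMMON_DEFAULT_SCHEMAS.contains(&source_schema);
            let name = if skip_prefix {
                table.to_string()
            } else {
                format!("{source_schema}_{table}")
            };
            Some((schema.clone(), name))
        }
    }
}
```

Under `MatchSourceStructure`, `acmeCo/pg/public/orders` yields schema `public` and table `orders`. That matches what the old `stub_config` path produced for every strategy.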
Adds `flowctl raw migrate-target-naming` to analyze all materializations and determine the appropriate `TargetNamingStrategy` for each, based on the legacy `source.targetNaming` field and endpoint configuration.

For each materialization, the tool:

* Looks up x-schema-name support from `connector_tags.resource_spec_schema`
* Maps the legacy `TargetNaming` enum to the new `TargetNamingStrategy` (`MatchSourceStructure`, `SingleSchema`, `PrefixTableNames`)
* Detects the endpoint schema from connector config, falling back to the common schema across existing resource paths
* Analyzes each binding to determine whether filling in x-schema-name would change the resource path (requiring manual intervention) or target a different database schema
* Falls back from `MatchSourceStructure` to `SingleSchema` when collection names don't match existing resource path schemas
* Handles Snowflake's backwards-compat behavior where 1-element paths are preserved when the schema matches the endpoint default

The report classifies each materialization as MIGRATE (safe to auto-migrate), MANUAL (needs human intervention due to resource path changes or ambiguous schema), or SKIP (for one of several reasons). Disabled tasks with synthetic binding-N resource paths are classified as MIGRATE since they'll backfill on re-enable. Disabled materializations without a built spec are skipped entirely.
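The strategy mapping and endpoint-schema fallback in this commit can be sketched as below. This is a simplified Rust model, not the tool's code:

* `choose_strategy`, `resolve_schema` and the `bindings_conflict` flag are hypothetical.
* The enums only mirror the names used in this PR.
* Counting only 2-element resource paths when looking for a common schema is an assumption.

```rust
/// Legacy `source.targetNaming` values named in this PR (simplified mirror).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TargetNaming {
    WithSchema,
    PrefixSchema,
    PrefixNonDefaultSchema,
    NoSchema,
}

/// Simplified mirror of the new strategy type; field shapes are assumed.
#[derive(Debug, Clone, PartialEq)]
pub enum TargetNamingStrategy {
    MatchSourceStructure,
    SingleSchema { schema: String },
    PrefixTableNames { schema: String, skip_common_defaults: bool },
}

/// Endpoint schema from connector config, else the single schema shared by
/// all 2-element resource paths. None means ambiguous (a MANUAL case).
pub fn resolve_schema(endpoint_schema: Option<&str>, resource_paths: &[Vec<String>]) -> Option<String> {
    if let Some(schema) = endpoint_schema {
        return Some(schema.to_string());
    }
    let mut schemas = resource_paths
        .iter()
        .filter(|path| path.len() == 2)
        .map(|path| path[0].as_str());
    let first = schemas.next()?;
    if schemas.all(|schema| schema == first) {
        Some(first.to_string())
    } else {
        None
    }
}

/// Map the legacy setting to a strategy. `bindings_conflict` is true when some
/// binding's collection-derived schema differs from its resource path schema.
/// Returns None when a required schema could not be resolved.
pub fn choose_strategy(
    legacy: Option<TargetNaming>,
    schema: Option<String>,
    bindings_conflict: bool,
) -> Option<TargetNamingStrategy> {
    use TargetNamingStrategy::*;
    match legacy {
        // Explicit setting: keep the customer's strategy even when bindings conflict.
        Some(TargetNaming::WithSchema) => Some(MatchSourceStructure),
        Some(TargetNaming::PrefixSchema) => {
            schema.map(|schema| PrefixTableNames { schema, skip_common_defaults: false })
        }
        Some(TargetNaming::PrefixNonDefaultSchema) => {
            schema.map(|schema| PrefixTableNames { schema, skip_common_defaults: true })
        }
        Some(TargetNaming::NoSchema) => schema.map(|schema| SingleSchema { schema }),
        // Unset: try MatchSourceStructure, falling back to SingleSchema on conflict.
        None if !bindings_conflict => Some(MatchSourceStructure),
        None => schema.map(|schema| SingleSchema { schema }),
    }
}
```

A `None` from either function corresponds to the "can't determine the schema automatically" outcome, which the report would surface as MANUAL.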
This is the migration script for target naming that I talked about in #2780. I decided to frame it as a new, temporary `flowctl raw` subcommand that analyzes existing materializations and determines the correct `TargetNamingStrategy` for each one based on its current `source.targetNaming`, endpoint configuration, and built resource paths. By default it prints a dry-run report. With `--execute`, it publishes the changes one materialization at a time.

Migration classification
Every materialization is classified into one of:
* `MIGRATE`: `target_naming` and per-binding `x-schema-name` can be set automatically without causing unintended backfilling.
* `MANUAL`: setting `x-schema-name` would change a binding's resource path from 1-element (`[table]`) to 2-element (`[schema, table]`), or we can't determine the correct schema automatically (no endpoint config schema, no consistent schema in resource paths).
* `SKIP`: the connector doesn't support `x-schema-name` (no schema pointer in its resource spec).

Strategy selection rules
`TargetNamingStrategy` is derived from `source.targetNaming`:

| `source.targetNaming` | `TargetNamingStrategy` |
|---|---|
| `WithSchema` | `MatchSourceStructure` |
| `PrefixSchema` | `PrefixTableNames { schema, skip_common_defaults: false }` |
| `PrefixNonDefaultSchema` | `PrefixTableNames { schema, skip_common_defaults: true }` |
| `NoSchema` | `SingleSchema { schema }` |
| (no source capture) | `MatchSourceStructure`, falling back to `SingleSchema` if existing bindings conflict |

The no-source-capture case tries `MatchSourceStructure` because that was the de facto behavior before `targetNaming` existed: `update_materialization_resource_spec` unconditionally derived x-schema-name from collection names. This is a slight deviation from the original migration proposal, which grouped no-source-capture with `NoSchema` -> `SingleSchema`. Because of the fallback to `SingleSchema`, the end result is the same when collection names don't match path schemas.

For strategies that require a schema value, the tool resolves it from (in order):
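1. The schema set in the endpoint config.
2. The common schema shared by existing resource paths.
3. The connector's default schema (for Snowflake, `PUBLIC`).

Filling x-schema-name on existing bindings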
`targetNaming` controls how future bindings (from auto-discover) get their schema and table names. Existing bindings need `x-schema-name` filled in separately to match where their data actually lives.

For many existing bindings, the strategy-derived schema matches the actual schema in the resource path, and `x-schema-name` is simply set to that value. But for some bindings the two diverge. A binding created before `x-schema-name` existed, or via a code path that didn't populate it, would have been placed in whatever schema the endpoint config specified. That schema may not match what the strategy would derive from the collection name.

When the customer explicitly set `source.targetNaming`, the tool preserves their strategy for future bindings. On existing bindings where the strategy-derived value would conflict, it fills in the actual schema (from the built resource path) instead. The report flags these as `(actual; strategy would produce "..." for new bindings)`.

When no `source.targetNaming` was set and a binding's collection-derived schema doesn't match its resource path schema, the tool falls back to `SingleSchema` with the resolved endpoint schema. If the endpoint schema also doesn't match the resource path schemas, the task is marked as `MANUAL`.

Snowflake compatibility mode handling
`materialize-snowflake` uniquely produces 1-element resource paths (`[table]`) when the binding's schema matches the endpoint config's default, and 2-element paths (`[schema, table]`) otherwise. The migration tool mirrors the connector's logic to determine whether setting `x-schema-name` would preserve or change the resource path. When the endpoint config has no explicit schema, the tool assumes Snowflake's default of `PUBLIC`.

Disabled materializations
Disabled materializations with a built spec are analyzed normally. Disabled materializations without a built spec are skipped entirely, as they're old enough that re-enabling them at this point would almost certainly require a backfill anyway.
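The classification rules above could combine roughly as shown below. They cover whether filling `x-schema-name` changes a binding's resource path, Snowflake's 1-element paths, and the disabled-task cases.

Everything here is an assumption for illustration:

* `classify`, `expected_path` and `BindingInfo` are invented names.
* Synthetic paths are assumed to be exactly `["binding-N"]`.
* Schemas are compared with exact string equality.

The real connector may compare identifiers differently.

```rust
/// Outcome labels mirroring the report; reason strings are illustrative.
#[derive(Debug, PartialEq)]
pub enum Classification {
    Migrate,
    Manual(String),
    Skip(String),
}

/// Hypothetical per-binding input: the built resource path plus the
/// schema/table the migration would write into the resource config.
pub struct BindingInfo {
    pub resource_path: Vec<String>,
    pub schema: String,
    pub table: String,
}

/// Path the connector would produce once x-schema-name is set. With
/// `snowflake_compat`, a schema equal to the endpoint default (assumed
/// `PUBLIC` when unset) yields a 1-element path.
pub fn expected_path(schema: &str, table: &str, endpoint_schema: Option<&str>, snowflake_compat: bool) -> Vec<String> {
    let default = endpoint_schema.unwrap_or("PUBLIC");
    if snowflake_compat && schema == default {
        vec![table.to_string()]
    } else {
        vec![schema.to_string(), table.to_string()]
    }
}

/// Assumed shape of a synthetic path: a single "binding-<digits>" element.
fn is_synthetic(path: &[String]) -> bool {
    match path {
        [only] => only
            .strip_prefix("binding-")
            .map_or(false, |n| !n.is_empty() && n.chars().all(|c| c.is_ascii_digit())),
        _ => false,
    }
}

pub fn classify(
    disabled: bool,
    has_built_spec: bool,
    bindings: &[BindingInfo],
    endpoint_schema: Option<&str>,
    snowflake_compat: bool,
) -> Classification {
    if disabled && !has_built_spec {
        return Classification::Skip("disabled with no built spec".to_string());
    }
    for b in bindings {
        // Disabled tasks will backfill on re-enable, so synthetic paths are safe.
        if disabled && is_synthetic(&b.resource_path) {
            continue;
        }
        let after = expected_path(&b.schema, &b.table, endpoint_schema, snowflake_compat);
        if after != b.resource_path {
            return Classification::Manual(format!(
                "resource path {:?} would become {:?}",
                b.resource_path, after
            ));
        }
    }
    Classification::Migrate
}
```

For example, a Snowflake binding at `["orders"]` stays MIGRATE when its schema resolves to `PUBLIC`. It becomes MANUAL when its schema resolves to anything else, because the path would grow to two elements.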
Execute mode
With `--execute`, the tool publishes each `MIGRATE` materialization individually:

1. … `last_pub_id` (optimistic concurrency)
2. Sets `targetNaming` on the materialization
3. Fills in `x-schema-name` on bindings that are missing it
4. … `draft_specs`
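Publishing one materialization at a time against an expected `last_pub_id` is a standard optimistic-concurrency pattern. The sketch below models it against a hypothetical in-memory `Catalog` rather than the real control plane. `LiveSpec`, `PlannedMigration`, `publish` and `execute` are invented names, the real draft and publication APIs are not shown, and `targetNaming` is reduced to a plain string.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified view of a live materialization.
#[derive(Debug, Clone, PartialEq)]
pub struct LiveSpec {
    pub last_pub_id: u64,
    pub target_naming: Option<String>,
    /// x-schema-name per binding; None where the resource config lacks it.
    pub binding_schemas: Vec<Option<String>>,
}

/// One MIGRATE entry from the dry-run report (hypothetical shape).
pub struct PlannedMigration {
    pub name: String,
    pub expect_pub_id: u64,
    pub target_naming: String,
    pub fill_schemas: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum PublishError {
    NotFound,
    Conflict { expected: u64, actual: u64 },
}

/// In-memory stand-in for the control plane, only to show the concurrency check.
pub struct Catalog {
    pub specs: HashMap<String, LiveSpec>,
    next_pub_id: u64,
}

impl Catalog {
    pub fn new(specs: HashMap<String, LiveSpec>, next_pub_id: u64) -> Self {
        Catalog { specs, next_pub_id }
    }

    /// Apply one migration, rejecting it if the spec was republished since the report.
    pub fn publish(&mut self, m: &PlannedMigration) -> Result<u64, PublishError> {
        let pub_id = self.next_pub_id;
        let spec = self.specs.get_mut(&m.name).ok_or(PublishError::NotFound)?;
        if spec.last_pub_id != m.expect_pub_id {
            return Err(PublishError::Conflict { expected: m.expect_pub_id, actual: spec.last_pub_id });
        }
        spec.target_naming = Some(m.target_naming.clone());
        // Only fill bindings that are missing x-schema-name; never overwrite.
        for (slot, fill) in spec.binding_schemas.iter_mut().zip(&m.fill_schemas) {
            if slot.is_none() {
                *slot = Some(fill.clone());
            }
        }
        spec.last_pub_id = pub_id;
        self.next_pub_id += 1;
        Ok(pub_id)
    }
}

/// Publish each planned migration on its own; one failure doesn't block the rest.
pub fn execute(catalog: &mut Catalog, plan: &[PlannedMigration]) -> Vec<(String, Result<u64, PublishError>)> {
    plan.iter().map(|m| (m.name.clone(), catalog.publish(m))).collect()
}
```

In this model, a materialization that changed after the dry run is left untouched and reported as a conflict, while the others still publish. That is one benefit of publishing individually instead of in one batch.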