bug: SQL prepare_table runs on setup instead of drain #3237

@kgpayne

Singer SDK Version

0.48.0

Is this a regression?

  • Yes

Python Version

3.12

Bug scope

Targets (data type handling, batching, SQL object generation, etc.)

Operating System

MacOS

Description

I think this might be a regression, but I'm not 100% sure. Current behavior:

  • A schema message is received and SQLSink.setup() is called on Sink instantiation.
  • .setup() calls self.connector.prepare_table() which creates or updates the target schema. In the case of TargetLoadMethods.OVERWRITE (new target default 😬) the destination table is dropped and recreated.
  • If a new schema message is received, the existing Sink instance is retired to _sinks_to_clear and a new Sink is instantiated. The new Sink has .setup() called immediately, which evolves or recreates the table using the new schema 😱 Meanwhile, the retired Sink's records are held in _sinks_to_clear, waiting for a drain_all event.
  • When drain_all is eventually called, sinks drain in order, starting with _sinks_to_clear. However, the target schema has now changed (possibly multiple times), and previously received records no longer conform. In the case of the test checking for multiple schema messages, any messages inserted by a drain_one trigger will be erased when a new schema message arrives, leading to an incomplete sync 🐛

If I am not mistaken, this means SQL targets currently don't support multiple schemas in a single stream safely 🚨
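
To make the sequence above concrete, here is a minimal toy model of it. This is only a sketch: ToyConnector, ToySink and reproduce() are hypothetical stand-ins I made up for illustration, not the SDK's actual classes, and the table is just a dict. It assumes OVERWRITE behaves as drop-and-recreate, as described above.

```python
from __future__ import annotations


class ToyConnector:
    """Hypothetical stand-in for a SQL connector; tables live in a dict."""

    def __init__(self) -> None:
        self.tables: dict[str, dict] = {}

    def prepare_table(self, name: str, columns: list[str]) -> None:
        # Models OVERWRITE: the table is dropped and recreated.
        self.tables[name] = {"columns": list(columns), "rows": []}

    def insert(self, name: str, rows: list[dict]) -> None:
        table = self.tables[name]
        for row in rows:
            if set(row) != set(table["columns"]):
                raise ValueError(f"{row} does not match {table['columns']}")
            table["rows"].append(row)


class ToySink:
    """Hypothetical sink that prepares its table as soon as it is created."""

    def __init__(self, connector: ToyConnector, name: str, columns: list[str]) -> None:
        self.connector = connector
        self.name = name
        self.columns = columns
        self.pending: list[dict] = []
        self.setup()  # the behavior described in this issue

    def setup(self) -> None:
        self.connector.prepare_table(self.name, self.columns)

    def drain(self) -> None:
        self.connector.insert(self.name, self.pending)
        self.pending = []


def reproduce() -> tuple[list[dict], str | None]:
    connector = ToyConnector()

    # First schema message: sink created, table prepared.
    old_sink = ToySink(connector, "users", ["id"])
    old_sink.pending.append({"id": 1})
    old_sink.drain()                      # drain_one: row 1 is written
    old_sink.pending.append({"id": 2})    # still buffered in the old sink

    # Second schema message: old sink retired, new sink set up right away.
    sinks_to_clear = [old_sink]
    new_sink = ToySink(connector, "users", ["id", "name"])
    rows_after_new_schema = list(connector.tables["users"]["rows"])

    # drain_all: retired sinks drain first, against the new table shape.
    new_sink.pending.append({"id": 3, "name": "c"})
    error = None
    try:
        for sink in [*sinks_to_clear, new_sink]:
            sink.drain()
    except ValueError as exc:
        error = str(exc)
    return rows_after_new_schema, error


if __name__ == "__main__":
    print(reproduce())
```

In this toy model the row written by the early drain is gone as soon as the second sink is created, and the retired sink's buffered record no longer matches the table when it finally drains.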

Link to Slack/Linen

No response
