Skip to content

Conversation

@aasthabharill
Copy link
Member

@aasthabharill aasthabharill commented Dec 1, 2025

b/458064173

Problem

We create some new columns to hold metadata in the shadow tables that are created. These shadow tables also have the Primary Key columns from the data table. When the data table has columns of the same name as the metadata columns we create, it gets overwritten because of which we get data type conversion errors. For eg. a primary key column "timestamp TIMESTAMP" gets overwritten to "timestamp INT64".

Fix

To ensure that we don't have these kind of clashes, we now dynamically compute the name of the shadow column by checking it against the list of primary key column names. In case a column with the same name is found, we add "shadow_" as a prefix iteratively till we get a unique column name for the shadow table.

Testing:

  • Built container with changes: gs://ea-functional-tests/templates-aastha-2025-12-09/flex/Cloud_Datastream_to_Spanner
  • Confirmed in DDL of Shadow table that original "timestamp" column of type TIMESTAMP is present along with new columns: shadow_timestamp, log_file and log_position
  • Ran a live migration job with above template and ran INSERT, UPDATE and DELETE statements containing "timestamp TIMESTAMP" column as Primary Key in data table - and these passed without errors

@codecov
Copy link

codecov bot commented Dec 1, 2025

Codecov Report

❌ Patch coverage is 96.96970% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.20%. Comparing base (f48580d) to head (3faf25e).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
...rt/v2/templates/datastream/ChangeEventContext.java 90.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3035      +/-   ##
============================================
+ Coverage     50.39%   54.20%   +3.80%     
+ Complexity     5021     2157    -2864     
============================================
  Files           970      488     -482     
  Lines         59606    28151   -31455     
  Branches       6507     2946    -3561     
============================================
- Hits          30039    15259   -14780     
+ Misses        27441    11954   -15487     
+ Partials       2126      938    -1188     
Components Coverage Δ
spanner-templates 71.82% <74.76%> (+1.22%) ⬆️
spanner-import-export ∅ <ø> (∅)
spanner-live-forward-migration 80.06% <92.75%> (+0.05%) ⬆️
spanner-live-reverse-replication 77.79% <67.60%> (+0.29%) ⬆️
spanner-bulk-migration 88.21% <0.00%> (-0.04%) ⬇️
Files with missing lines Coverage Δ
...mplates/datastream/ChangeEventSequenceFactory.java 68.42% <ø> (-7.58%) ⬇️
.../templates/datastream/MySqlChangeEventContext.java 100.00% <100.00%> (ø)
...templates/datastream/MySqlChangeEventSequence.java 85.07% <100.00%> (+0.69%) ⬆️
...templates/datastream/OracleChangeEventContext.java 100.00% <100.00%> (ø)
...emplates/datastream/OracleChangeEventSequence.java 81.81% <100.00%> (+0.86%) ⬆️
...mplates/datastream/PostgresChangeEventContext.java 100.00% <100.00%> (ø)
...plates/datastream/PostgresChangeEventSequence.java 84.90% <100.00%> (+0.59%) ⬆️
...eport/v2/templates/spanner/ShadowTableCreator.java 89.13% <100.00%> (+2.64%) ⬆️
...rt/v2/templates/datastream/ChangeEventContext.java 91.30% <90.00%> (-1.56%) ⬇️

... and 507 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@aasthabharill aasthabharill marked this pull request as ready for review December 2, 2025 10:54
@aasthabharill aasthabharill requested a review from a team as a code owner December 2, 2025 10:54
@pull-request-size pull-request-size bot added size/L and removed size/M labels Dec 8, 2025
VardhanThigle
VardhanThigle previously approved these changes Dec 10, 2025
Copy link
Contributor

@VardhanThigle VardhanThigle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

VardhanThigle
VardhanThigle previously approved these changes Dec 11, 2025
VardhanThigle
VardhanThigle previously approved these changes Dec 12, 2025
Copy link
Contributor

@shreyakhajanchi shreyakhajanchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please merge once the tests pass

@aasthabharill aasthabharill merged commit fa4c196 into GoogleCloudPlatform:main Dec 15, 2025
21 checks passed
@aasthabharill aasthabharill deleted the live-timestamp-to-long branch December 17, 2025 06:37
aasthabharill added a commit to aasthabharill/DataflowTemplates that referenced this pull request Dec 17, 2025
…ogleCloudPlatform#3035)

[b/458064173](https://b.corp.google.com/issues/458064173)

#### Problem
We create some new columns to hold metadata in the shadow tables that are created. These shadow tables also have the Primary Key columns from the data table. When the data table has columns of the same name as the metadata columns we create, it gets overwritten because of which we get data type conversion errors. For eg. a primary key column "timestamp TIMESTAMP" gets overwritten to "timestamp INT64".

#### Fix
To ensure that we don't have these kind of clashes, we now dynamically compute the name of the shadow column by checking it against the list of primary key column names. In case a column with the same name is found, we add "shadow_" as a prefix iteratively till we get a unique column name for the shadow table. 

### Testing:
- Built container with changes: gs://ea-functional-tests/templates-aastha-2025-12-09/flex/Cloud_Datastream_to_Spanner
-  Confirmed in DDL of Shadow table that original "timestamp" column of type TIMESTAMP is present along with new columns: shadow_timestamp, log_file and log_position
- Ran a live migration job with above template and ran INSERT, UPDATE and DELETE statements containing "timestamp TIMESTAMP" column as Primary Key in data table - and these passed without errors
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants