
Ea stability fixes #3251

Draft
pratickchokhani wants to merge 10 commits into GoogleCloudPlatform:release_2025-10-28-01_RC00 from pratickchokhani:ea-stability-fixes

Conversation

@pratickchokhani
Contributor

No description provided.

shreyakhajanchi and others added 5 commits January 23, 2026 13:12
* Revert "deps: update Python dependencies (GoogleCloudPlatform#2956)"

This reverts commit 797bd8f.

* Update pom and requirements

* Fix tests

* Update python requirements

* fix test for bqio breaking change

---------

Co-authored-by: Vitaly Terentyev <vitaly.terentyev@akvelon.com>
…CloudPlatform#3024)

[b/458068271](https://b.corp.google.com/issues/458068271)
[b/458070941](https://b.corp.google.com/issues/458070941)

## Problem
Whenever a row has a column with a NULL value, reverse replication fails with the following errors:
1. AssignShardId step in Dataflow: ```Error fetching shard Id column: Illegal call to getter of null value```
2. DLQ entry: ```"error_message":"No shard identified for the record"```

## Fix
marshalSpannerValues calls getter functions to read each column's value, but these getters throw a NullPointerException when the value is NULL. This exception is handled incorrectly and surfaces as the errors above. The fix is to check for a NULL value at the beginning of the function and handle it explicitly.
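A minimal sketch of the null-first check described above, using hypothetical names (the real marshalSpannerValues operates on Spanner structs and typed getters, not a Map):

```java
import java.util.HashMap;
import java.util.Map;

public class ShardIdFetcher {

  /**
   * Returns the shard id column value, or a "NULL" marker when the value is
   * SQL NULL. Checking for null up front avoids the exception that typed
   * getters throw on NULL values ("Illegal call to getter of null value").
   */
  public static String marshalShardColumn(Map<String, Object> row, String column) {
    Object value = row.get(column);
    if (value == null) {
      // NULL handled explicitly at the top of the function instead of
      // letting a getter throw and the record land in the DLQ.
      return "NULL";
    }
    return value.toString();
  }

  public static void main(String[] args) {
    Map<String, Object> row = new HashMap<>();
    row.put("skill", null);       // double column kept NULL
    row.put("variance", 1234.56); // double column kept non-NULL
    System.out.println(marshalShardColumn(row, "skill"));
    System.out.println(marshalShardColumn(row, "variance"));
  }
}
```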

## Tests:
1. Unit test updated: it failed with the expected error without the fix and passed with it.
2. Integration test updated to include NULL values in the row.

### Dataflow job with container built on fixed code.
Template Container built in gs://ea-functional-tests/templates/flex/Spanner_to_SourceDb

Ran [dataflow job](https://pantheon.corp.google.com/dataflow/jobs/us-central1/2025-11-26_02_33_03-2924712538454216688;graphView=0?project=span-cloud-ck-testing-external&e=PangolinKitchenLaunch::PangolinKitchenEnabled&mods=logs_tg_staging&pageState=(%22dfTime%22:(%22l%22:%22dfJobMaxTime%22))) with above container.

#### 1. double datatype 
SQL Query: column skill of type double is kept NULL, column variance of type double is non-NULL

```
INSERT INTO ut_scl_squad (ddrKey, gameSpaceId, ownerPersId, teammateIndex, teammatePersId, `variance`, lastUpdateTime) VALUES (3002, 17613, 172377253, 202, 1269841562, 1234.56, 2720290009);
DELETE FROM ut_scl_squad WHERE ddrKey = 3002 AND gameSpaceId = 17613 AND ownerPersId = 172377253;
```

#### 2. timestamp datatype
SQL Query: 
- column createdTime of type timestamp is kept non-NULL, column squadName of type varchar is NULL
```
INSERT INTO ut_showoff (ddrkey, showoffId, userId, createdTime, `count`) VALUES (2002, 2002, -285444577, TIMESTAMP('2025-01-01T12:00:00Z'), 2444035295);
DELETE FROM ut_showoff WHERE ddrkey = 2002 AND showoffId = 2002;
```

- column createdTime of type timestamp is kept NULL
```
INSERT INTO ut_showoff (ddrkey, showoffId, userId) VALUES (202, 202, -28577);
DELETE FROM ut_showoff WHERE ddrkey = 202 AND showoffId = 202;
```

#### 3. Reserved Keywords
SQL Query: column content of type varbinary is kept NULL
```
INSERT INTO sedges (ddrkey, created, `from`, `to`, `type`, `value`, flags) VALUES (2001, 2001, 'nodeA', 'nodeB', 'edgeType1', 'edgeValue1', -1326272220);
DELETE FROM sedges WHERE ddrkey = 2001 AND created = 2001;
```

#### 4. INTERLEAVING
- INTERLEAVE IN PARENT ON DELETE CASCADE: deleting the parent on Spanner also deleted both the parent and child records from SQL.
- INTERLEAVE IN: deleting the parent on Spanner also deleted the parent row from SQL.

#### 5. Regression testing: Custom sharding logic
- The flow where there is no migration_shard_id column and the pipeline uses the ddrkey value to compute the shard is not affected by this change.
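The ddrkey-based fallback described above could look roughly like the following (a hypothetical illustration; the template's actual custom sharding logic is not shown in this PR):

```java
public class ShardComputation {

  /**
   * Hypothetical shard selection: prefer an explicit migration_shard_id
   * when the column exists, otherwise derive a shard index from ddrkey.
   */
  public static int computeShard(Long migrationShardId, long ddrKey, int numShards) {
    if (migrationShardId != null) {
      return (int) (migrationShardId % numShards);
    }
    // Fallback: hash the ddrkey into a non-negative shard index.
    return Math.floorMod(Long.hashCode(ddrKey), numShards);
  }
}
```

Either branch produces a stable shard index, which is why rows without a migration_shard_id column are unaffected by the NULL-handling fix.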
…Platform#3115)

* setting default gf

* changed grouping factor to 500

* spotless

* readability

* added comment for the change

---------

Co-authored-by: Sandeep Mishra <iamsandeep@google.com>
@gemini-code-assist

Summary of Changes

Hello @pratickchokhani, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing the stability, data integrity, and performance of several Google Cloud Dataflow templates. It includes upgrading dependencies, improving data type handling, and optimizing memory usage. The changes span across multiple templates, including those for JDBC to BigQuery, Spanner to SourceDB, and DataStream MongoDB to Firestore.

Highlights

  • Beam Version Upgrade: The pull request upgrades the Apache Beam version from 2.68.0 to 2.69.0 across multiple Python and Java files, ensuring the project utilizes the latest features and stability improvements from the Beam ecosystem.
  • SpannerIO Enhancement: The SpannerIO.java file is modified to include OutputBuilderSupplier to enhance the output mechanism for mutation groups, ensuring proper windowing and timestamping.
  • SourceDB Template Improvements: Several changes in the v2/spanner-to-sourcedb directory focus on improving data type handling, null value management, and code cleanup for the Spanner to SourceDB template.
  • JDBC Template Update: The JdbcToBigQueryYamlIT.java file is updated to enforce NOT NULL constraint on the AGE column in JDBC schema, ensuring data integrity.
  • Memory Pressure Reduction: The grouping factor in SpannerWriter.java is reduced from 1000 to 500 to alleviate memory pressure, with minimal impact on throughput.

darshan-sj and others added 2 commits January 27, 2026 11:25
…oSpanner or Live migration template (GoogleCloudPlatform#3133)

* Correcting BIT column handling in Live migration template

* Correcting Bulk migration template

* Adding binary_col:

* Correcting test

* Addressing comments

* Correcting DataStreamToSpannerDDLIT

* Correcting SeparateShadowTableDDLIT

* Correcting CassandraAllDataTypesIT

* Handling null in Bulk+live retry

* Fixing spotless

* Correcting assertions for null in MySQLAllDataTypesCustomTransformationsBulkAndLiveFT

* Format changes

* Correcting unit test
…n-textual datatype

Alternative implementation:
Instead of manually checking in each toDatatype function, we could add this check to the containsValue function used by all datatypes.

Reason for rejection: a user can deliberately pass the string "NULL" to a string column, and it should not be converted to null by this logic. Since containsValue is called from the toString method, the conversion cannot live in the common function and must be added separately to each datatype wherever relevant.
This does not change the existing behaviour for the string datatype; it only changes NULL handling for the other datatypes.
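A minimal sketch of the chosen approach, with illustrative (not actual) converter names: each non-textual converter checks for the NULL marker itself, while the string path is left untouched.

```java
public class NullHandlingSketch {

  /**
   * Chosen approach: a non-textual converter treats the literal "NULL"
   * marker as SQL NULL before attempting to parse the value.
   */
  public static Long toLong(String value) {
    if (value == null || value.equals("NULL")) {
      return null; // non-textual types map "NULL" to SQL NULL
    }
    return Long.parseLong(value);
  }

  /**
   * String columns are deliberately left untouched: a user may store the
   * literal text "NULL", so no conversion happens on this path.
   */
  public static String toStringValue(String value) {
    return value;
  }
}
```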
@pratickchokhani pratickchokhani force-pushed the ea-stability-fixes branch 5 times, most recently from e3d90c5 to 4c428bf Compare January 27, 2026 09:57

7 participants