Ea stability fixes #3251

pratickchokhani wants to merge 10 commits into GoogleCloudPlatform:release_2025-10-28-01_RC00
Conversation
* Revert "deps: update Python dependencies (GoogleCloudPlatform#2956)" This reverts commit 797bd8f. * Update pom and requirements * Fix tests * Update python requirements * fix test for bqio breaking change --------- Co-authored-by: Vitaly Terentyev <vitaly.terentyev@akvelon.com>
…CloudPlatform#3024) [b/458068271](https://b.corp.google.com/issues/458068271) [b/458070941](https://b.corp.google.com/issues/458070941)

## Problem

Whenever a row has a column with a NULL value, reverse replication fails with the following errors:

1. AssignShardId step in Dataflow: `Error fetching shard Id column: Illegal call to getter of null value`
2. DLQ entry: `"error_message":"No shard identified for the record"`

## Fix

marshalSpannerValues calls getter functions to read each column's value, but these getters throw a NullPointerException when the value is NULL. That exception is handled incorrectly and produces the errors above. The fix is to check for a NULL value at the beginning of the function, before any getter is called (a sketch of this guard follows below).

## Tests

1. Unit test updated: it failed without the fix with the expected error and passed with it.
2. Integration test updated to include NULL values in the row.

### Dataflow job with container built on fixed code

Template container built in gs://ea-functional-tests/templates/flex/Spanner_to_SourceDb. Ran a [dataflow job](https://pantheon.corp.google.com/dataflow/jobs/us-central1/2025-11-26_02_33_03-2924712538454216688;graphView=0?project=span-cloud-ck-testing-external&e=PangolinKitchenLaunch::PangolinKitchenEnabled&mods=logs_tg_staging&pageState=(%22dfTime%22:(%22l%22:%22dfJobMaxTime%22))) with the above container.

#### 1. double datatype

SQL query: column skill of type double is kept NULL; column variance of type double is non-NULL.

```
INSERT INTO ut_scl_squad (ddrKey, gameSpaceId, ownerPersId, teammateIndex, teammatePersId, `variance`, lastUpdateTime)
VALUES (3002, 17613, 172377253, 202, 1269841562, 1234.56, 2720290009);
DELETE FROM ut_scl_squad WHERE ddrKey = 3002 AND gameSpaceId = 17613 AND ownerPersId = 172377253;
```

#### 2. timestamp datatype

SQL queries:

- column createdTime of type timestamp is kept non-NULL; column squadName of type varchar is NULL

```
INSERT INTO ut_showoff (ddrkey, showoffId, userId, createdTime, `count`)
VALUES (2002, 2002, -285444577, TIMESTAMP('2025-01-01T12:00:00Z'), 2444035295);
DELETE FROM ut_showoff WHERE ddrkey = 2002 AND showoffId = 2002;
```

- column createdTime of type timestamp is kept NULL

```
INSERT INTO ut_showoff (ddrkey, showoffId, userId) VALUES (202, 202, -28577);
DELETE FROM ut_showoff WHERE ddrkey = 202 AND showoffId = 202;
```

#### 3. Reserved keywords

SQL query: column content of type varbinary is kept NULL.

```
INSERT INTO sedges (ddrkey, created, `from`, `to`, `type`, `value`, flags)
VALUES (2001, 2001, 'nodeA', 'nodeB', 'edgeType1', 'edgeValue1', -1326272220);
DELETE FROM sedges WHERE ddrkey = 2001 AND created = 2001;
```

#### 4. Interleaving

- INTERLEAVE IN PARENT ON DELETE CASCADE: on deleting the parent in Spanner, both the parent and child records were also deleted from SQL.
- INTERLEAVE IN: on deleting the parent in Spanner, the parent row was also deleted from SQL.

#### 5. Regression testing: custom sharding logic

The flow where there is no migration_shard_id column and the pipeline uses the ddrkey value to compute the shard is not affected by this change.
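To make the NULL guard concrete, here is a minimal sketch of the idea described in the Fix section above. The method name and signature are illustrative only (the template's actual marshalSpannerValues differs); it simply shows checking `Struct.isNull` before any getter is reached.

```java
import com.google.cloud.spanner.Struct;
import com.google.cloud.spanner.Type;

/**
 * Illustrative only: the real template's marshalSpannerValues has a different
 * signature. This sketch just shows the "check NULL before calling a getter" idea.
 */
public class NullSafeMarshaller {

  static String marshalSpannerValue(Struct row, String columnName) {
    // Guard first: Spanner getters throw when the column holds NULL
    // ("Illegal call to getter of null value"), so never reach them for NULLs.
    if (row.isNull(columnName)) {
      return null;
    }
    Type.Code code = row.getColumnType(columnName).getCode();
    switch (code) {
      case STRING:
        return row.getString(columnName);
      case INT64:
        return Long.toString(row.getLong(columnName));
      case FLOAT64:
        return Double.toString(row.getDouble(columnName));
      case TIMESTAMP:
        return row.getTimestamp(columnName).toString();
      case BOOL:
        return Boolean.toString(row.getBoolean(columnName));
      default:
        throw new IllegalArgumentException("Unsupported column type: " + code);
    }
  }
}
```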
…oudPlatform#3073)

* Switch to AvroCoder
* Spotless
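For context on the AvroCoder switch, the sketch below shows the usual Beam pattern for attaching AvroCoder to an element type. The `ChangeRecord` class is hypothetical, not a class from this PR, and the AvroCoder import path depends on the Beam version (older releases expose it under org.apache.beam.sdk.coders).

```java
import org.apache.avro.reflect.Nullable;
import org.apache.beam.sdk.coders.DefaultCoder;
import org.apache.beam.sdk.extensions.avro.coders.AvroCoder;

// Hypothetical element type used only to illustrate AvroCoder wiring.
@DefaultCoder(AvroCoder.class)
public class ChangeRecord {
  private String tableName;
  // Nullable fields must be marked so Avro's reflect schema allows nulls.
  @Nullable private String payloadJson;

  // AvroCoder requires a no-arg constructor for decoding.
  public ChangeRecord() {}

  public ChangeRecord(String tableName, String payloadJson) {
    this.tableName = tableName;
    this.payloadJson = payloadJson;
  }
}
```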
…Platform#3115)

* setting default gf
* changed grouping factor to 500
* spotless
* readability
* added comment for the change

Co-authored-by: Sandeep Mishra <iamsandeep@google.com>
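As a rough illustration of how a default grouping factor of 500 can be wired into a template, here is a hedged sketch using Beam pipeline options. The interface and option name (`GroupingOptions`, `getGroupingFactor`) are hypothetical and not taken from this PR.

```java
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;

// Hypothetical options interface: the real template defines its own option name
// for the grouping factor; this only shows how a default of 500 would be declared.
public interface GroupingOptions extends PipelineOptions {

  @Description("Number of records grouped together before processing.")
  @Default.Integer(500)
  Integer getGroupingFactor();

  void setGroupingFactor(Integer value);
}
```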
Summary of Changes (Gemini Code Assist)

This pull request focuses on enhancing the stability, data integrity, and performance of several Google Cloud Dataflow templates. It includes upgrading dependencies, improving data type handling, and optimizing memory usage. The changes span multiple templates, including those for JDBC to BigQuery, Spanner to SourceDB, and DataStream MongoDB to Firestore.
…oSpanner or Live migration template (GoogleCloudPlatform#3133)

* Correcting BIT column handling in Live migration template
* Correcting Bulk migration template
* Adding binary_col
* Correcting test
* Addressing comments
* Correcting DataStreamToSpannerDDLIT
* Correcting SeparateShadowTableDDLIT
* Correcting CassandraAllDataTypesIT
* Handling null in Bulk+live retry
* Fixing spotless
* Correcting assertions for null in MySQLAllDataTypesCustomTransformationsBulkAndLiveFT
* Format changes
* Correcting unit test
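As background on the BIT handling change, the sketch below shows one plausible way to map MySQL BIT columns to Spanner values. It is illustrative only: it assumes the source connector surfaces BIT values as byte arrays, and neither the class nor the mapping is the template's actual implementation.

```java
import com.google.cloud.ByteArray;
import com.google.cloud.spanner.Value;

/**
 * Illustrative sketch only: the migration templates have their own type mapper.
 * Assumes the source connector surfaces MySQL BIT values as byte arrays.
 */
public class BitColumnMapper {

  static Value toSpannerValue(byte[] bitValue, int bitWidth) {
    if (bitValue == null) {
      // NULL BIT values must stay NULL rather than failing the row.
      return bitWidth == 1 ? Value.bool((Boolean) null) : Value.bytes((ByteArray) null);
    }
    if (bitWidth == 1) {
      // BIT(1) is commonly used as a boolean flag.
      return Value.bool(bitValue.length > 0 && bitValue[0] != 0);
    }
    // Wider BIT(n) columns are preserved as raw bytes.
    return Value.bytes(ByteArray.copyFrom(bitValue));
  }
}
```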
…n-textual datatype

Alternative implementation: instead of manually checking in each toDatatype function, we could add this check to the containsValue function that is used by all datatypes.

Reason for rejection: a user can purposely pass the string "NULL" to a string column, and this should not be converted to null by my logic. The containsValue function is called from the toString method, so this conversion cannot be added to the common function and needs to be added for each datatype separately wherever relevant. This does not change the existing behaviour for the string datatype; it only changes NULL handling for other datatypes.
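A hedged sketch of the chosen per-datatype approach is below. The helper names (`isNullLiteral`, `toTimestampValue`, `toStringValue`) are hypothetical; the point is that a non-textual converter can safely treat the literal "NULL" as null, while the string path leaves it untouched.

```java
import java.sql.Timestamp;
import java.time.Instant;

// Hypothetical helpers: the template's actual converters differ. This only illustrates
// why the check lives in each non-textual converter rather than in the shared
// containsValue/toString path.
public class DatatypeNullHandling {

  private static boolean isNullLiteral(String raw) {
    return raw == null || raw.equalsIgnoreCase("NULL");
  }

  // Non-textual datatype: the literal "NULL" cannot be a valid timestamp, so map it to null.
  static Timestamp toTimestampValue(String raw) {
    if (isNullLiteral(raw)) {
      return null;
    }
    return Timestamp.from(Instant.parse(raw));
  }

  // Textual datatype: a user may legitimately store the string "NULL", so pass it through unchanged.
  static String toStringValue(String raw) {
    return raw;
  }
}
```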
Force-pushed from e3d90c5 to 4c428bf
Force-pushed from 4c428bf to 936ee73
Force-pushed from 7961f6c to 723d31a
No description provided.