Add integration tests to bulk spanner migration for checking MySQL data type support #3046
Conversation
Summary of Changes (Gemini Code Assist): This pull request enhances the integration test suite for bulk Spanner migrations by broadening the range of MySQL data types covered and by introducing a dedicated test for migrations targeting PostgreSQL-dialect Spanner instances. These changes aim to improve the robustness and reliability of the migration process by verifying data integrity across a wider array of data type conversions and Spanner configurations.
(Force-pushed from 04b284c to cd3ae7c)
Also, add missing tables to PG dialect spanner schema
Codecov Report: ✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##              main    #3046       +/- ##
============================================
+ Coverage    50.76%   56.13%     +5.36%
+ Complexity    5101     1693      -3408
============================================
  Files          974      471       -503
  Lines        59967    26662     -33305
  Branches      6551     2805      -3746
============================================
- Hits         30445    14966     -15479
+ Misses       27378    10800     -16578
+ Partials      2144      896      -1248
In this particular case I would ask for a new test class rather than overloading the existing one. There could be cases where support is missing on the PG side that we have on the MySQL side (or vice versa).
For example, in the current PR, PG_FLOAT4 is not yet supported.
Can we add a new test class for the MySQL-to-PG-dialect test (with its own copy of the schemas)?
When you add the new test class, you can skip PG_FLOAT4 for now and we can fix that separately.
Sure, I can split it out into its own test class. I'll still keep the …
@VardhanThigle I've split out the tests as requested, please have a look and let me know if there's anything else. Thanks!
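The skip suggested in the review above could be expressed by filtering the unsupported types out of the new test class's assertions. The sketch below is illustrative only: the class and method names (`PgDialectTypeFilter`, `typesToVerify`) are hypothetical and do not appear in the PR; only PG_FLOAT4 being unsupported comes from the discussion.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class PgDialectTypeFilter {
    // Types not yet supported on the MySQL -> PG-dialect Spanner path,
    // per the review discussion (PG_FLOAT4 to be fixed separately).
    static final Set<String> SKIPPED_PG_TYPES = Set.of("PG_FLOAT4");

    // Returns only the types the PG-dialect test should assert on.
    static List<String> typesToVerify(List<String> allTypes) {
        return allTypes.stream()
                .filter(t -> !SKIPPED_PG_TYPES.contains(t))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(typesToVerify(
                List.of("PG_INT8", "PG_FLOAT4", "PG_VARCHAR")));
        // PG_FLOAT4 is dropped from the list to verify
    }
}
```

Keeping the skip list in one constant makes it easy to delete the entry once PG_FLOAT4 support lands.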
...urcedb-to-spanner/src/test/java/com/google/cloud/teleport/v2/templates/MySQLDataTypesIT.java
...-spanner/src/test/java/com/google/cloud/teleport/v2/templates/MySQLDataTypesPGDialectIT.java
VardhanThigle left a comment:
Overall this looks good to me.
A couple of comments to clean up potentially unused code.
(Force-pushed from de1e4d2 to 9854b15)
@VardhanThigle I see that the two data type tests I modified/added are failing, but I'm not able to access details about the actual failures (I can tell from the logs that the job fails, but not why). When I run these locally, they both pass without issue. Would it be possible to get the job logs so I can look into why these are failing?
This is the stack trace I see on the Dataflow worker. It's possible that with this many tables, the sheer parallelization of the graph is hitting the connection limit on the source. Can you try setting the maximum number of connections to a smaller value, like 8, when spawning the Dataflow job for your tests? The default is 160 per thread, which is more than needed for an IT scenario (but might be appropriate for a production setting).
Interesting, thanks for the stack trace. I'll try with your suggestion, it does sound like it could allow me to replicate the issue locally. |
Reduce maxConnections to 4 in an attempt to avoid exceeding the max connection limit of the MySQL DB
@VardhanThigle I went with a maxConnections value of 4, which was about as low as I could go without causing timeouts when fetching connections from the pool itself. That said, when testing locally I still see quite a few connections reaching the DB from the job (almost 100), so I'm not sure this will resolve the problem. Would we potentially need to reduce the number of threads for the job itself? Or are there other levers we can adjust if it still fails? Alternatively, there may be a connection leak, but I didn't notice anything obvious when I went through the code, other than the three connections used for discovering the tables/indexes/schema (I'm pretty sure …
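The behavior being tuned above (a per-thread connection cap, with each pool contributing its own set of connections toward the source's global limit) can be illustrated with a small stdlib sketch. This is not the template's actual pooling code; the class name `BoundedConnections` and its API are hypothetical, and a real JDBC pool would also handle timeouts and connection reuse.

```java
import java.util.concurrent.Semaphore;

// Toy model of a maxConnections cap: a semaphore bounds how many
// "connections" can be open at once, the way a small pool size bounds
// concurrent JDBC connections per worker thread.
public class BoundedConnections {
    private final Semaphore permits;

    public BoundedConnections(int maxConnections) {
        this.permits = new Semaphore(maxConnections);
    }

    // Try to take a connection slot; returns false when the cap is reached.
    public boolean tryAcquire() {
        return permits.tryAcquire();
    }

    // Return a slot to the pool, e.g. after closing a connection.
    public void release() {
        permits.release();
    }

    public static void main(String[] args) {
        BoundedConnections pool = new BoundedConnections(4);
        int opened = 0;
        for (int i = 0; i < 10; i++) {
            if (pool.tryAcquire()) {
                opened++;
            }
        }
        System.out.println(opened); // only 4 of the 10 attempts succeed
    }
}
```

Note the implication in the thread: a cap of 4 is per pool, so many parallel workers or threads can still add up to the ~100 connections observed at the database.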
Could you please rebase this? I wanted to try whether the workaround in #3147 helps here.
Done. |
Add integration tests to bulk spanner migration for checking MySQL data type support (GoogleCloudPlatform#3046)

* Add missing data type mappings to data types integration test
* Add data types test for bulk migration from MySQL to a Postgres dialect Spanner DB
* Improve check for bit to string data type mapping. Also, add missing tables to PG dialect spanner schema
* Split PG dialect test into its own test class
* Remove unused code
* Reduce maxConnections to 4 in an attempt to avoid exceeding the max connection limit of the MySQL DB
This adds some missing data type mappings to the existing data type integration test, and also adds a test for a migration to a PostgreSQL-dialect Spanner instance.
Note that some of the type mappings do not migrate as expected. The checks for those are still included for completeness, but they are commented out to avoid failing the tests.
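One of the commits above improves the check for the BIT-to-string mapping. A plausible shape for such a check, assuming the MySQL BIT value arrives as raw bytes and is compared against a big-endian bit string, is sketched below. The helper name `bitsToString` and the exact expected representation are assumptions for illustration, not the template's actual assertion.

```java
public class BitToString {
    // Render a MySQL BIT(width) value, delivered as raw bytes, as a
    // big-endian bit string, e.g. BIT(6) value 0b101101 -> "101101".
    static String bitsToString(byte[] raw, int width) {
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) {
            for (int i = 7; i >= 0; i--) {
                sb.append((b >> i) & 1);
            }
        }
        // Keep only the trailing `width` bits; leading pad bits are not
        // part of the declared BIT column width.
        return sb.substring(sb.length() - width);
    }

    public static void main(String[] args) {
        // 0b00101101 stored in one byte, declared as BIT(6)
        System.out.println(bitsToString(new byte[] {0b00101101}, 6));
        // prints "101101"
    }
}
```

A check built this way compares a deterministic string form on both sides, which avoids depending on how the driver happens to render BIT values.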