[yaml] Adding Spanner IO Providers for Beam YAML#31987
[yaml] Adding Spanner IO Providers for Beam YAML#31987damccorm merged 21 commits intoapache:masterfrom
Conversation
|
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
Polber
left a comment
There was a problem hiding this comment.
Thanks Reeba! I left some comments, but the overall structure looks really good
...io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/MutationUtils.java
Outdated
Show resolved
Hide resolved
...orm/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadSchemaTransformProvider.java
Outdated
Show resolved
Hide resolved
...orm/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadSchemaTransformProvider.java
Outdated
Show resolved
Hide resolved
...orm/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadSchemaTransformProvider.java
Outdated
Show resolved
Hide resolved
...orm/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadSchemaTransformProvider.java
Outdated
Show resolved
Hide resolved
...rm/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerWriteSchemaTransformProvider.java
Outdated
Show resolved
Hide resolved
...rm/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerWriteSchemaTransformProvider.java
Outdated
Show resolved
Hide resolved
...rm/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerWriteSchemaTransformProvider.java
Outdated
Show resolved
Hide resolved
...rm/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerWriteSchemaTransformProvider.java
Outdated
Show resolved
Hide resolved
|
R: @damccorm |
|
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment |
Polber
left a comment
There was a problem hiding this comment.
Added some comments related to newly added testing framework
.../java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/providers/ErrorHandling.java
Outdated
Show resolved
Hide resolved
...io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/MutationUtils.java
Outdated
Show resolved
Hide resolved
...orm/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadSchemaTransformProvider.java
Outdated
Show resolved
Hide resolved
Polber
left a comment
There was a problem hiding this comment.
Couple more small changes. Also, make sure to address the unresolved conversations. Thanks!
...orm/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadSchemaTransformProvider.java
Outdated
Show resolved
Hide resolved
Polber
left a comment
There was a problem hiding this comment.
Lint errors:
************* Module apache_beam.io.gcp.spanner_wrapper
apache_beam/io/gcp/spanner_wrapper.py:20:0: W0611: Unused apache_beam imported as beam (unused-import)
************* Module apache_beam.yaml.integration_tests
apache_beam/yaml/integration_tests.py:50:0: W0404: Reimport 'contextlib' (imported line 20) (reimported)
apache_beam/yaml/integration_tests.py:50:0: C0413: Import "import contextlib" should be placed at the top of the module (wrong-import-position)
apache_beam/yaml/integration_tests.py:51:0: W0404: Reimport 'uuid' (imported line 27) (reimported)
apache_beam/yaml/integration_tests.py:51:0: C0413: Import "import uuid" should be placed at the top of the module (wrong-import-position)
apache_beam/yaml/integration_tests.py:52:0: W0404: Reimport 'logging' (imported line 24) (reimported)
apache_beam/yaml/integration_tests.py:52:0: C0413: Import "import logging" should be placed at the top of the module (wrong-import-position)
apache_beam/yaml/integration_tests.py:53:0: C0413: Import "from google.cloud import spanner" should be placed at the top of the module (wrong-import-position)
apache_beam/yaml/integration_tests.py:50:0: C0412: Imports from package contextlib are not grouped (ungrouped-imports)
apache_beam/yaml/integration_tests.py:51:0: C0412: Imports from package uuid are not grouped (ungrouped-imports)
apache_beam/yaml/integration_tests.py:52:0: C0412: Imports from package logging are not grouped (ungrouped-imports)
apache_beam/yaml/integration_tests.py:53:0: W0611: Unused spanner imported from google.cloud (unused-import)
Make modification I added in comments then run:
yapf --in-place --parallel --recursive apache_beam
I also recommend following steps in https://cwiki.apache.org/confluence/display/BEAM/Python+Tips#PythonTips-LintandFormattingChecks
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #31987 +/- ##
============================================
- Coverage 57.34% 57.15% -0.19%
Complexity 1474 1474
============================================
Files 964 965 +1
Lines 153341 153435 +94
Branches 1076 1076
============================================
- Hits 87933 87699 -234
- Misses 63204 63537 +333
+ Partials 2204 2199 -5
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Polber
left a comment
There was a problem hiding this comment.
Try seeing if making these changes fixes test errors
...orm/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadSchemaTransformProvider.java
Outdated
Show resolved
Hide resolved
...orm/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerReadSchemaTransformProvider.java
Outdated
Show resolved
Hide resolved
...rm/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerWriteSchemaTransformProvider.java
Outdated
Show resolved
Hide resolved
1. Removed serialiazability from ErrorHandling.java 2. Removed double map definitions from MutationUtils.java 3. Added checkNotNull in spanner write provider 4. Modified some variables in spanner wrapper 5. Change instance id in integration tests
import retry
1. replace checknotnull with checkargument in spanner read provider 2. use the correct table name (tmp_table) in integration test
1. Added serializable to error handling 2. Corrected validation methods in spanner read 3. Added retry import and removed default project name in spanner wrapper 4. Corrected instance and database names in spanner integration test 5. Corrected table name in query
damccorm
left a comment
There was a problem hiding this comment.
Reran the failing jobs to see if we can get them green, but they do seem unrelated to this change
| 'google-cloud-core>=2.0.0,<3', | ||
| 'google-cloud-bigtable>=2.19.0,<3', | ||
| 'google-cloud-spanner>=3.0.0,<4', | ||
| 'google-cloud-spanner>=3.0.0,<3.48', |
There was a problem hiding this comment.
There were two checks that were failing and both had the same underlying error:
ModuleNotFoundError: No module named 'grpc_interceptor'
|
Seems like the google-cloud-spanner version specified here conflicts with the version installed in the containers. https://github.com/apache/beam/blob/master/sdks/python/setup.py#L444 At least one test suite is failing due to this. cc: @tvalentyn |
|
cc @Polber |
|
@damccorm was working on rebuilding those - not sure the status |
|
Looks like it is under review still: #32411 |
|
Yep, was just waiting on approval - just merged and it should be back to green |
* Add Spanner IO providers to YAML SDK * add handling logic for more datatypes * delete examples * minor changes * minor change * add integration test * add docs * minor change * minor changes 1. Removed serialiazability from ErrorHandling.java 2. Removed double map definitions from MutationUtils.java 3. Added checkNotNull in spanner write provider 4. Modified some variables in spanner wrapper 5. Change instance id in integration tests * Update spanner_wrapper.py import retry * minor changes 1. replace checknotnull with checkargument in spanner read provider 2. use the correct table name (tmp_table) in integration test * minor changes 1. Added serializable to error handling 2. Corrected validation methods in spanner read 3. Added retry import and removed default project name in spanner wrapper 4. Corrected instance and database names in spanner integration test 5. Corrected table name in query * formatting * Update SpannerWriteSchemaTransformProvider.java * correct lint failures * correct lint failures * correct lint failures * Update SpannerReadSchemaTransformProvider.java * correct lint failures * Update SpannerWriteSchemaTransformProvider.java * spanner version update
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>instead.CHANGES.mdwith noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.