@damccorm damccorm commented Oct 1, 2025

Adds a pipeline option that automatically replaces GroupByKey (GBK) with GroupByEncryptedKey (GBEK) in Java. To support this, the PR makes the following changes:

  1. Adds the pipeline option in PipelineOptions.java
  2. Updates Secret.java and GcpSecret.java to allow them to consume these secrets and to make them more testable.
  3. Updates GroupByEncryptedKey so that it can take in a custom GBK-like transform. This lets us pass through the original GBK transform with all of its parameters already set, instead of trying to pass through each option individually.
  4. Updates GroupByKey to automatically shell out to GroupByEncryptedKey when the pipeline option is present
  5. Updates GroupByKeyTranslation.java to only use the GBK URN when the GBK is not being overridden by GBEK. This prevents runner-side replacement.
  6. Dataflow has a custom GBK implementation used by Redistribute. We also plumb the same changes from (4) and (5) through to that transform in DataflowGroupByKey.java.
  7. Adds tests

I'd recommend reviewing the changes in that order; I think it will help make sense of the PR.
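
To make the control flow in items 3 to 5 easier to picture, here is a toy, plain-Java model of it. This is only a sketch under stated assumptions: it does not use Beam's API, every name in it (`Grouping`, `PlainGroupByKey`, `EncryptedGroupByKey`, `expand`, `urnFor`, and the placeholder URN string) is hypothetical, and the key "encryption" is a reversible Base64 stand-in rather than real cryptography.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names, not Beam's classes or signatures.
final class GbekDispatchSketch {
  // Placeholder URN string for illustration; not the value Beam uses.
  static final String PLACEHOLDER_GBK_URN = "example:gbk:v1";

  /** Stand-in for a "GBK-like" transform over (key, value) string pairs. */
  interface Grouping {
    Map<String, List<String>> apply(List<Map.Entry<String, String>> input);
  }

  static Map.Entry<String, String> kv(String key, String value) {
    return new AbstractMap.SimpleEntry<>(key, value);
  }

  /** Stand-in for the plain GBK: groups values by key in first-seen key order. */
  static final class PlainGroupByKey implements Grouping {
    @Override
    public Map<String, List<String>> apply(List<Map.Entry<String, String>> input) {
      Map<String, List<String>> out = new LinkedHashMap<>();
      for (Map.Entry<String, String> e : input) {
        out.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
      }
      return out;
    }
  }

  /** Stand-in for GBEK: wraps the original grouping (item 3) and hides keys from it. */
  static final class EncryptedGroupByKey implements Grouping {
    private final Grouping inner;
    private final String secret;

    EncryptedGroupByKey(Grouping inner, String secret) {
      this.inner = inner;
      this.secret = secret;
    }

    // Reversible placeholder encoding, NOT real encryption.
    private String encode(String key) {
      byte[] raw = (secret + ":" + key).getBytes(StandardCharsets.UTF_8);
      return Base64.getEncoder().encodeToString(raw);
    }

    private String decode(String encoded) {
      String raw = new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
      return raw.substring(secret.length() + 1);
    }

    @Override
    public Map<String, List<String>> apply(List<Map.Entry<String, String>> input) {
      List<Map.Entry<String, String>> encoded = new ArrayList<>();
      for (Map.Entry<String, String> e : input) {
        encoded.add(kv(encode(e.getKey()), e.getValue()));
      }
      // The wrapped transform only ever sees encoded keys.
      Map<String, List<String>> out = new LinkedHashMap<>();
      for (Map.Entry<String, List<String>> g : inner.apply(encoded).entrySet()) {
        out.put(decode(g.getKey()), g.getValue());
      }
      return out;
    }
  }

  /** Item 4: delegate to the encrypted wrapper only when the option is set (non-null). */
  static Grouping expand(Grouping original, String gbekSecretOrNull) {
    return gbekSecretOrNull == null
        ? original
        : new EncryptedGroupByKey(original, gbekSecretOrNull);
  }

  /** Item 5: advertise the GBK URN only when GBEK is not overriding the transform. */
  static String urnFor(String gbekSecretOrNull) {
    return gbekSecretOrNull == null ? PLACEHOLDER_GBK_URN : null;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, String>> input =
        Arrays.asList(kv("a", "1"), kv("b", "2"), kv("a", "3"));
    Grouping original = new PlainGroupByKey();
    System.out.println(expand(original, null).apply(input));
    System.out.println(expand(original, "my-secret").apply(input));
    System.out.println(urnFor(null) + " / " + urnFor("my-secret"));
  }
}
```

The point of wrapping the original transform, rather than rebuilding it, is that whatever the caller configured on it is preserved as-is; the wrapper only changes what the inner grouping gets to see.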

Java version of #36321

Part of #36214

@damccorm damccorm changed the base branch from users/damccorm/java-gbek to master October 1, 2025 14:28
@github-actions github-actions bot added the build label Oct 1, 2025
@damccorm damccorm mentioned this pull request Oct 3, 2025
3 tasks
@damccorm damccorm changed the title from "[WIP] Add pipeline option to force GBEK (Java)" to "Add pipeline option to force GBEK (Java)" Oct 4, 2025
@damccorm damccorm marked this pull request as ready for review October 5, 2025 11:08

github-actions bot commented Oct 5, 2025

Assigning reviewers:

R: @Abacn for label java.
R: @liferoad for label build.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

damccorm commented Oct 6, 2025

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This is a great feature addition for enhancing security. The implementation looks solid, with good separation of concerns and extensibility. I've made a few suggestions to improve documentation and code clarity. One larger point is about the new integration tests: there's some duplicated test setup logic for GCP Secret Manager in GroupByKeyTest and GroupByKeyIT. It would be beneficial to refactor this into a shared utility or a JUnit @Rule. Additionally, the new tests in GroupByKeyTest that require a live GCP environment would be better placed in GroupByKeyIT to maintain a clear separation between unit and integration tests. Overall, excellent work!

@damccorm damccorm mentioned this pull request Oct 6, 2025
3 tasks

damccorm commented Oct 6, 2025

Test failures are all unrelated

@Abacn Abacn left a comment

Thanks, I took a pass over the main code.

@damccorm damccorm merged commit c8df4da into master Oct 7, 2025
29 of 31 checks passed
@damccorm damccorm deleted the users/damccorm/enforce-java-gbek branch October 7, 2025 22:28

Abacn commented Oct 9, 2025

I'm not sure what the underlying cause is, but this appears to be breaking https://github.com/apache/beam/actions/workflows/beam_PostCommit_Java_PVR_Spark3_Streaming.yml (#34207)

last good run: https://github.com/apache/beam/actions/runs/18327770129

first failing run: https://github.com/apache/beam/actions/runs/18333678979

The two runs differ by only one commit (c8df4da), which is this one.

damccorm commented Oct 9, 2025

Ack - kicked off a debugging run in a CI env here - #36454 - with this/subsequent gbek commits reverted. Assuming that succeeds I'll go from there.

It's very unclear what in this PR would have caused this failure, but I agree it is suspicious.

EDIT: Fixed by #36479 (test environment was getting polluted somehow)

damccorm added a commit that referenced this pull request Oct 9, 2025
damccorm added a commit that referenced this pull request Oct 9, 2025