@claudevdm
Collaborator

@claudevdm claudevdm commented Mar 25, 2025

Vendors cloudpickle 3.1.1 into the apache_beam codebase.
Removes the third-party cloudpickle dependency, which is no longer needed.
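
For illustration, a minimal sketch of what vendoring means for callers: code imports the copy shipped inside the package rather than the third-party distribution. The module path `apache_beam.internal.cloudpickle` is an assumption about where the vendored copy lives, and `load_first_available` is a hypothetical helper, not part of Beam.

```python
import importlib


def load_first_available(candidates):
    """Return the first importable module among the given dotted names."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError too; try the next candidate.
            continue
    raise ImportError(f"none of {list(candidates)!r} could be imported")


# Assumed vendored location first, then the standard library as a fallback.
pickler = load_first_available(["apache_beam.internal.cloudpickle", "pickle"])
payload = pickler.dumps({"answer": 42})
assert pickler.loads(payload) == {"answer": 42}
```

The fallback to `pickle` is only there so the sketch runs without Beam installed; it is not a claim about how Beam selects its pickler.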


@claudevdm claudevdm force-pushed the vendored-cloudpickle branch from 4108e07 to 34443de Compare March 25, 2025 17:50
@claudevdm claudevdm requested a review from tvalentyn March 25, 2025 20:32
@claudevdm claudevdm marked this pull request as ready for review March 25, 2025 20:32
@github-actions
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@claudevdm
Collaborator Author

@tvalentyn can you please take a look?

@github-actions
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @jrmccluskey for label python.
R: @damccorm for label build.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

Contributor

@damccorm damccorm left a comment

Thanks - this looks reasonable to me, but I will defer to @tvalentyn

@@ -0,0 +1,29 @@
Copyright (c) 2012-now, CloudPickle developers and contributors.
Contributor

I'm not sure how we've done this in the past, but I think it probably makes sense to colocate the license with the source. That will also ensure it actually gets packed into the Python container.

Contributor

Actually, I think we need to just add it to https://github.com/apache/beam/blob/master/LICENSE directly

Contributor

@tvalentyn tvalentyn Mar 26, 2025

I checked what is included in the container images by running find . -name '*LICENSE*':

Seeing:

  • ./opt/apache/beam/third_party_licenses/cloudpickle/LICENSE -- this would probably disappear if cloudpickle is not a dependency, unless we modify sdks/python/container/license_scripts/manual_licenses to copy it, which may not be necessary if we add the license in one of the two below places:

  • ./opt/apache/beam/LICENSE

  • ./opt/apache/beam/LICENSE.python -- we could add LICENSE.cloudpickle as proposed in this PR

I don't have a preference for which file to add it to; we can see which is simpler and easier to import internally.
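
The search above can be reproduced with a short script, for example when checking an unpacked image filesystem. This is a minimal sketch: `find_license_files` is a hypothetical helper, and unlike `find -name` it reports only files, not directories.

```python
import fnmatch
import os


def find_license_files(root, pattern="*LICENSE*"):
    """Return sorted paths under root whose file name matches pattern."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # fnmatch.filter applies the shell-style pattern to bare file names.
        for filename in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, filename))
    return sorted(matches)


if __name__ == "__main__":
    for path in find_license_files("."):
        print(path)
```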

Collaborator Author

Licenses are not imported from this repo internally; they are created manually, so from an import point of view I don't think it matters where we put it.

Does LICENSE.cloudpickle work for building containers? Or should I put it in LICENSE.python instead?

Contributor

@tvalentyn tvalentyn Mar 27, 2025

Does LICENSE.cloudpickle work for building containers?

Yes, it works. We can check that it is included in the container build, for example by running something like:

./gradlew :sdks:python:container:py310:docker
docker run --rm -it --entrypoint=/bin/bash apache/beam_python3.10_sdk:2.65.0.dev

and inspecting the image content.
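
A non-interactive variant of the same check could be scripted roughly as follows. This is a sketch under assumptions: `license_check_command` is a hypothetical helper that only builds the argument list, the search root `/opt/apache/beam` is taken from the paths listed earlier in this thread, and the image is assumed to provide `find`.

```python
import shlex


def license_check_command(image, pattern="*LICENSE*", search_root="/opt/apache/beam"):
    """Build a docker command that lists matching files inside the image."""
    # Quote the pattern so the shell inside the container does not expand it.
    find_cmd = f"find {shlex.quote(search_root)} -name {shlex.quote(pattern)}"
    return ["docker", "run", "--rm", "--entrypoint=/bin/bash", image, "-c", find_cmd]


cmd = license_check_command("apache/beam_python3.10_sdk:2.65.0.dev")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where the image has been built.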

Contributor

Feel free to follow up in another PR if this needs another change.

@tvalentyn tvalentyn merged commit 468f5cf into apache:master Apr 1, 2025
97 checks passed
@tvalentyn
Contributor

#21298

liferoad pushed a commit to liferoad/beam that referenced this pull request Apr 4, 2025
* Add vendored cloudpickle.

* Remove references to third party cloudpickle.

* Fix precommits.

* Fix isort lint error.

* Fix extra space.

* Remove extra quote.

* Change vendored version to 3.1.1

---------

Co-authored-by: Claude <[email protected]>