[docs] Add new Python multi-lang quickstart using the SchemaTransform framework#33360

Closed

ahmedabu98 wants to merge 2 commits into apache:master from ahmedabu98:python_xlang_quickstart

Contributor

ahmedabu98 commented Dec 11, 2024

Part of #33358

Adding a new multi-language quickstart and marking the old one as "legacy".


          new python multi-lang quickstart

79534af

github-actions bot added the website label

ahmedabu98 marked this pull request as ready for review

December 12, 2024 19:29

ahmedabu98 changed the title ~~Add new Python multi-lang quickstart using the SchemaTransform framework~~ [docs] Add new Python multi-lang quickstart using the SchemaTransform framework

Contributor

github-actions bot commented Dec 12, 2024

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @kennknowles for label website.

Available commands:

stop reviewer notifications - opt out of the automated review tooling
remind me after tests pass - tag the comment author after tests pass
waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

github-actions bot added the Next Action: Reviewers label

ahmedabu98 mentioned this pull request

[Task]: Add new Python SchemaTransform multi-language documentation #33358

Closed

17 tasks

ahmedabu98 linked an issue

that may be closed by this pull request

[Task]: Add new Python SchemaTransform multi-language documentation #33358

Closed

17 tasks


          update example code

Contributor

github-actions bot commented Dec 26, 2024

Reminder, please take a look at this PR: @kennknowles

github-actions bot added the slow-review label

Contributor

github-actions bot commented Dec 31, 2024

Assigning a new set of reviewers because the PR has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @damccorm for label website.

github-actions bot removed the slow-review label

damccorm reviewed

website/www/site/content/en/documentation/programming-guide.md

		#### 13.2.2. Using cross-language transforms in a Python pipeline

		For Beam versions 2.60.0+, please follow [this guide](sdks/python-custom-multi-language-pipelines-guide.md#use-the-portable-transform-in-a-python-pipeline) instead.

Contributor

damccorm Jan 2, 2025

Does this section actually need this disclaimer? I think consuming schema transforms is basically the same, right? Has anything actually changed for this section?

website/www/site/content/en/documentation/programming-guide.md


		#### 13.1.1. Creating cross-language Java transforms

		For Beam versions 2.60.0+, please follow [this guide](sdks/python-custom-multi-language-pipelines-guide.md) instead.

Contributor

damccorm Jan 2, 2025

Does this apply to the whole section or just 13.1.1.2? Do we need to recommend away from JavaExternalTransform for cases where it works?

Contributor

damccorm Jan 2, 2025

Also, should we update this section to recommend the new way by default (even if it's just a link to the full doc), and link to the legacy page for <2.60.0 instead of leaving all the content here?

website/www/site/content/en/documentation/sdks/python-multi-language-pipelines-2.md


		## Create a cross-language transform

		Here's a Java transform provider, [ExtractWordsProvider](https://github.com/apache/beam/blob/master/examples/multi-language/src/main/java/org/apache/beam/examples/multilanguage/schematransforms/ExtractWordsProvider.java), that is uniquely identified with the URN `"beam:schematransform:org.apache.beam:extract_words:v1"`. Given a Configuration object, it will provide a transform:

Contributor

damccorm Jan 2, 2025

Could you describe what the URN does? (In this context, it allows the transform to be identified across the language barrier.)
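
To illustrate the point about the URN, here is a minimal sketch in plain Python (no Beam imports) of a URN acting as a language-neutral lookup key. The URN string is the one quoted in the doc; `REGISTRY`, `register`, `expand`, and `make_extract_words` are hypothetical names invented for this illustration, and the word-dropping behavior is an assumption about what the example transform does, not a copy of the Java provider.

```python
from typing import Callable, Dict, List

# URN quoted in the doc under review.
EXTRACT_WORDS_URN = "beam:schematransform:org.apache.beam:extract_words:v1"

# Hypothetical stand-in for the provider lookup an expansion service performs:
# each URN string maps to a factory that builds a transform from a configuration.
REGISTRY: Dict[str, Callable[[dict], Callable[[str], List[str]]]] = {}


def register(urn: str, factory: Callable[[dict], Callable[[str], List[str]]]) -> None:
    """Associate a URN with a transform factory."""
    REGISTRY[urn] = factory


def make_extract_words(config: dict) -> Callable[[str], List[str]]:
    """Assumed behavior: split a line into words and drop the configured ones."""
    drop = set(config.get("drop", []))

    def extract(line: str) -> List[str]:
        return [word for word in line.split() if word not in drop]

    return extract


def expand(urn: str, config: dict) -> Callable[[str], List[str]]:
    """Resolve a transform purely by its URN, as a caller in another SDK would."""
    if urn not in REGISTRY:
        raise KeyError(f"No transform registered for URN: {urn}")
    return REGISTRY[urn](config)


register(EXTRACT_WORDS_URN, make_extract_words)

transform = expand(EXTRACT_WORDS_URN, {"drop": ["foo", "bar"]})
result = transform("foo keeps bar words")
```

The caller never references a Java class name. The URN string and the configuration are the only things that cross the language boundary, which is the property the doc could state explicitly.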

website/www/site/content/en/documentation/sdks/python-multi-language-pipelines-2.md

+              Beam uses this configuration to generate a Python transform with the following signature:
+              ```python
+              Extract(drop=["foo", "bar"])

Contributor

damccorm Jan 2, 2025

Suggested change

- Extract(drop=["foo", "bar"])
+ class Extract():
+    def __init__(self, drop: List[str])

Saying the existing code snippet is a signature is not quite right. Thoughts on providing the full Python class definition? This might be a bit clearer.

Alternatively, we could change "Beam uses this configuration to generate a Python transform with the following signature:" to "Beam uses this configuration to generate a Python transform which can be instantiated like:".
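
For reference, the difference between a signature and an instantiation can be sketched as follows. This is a plain-Python stand-in, not the class Beam actually generates: the real wrapper would be a Beam transform, and the `__repr__` method and the use of `inspect` are additions made only for illustration.

```python
import inspect
from typing import List


class Extract:
    """Hypothetical stand-in for the generated Python wrapper."""

    def __init__(self, drop: List[str]):
        # Words to drop, mirroring the `drop` field of the configuration.
        self.drop = list(drop)

    def __repr__(self) -> str:
        return f"Extract(drop={self.drop!r})"


# The signature: parameter names and types accepted by the constructor.
signature = inspect.signature(Extract)
parameter_names = list(signature.parameters)

# An instantiation: a concrete call with argument values.
extract = Extract(drop=["foo", "bar"])
```

Here `Extract(drop=["foo", "bar"])` is a call, while `__init__(self, drop: List[str])` is the signature, so either wording suggested above would be consistent as long as the snippet matches the sentence that introduces it.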

website/www/site/content/en/documentation/sdks/python-multi-language-pipelines-2.md

+              Extract(drop=["foo", "bar"])
+              ```
+              The transform can be any implementation of your choice, as long as it meets the requirements of a [SchemaTransform](../glossary.md#schematransform). For this example, the transform does the following:

Contributor

damccorm Jan 2, 2025

I think we need to similarly describe what a valid configuration is above. I assume not all field types are valid?

website/www/site/content/en/documentation/sdks/python-multi-language-pipelines-2.md


		When building a job for a multi-language pipeline, Beam uses an [expansion service](../glossary#expansion-service) to expand [composite transforms](../glossary#composite-transform). You must have at least one expansion service per remote SDK.

		Before running a multi-language pipeline, you need to build an expansion service that can access your Java transform. It’s often easier to create a single shaded JAR that contains both. Both Python and Java dependencies will be staged for the runner by the Python SDK.

Contributor

damccorm Jan 2, 2025

"It’s often easier to create a single shaded JAR that contains both"

I'm not sure what this is saying. Both of what?

Contributor

damccorm Jan 2, 2025

It might also be nice to include an example command or additional info that shows how to do this.

website/www/site/content/en/documentation/sdks/python-multi-language-pipelines-2.md

+              Then, initialize the `ExternalTransformProvider` with your expansion service. This can take two parameters:
+              * `expansion_services`: an expansion service, or list of expansion services
+              * `urn_pattern`: (optional) a regex pattern to match valid transforms

Contributor

damccorm Jan 2, 2025

Suggested change

- * `urn_pattern`: (optional) a regex pattern to match valid transforms
+ * `urn_pattern`: (optional) a regex pattern to match valid transforms. If this is not provided...

It would be good to add information on what this does and what happens if it is missing.
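
As a rough illustration of what a URN regex filter does, here is a sketch using only the standard `re` module. The function `filter_urns`, the example pattern, and every URN except the `extract_words` one are hypothetical. Treating a missing pattern as "keep everything" is an assumption of this sketch only; the actual default behavior is exactly what the doc should spell out.

```python
import re
from typing import Iterable, List, Optional


def filter_urns(urns: Iterable[str], urn_pattern: Optional[str] = None) -> List[str]:
    """Keep only the URNs that match `urn_pattern`.

    Assumption for this sketch: when no pattern is given, nothing is filtered.
    """
    if urn_pattern is None:
        return list(urns)
    compiled = re.compile(urn_pattern)
    return [urn for urn in urns if compiled.match(urn)]


available = [
    "beam:schematransform:org.apache.beam:extract_words:v1",  # from the doc
    "beam:schematransform:com.example:my_transform:v1",       # hypothetical
    "beam:transform:legacy:v1",                               # hypothetical
]

# Hypothetical pattern: accept only versioned org.apache.beam schema transforms.
pattern = r"^beam:schematransform:org\.apache\.beam:\w+:v\d+$"

matched = filter_urns(available, pattern)
unfiltered = filter_urns(available)
```

With this pattern only the `extract_words` URN survives, which shows why the doc needs to say which transforms become visible when the parameter is omitted.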

website/www/site/content/en/documentation/sdks/python-multi-language-pipelines-2.md


		### Run with direct runner

		In the following command, `input1` is a file containing lines of text:

Contributor

damccorm Jan 2, 2025

Probably worth calling out that the expansion service needs to be started first (here and below in the Dataflow section).
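
One way a reader could guard against this ordering problem is to check that something is listening on the expansion service address before launching the pipeline. The helper below is a sketch using only the standard library; `wait_for_service` is a hypothetical name, and the host and port are placeholders rather than values taken from the quickstart.

```python
import socket
import time


def wait_for_service(host: str, port: int, timeout: float = 10.0,
                     interval: float = 0.2) -> bool:
    """Return True once a TCP connection to (host, port) succeeds.

    Polls until `timeout` seconds have elapsed, then returns False.
    This only checks that a port is open, not that it speaks the
    expansion service protocol.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


if __name__ == "__main__":
    # Placeholder address: substitute the port your expansion service uses.
    if not wait_for_service("localhost", 12345, timeout=5.0):
        raise SystemExit("Expansion service is not reachable; start it first.")
```

A check like this gives a clear error message up front when the service has not been started.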

Contributor

github-actions bot commented Jan 10, 2025

Reminder, please take a look at this pr: @damccorm

github-actions bot added the slow-review label

Contributor

damccorm commented Jan 10, 2025

waiting on author

github-actions bot added Next Action: Author and removed Next Action: Reviewers slow-review labels

Contributor

jrmccluskey commented Feb 4, 2025

This PR is still listed on the 2.63.0 milestone. Is this a release blocker?

Contributor

damccorm commented Feb 4, 2025

I don't think it should be since the website is versioned independently of the release. @ahmedabu98 I'll remove the blocker, feel free to comment/add it back if I'm wrong

Contributor

github-actions bot commented Apr 6, 2025

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

github-actions bot added the stale label

Contributor

github-actions bot commented Apr 13, 2025

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

github-actions bot closed this

Labels

Next Action: Author stale website