@@ -30,10 +30,26 @@ vend catalogues of schema transforms.
 
 ## Java
 
-For example, you could build a jar that vends a
+Exposing a transform in Java that can be used in a YAML pipeline consists of
+four main steps:
+
+1. Defining the transformation itself as a
+   [PTransform](https://beam.apache.org/documentation/programming-guide/#composite-transforms)
+   that consumes and produces zero or more [schema'd PCollections](https://beam.apache.org/documentation/programming-guide/#creating-schemas).
+2. Exposing this transform via a
+   [SchemaTransformProvider](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.html)
+   which provides an identifier used to refer to this transform later as well
+   as metadata like a human-readable description and its configuration parameters.
+3. Building a jar that contains these classes and vends them via the
+   [Service Loader](https://github.com/Polber/beam-yaml-xlang/blob/95abf0864e313232a89f3c9e57b950d0fb478979/src/main/java/org/example/ToUpperCaseTransformProvider.java#L30)
+   infrastructure.
+4. Writing a [provider specification](https://beam.apache.org/documentation/sdks/yaml/#providers)
+   that tells Beam YAML where to find this jar and what it contains.
+
+If the transform is already exposed as a
 [cross language transform](https://beam.apache.org/documentation/sdks/python-multi-language-pipelines/)
 or [schema transform](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/transforms/SchemaTransformProvider.html)
-and then use it in a transform as follows
+then steps 1-3 have already been done for you. You can then use this transform as follows:
 
 ```
 pipeline:
@@ -56,13 +72,14 @@ pipeline:
 providers:
   - type: javaJar
     config:
-       jar: /path/or/url/to/myExpansionService.jar
+      jar: /path/or/url/to/myExpansionService.jar
     transforms:
-       MyCustomTransform: "urn:registered:in:expansion:service"
+      MyCustomTransform: "urn:registered:in:expansion:service"
 ```
 
-A full example of how to build a java provider can be found
-[here](https://github.com/apache/beam-starter-java-provider).
+We provide a
+[full cloneable example of how to build a Java provider](https://github.com/apache/beam-starter-java-provider)
+that can be used to get started.
 
 ## Python
 
@@ -72,13 +89,27 @@ Arbitrary Python transforms can be provided as well, using the syntax
 providers:
   - type: pythonPackage
     config:
-       packages:
-           - my_pypi_package>=version
-           - /path/to/local/package.zip
+      packages:
+        - my_pypi_package>=version
+        - /path/to/local/package.zip
     transforms:
-       MyCustomTransform: "pkg.module.PTransformClassOrCallable"
+      MyCustomTransform: "pkg.module.PTransformClassOrCallable"
 ```
 
+which can then be used as
+
+```
+- type: MyCustomTransform
+  config:
+    num: 3
+    arg: whatever
+```
+
+This will cause the dependencies to be installed before the transform is
+imported (via its given fully qualified name) and instantiated
+with the config values passed as keyword arguments (e.g. in this case
+`pkg.module.PTransformClassOrCallable(num=3, arg="whatever")`).
+
 We offer a [python provider starter project](https://github.com/apache/beam-starter-python-provider)
 that serves as a complete example for how to do this.
 
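
The instantiation step described in the Python hunk above (import the transform by its fully qualified name, then call it with the config values as keyword arguments) can be sketched in plain Python. This is only an illustration under stated assumptions, not Beam YAML's actual loader: `instantiate_from_config` is a hypothetical helper, and the standard-library `fractions.Fraction` stands in for `pkg.module.PTransformClassOrCallable` so that the snippet runs without Beam installed.

```python
import importlib
from fractions import Fraction


def instantiate_from_config(fully_qualified_name, config):
    """Import `pkg.module.Name` and call it with `config` as keyword arguments.

    Hypothetical helper for illustration; Beam YAML's real loader may differ.
    """
    # Split "pkg.module.Name" into the module path and the attribute name.
    module_name, _, attr_name = fully_qualified_name.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted name, got {fully_qualified_name!r}")
    module = importlib.import_module(module_name)
    constructor = getattr(module, attr_name)
    # The config mapping becomes keyword arguments, as in
    # pkg.module.PTransformClassOrCallable(num=3, arg="whatever").
    return constructor(**config)


# Stand-in for a YAML entry whose config is {numerator: 1, denominator: 2}.
half = instantiate_from_config(
    "fractions.Fraction", {"numerator": 1, "denominator": 2})
assert half == Fraction(1, 2)
```

One consequence of this calling convention is that the keys under `config` have to match the parameter names of the class or callable being referenced.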