Commit 86f8354

[YAML] Add documentation of YAML-defined YAML-providers to the site. (#33729)
1 parent fa8474a commit 86f8354

1 file changed

+104
-0
lines changed

website/www/site/content/en/documentation/sdks/yaml-providers.md

Lines changed: 104 additions & 0 deletions
@@ -82,6 +82,110 @@ providers:
We offer a [python provider starter project](https://github.com/apache/beam-starter-python-provider)
that serves as a complete example for how to do this.

## YAML

New, reusable transforms can be defined in YAML as well.
A provider of this type is simply a mapping from transform names to their YAML definitions.
Each definition is parameterized by applying Jinja2 templating to its string
representation.

The `config_schema` section of a transform definition specifies which
parameters are required (with their types), and the `body` section gives
the implementation in terms of other YAML transforms.

```
- type: yaml
  transforms:
    # Define the first transform of type "RaiseElementToPower"
    RaiseElementToPower:
      config_schema:
        properties:
          n: {type: integer}
      body:
        type: MapToFields
        config:
          language: python
          append: true
          fields:
            power: "element ** {{n}}"

    # Define a second transform that produces consecutive integers.
    Range:
      config_schema:
        properties:
          end: {type: integer}
      # Setting this parameter lets this transform type be used as a source.
      requires_inputs: false
      body: |
        type: Create
        config:
          elements:
            {% for ix in range(end) %}
            - {{ix}}
            {% endfor %}
```

Note that in this second example the `body` of `Range` is defined as a
[block string literal](https://yaml-multiline.info/).
This prevents the system from trying to parse the `{%` and `%}` delimiters used
for control statements as YAML before the template is specialized with a concrete
value for `end` and the loop is expanded.

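To illustrate the expansion described above: instantiating `Range` with `end: 3`
(a value chosen here purely for illustration) should render the body to something
equivalent to the following, since `range(3)` yields 0, 1 and 2. This is worked out
by hand rather than captured from a run, and the whitespace left behind by the
`{% ... %}` lines is omitted.

```
# Hand-expanded body of Range for end: 3 (illustrative, not captured output).
type: Create
config:
  elements:
    - 0
    - 1
    - 2
```
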
These could then be used in a pipeline as

```
transforms:
  - type: Range
    config:
      end: 10
  - type: RaiseElementToPower
    input: Range
    config:
      n: 3
  ...
```
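
As a rough illustration of what the parameterization does here, substituting
`n: 3` for the `{{n}}` placeholder in the body of `RaiseElementToPower` would give
approximately the following transform. This is a hand-expanded sketch under the
assumption that the substitution is purely textual; it is not output captured
from Beam.

```
# Hand-expanded body of RaiseElementToPower for n: 3 (illustrative).
type: MapToFields
config:
  language: python
  append: true
  fields:
    power: "element ** 3"
```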

One can define composite transforms as well, e.g. in a provider listing one
could have

```
- type: yaml
  transforms:
    ConsecutivePowers:
      # This takes two parameters.
      config_schema:
        properties:
          end: {type: integer}
          n: {type: integer}

      # It can be used as a source transform.
      requires_inputs: false

      # The body uses the transforms defined above linked together in a chain.
      body: |
        type: chain
        transforms:
          - type: Range
            config:
              end: {{end}}
          - type: RaiseElementToPower
            config:
              n: {{n}}
```

which allows one to use this whole fragment as

```
type: ConsecutivePowers
config:
  end: 10
  n: 3
```
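
For reference, with `end: 10` and `n: 3` the templated body of `ConsecutivePowers`
should render to roughly the chain below. It is expanded by hand here to show the
substitution; the inner `Range` and `RaiseElementToPower` steps would then be
expanded in turn by their own definitions.

```
# Hand-expanded body of ConsecutivePowers for end: 10, n: 3 (illustrative).
type: chain
transforms:
  - type: Range
    config:
      end: 10
  - type: RaiseElementToPower
    config:
      n: 3
```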

Note that YAML-defined transforms work better in a listing file than directly
in the `providers` block of a pipeline file, because pipeline files are always
pre-processed with Jinja2 themselves, which would necessitate double escaping.

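To make the escaping concern concrete, here is a hedged sketch of an inline
definition in a pipeline file. It assumes the pipeline-level pass is ordinary
Jinja2, so that a `{% raw %}` block passes its contents through untouched and the
`{{n}}` placeholder survives until the provider's own templating step. This has
not been verified against Beam, and the rest of the pipeline is omitted.

```
providers:
  - type: yaml
    transforms:
      RaiseElementToPower:
        config_schema:
          properties:
            n: {type: integer}
        # Assumption: raw/endraw shields the body from the pipeline-level pass.
        body: |
          {% raw %}
          type: MapToFields
          config:
            language: python
            append: true
            fields:
              power: "element ** {{n}}"
          {% endraw %}
```

In a listing file no such wrapper should be needed, since only the provider's
own templating step sees the placeholders.
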
## YAML Provider listing files

One can reference an external listing of providers in the yaml pipeline file
