Commit d799c95

Reference managed-io page in our docs (#34882)
1 parent 1caa1c2 commit d799c95

5 files changed: +33 −122 lines

sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/IcebergIO.java

Lines changed: 2 additions & 112 deletions
@@ -31,7 +31,6 @@
 import org.apache.beam.sdk.values.Row;
 import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions;
 import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Predicates;
-import org.apache.hadoop.conf.Configuration;
 import org.apache.iceberg.Table;
 import org.apache.iceberg.catalog.Catalog;
 import org.apache.iceberg.catalog.TableIdentifier;
@@ -82,117 +81,8 @@
  *
  * <h2>Configuration Options</h2>
  *
- * <table border="1" cellspacing="2">
- * <tr>
- * <td> <b>Parameter</b> </td> <td> <b>Type</b> </td> <td> <b>Description</b> </td>
- * </tr>
- * <tr>
- * <td> {@code table} </td> <td> {@code str} </td> <td> Required. A fully-qualified table identifier. You may also provide a
- * template to use dynamic destinations (see the `Dynamic Destinations` section below for details). </td>
- * </tr>
- * <tr>
- * <td> {@code catalog_name} </td> <td> {@code str} </td> <td> The name of the catalog. Defaults to {@code apache-beam-<VERSION>}. </td>
- * </tr>
- * <tr>
- * <td> {@code catalog_properties} </td> <td> {@code map<str, str>} </td> <td> A map of properties to be used when
- * constructing the Iceberg catalog. Required properties will depend on what catalog you are using, but
- * <a href="https://iceberg.apache.org/docs/latest/configuration/#catalog-properties">this list</a>
- * is a good starting point. </td>
- * </tr>
- * <tr>
- * <td> {@code config_properties} </td> <td> {@code map<str, str>} </td> <td> A map of properties
- * to instantiate the catalog's Hadoop {@link Configuration}. Required properties will depend on your catalog
- * implementation, but <a href="https://iceberg.apache.org/docs/latest/configuration/#hadoop-configuration">this list</a>
- * is a good starting point.
- * </tr>
- * </table>
- *
- * <h3>Sink-only Options</h3>
- *
- * <table border="1" cellspacing="1">
- * <tr>
- * <td> <b>Parameter</b> </td> <td> <b>Type</b> </td> <td> <b>Description</b> </td>
- * </tr>
- * <tr>
- * <td> {@code triggering_frequency_seconds} </td>
- * <td> {@code int} </td>
- * <td>Required for streaming writes. Roughly every
- * {@code triggering_frequency_seconds} duration, the sink will write records to data files and produce a table snapshot.
- * Generally, a higher value will produce fewer, larger data files.
- * </td>
- * </tr>
- * <tr>
- * <td>{@code drop}</td> <td>{@code list<str>}</td> <td>A list of fields to drop before writing to table(s).</td>
- * </tr>
- * <tr>
- * <td>{@code keep}</td> <td>{@code list<str>}</td> <td>A list of fields to keep, dropping the rest before writing to table(s).</td>
- * </tr>
- * <tr>
- * <td>{@code only}</td> <td>{@code str}</td> <td>A nested record field that should be the only thing written to table(s).</td>
- * </tr>
- * </table>
- *
- * <h3>Source-only Options</h3>
- *
- * <h4>ICEBERG_CDC Source options</h4>
- *
- * <table border="1" cellspacing="1">
- * <tr>
- * <td> <b>Parameter</b> </td> <td> <b>Type</b> </td> <td> <b>Description</b> </td>
- * </tr>
- * <tr>
- * <td> {@code streaming} </td>
- * <td> {@code boolean} </td>
- * <td>
- * Enables streaming reads. The source will continuously poll for snapshots forever.
- * </td>
- * </tr>
- * <tr>
- * <td> {@code poll_interval_seconds} </td>
- * <td> {@code int} </td>
- * <td>
- * The interval at which to scan the table for new snapshots. Defaults to 60 seconds. Only applicable for streaming reads.
- * </td>
- * </tr>
- * <tr>
- * <td> {@code from_snapshot} </td>
- * <td> {@code long} </td>
- * <td> Starts reading from this snapshot ID (inclusive).
- * </td>
- * </tr>
- * <tr>
- * <td> {@code to_snapshot} </td>
- * <td> {@code long} </td>
- * <td> Reads up to this snapshot ID (inclusive). By default, batch reads will read up to the latest snapshot (inclusive),
- * while streaming reads will continue polling for new snapshots forever.
- * </td>
- * </tr>
- * <tr>
- * <td> {@code from_timestamp} </td>
- * <td> {@code long} </td>
- * <td> Starts reading from the earliest snapshot (inclusive) created after this timestamp (in milliseconds).
- * </td>
- * </tr>
- * <tr>
- * <td> {@code to_timestamp} </td>
- * <td> {@code long} </td>
- * <td> Reads up to the latest snapshot (inclusive) created before this timestamp (in milliseconds). By default, batch reads will read up to the latest snapshot (inclusive),
- * while streaming reads will continue polling for new snapshots forever.
- * </td>
- * </tr>
- * <tr>
- * <td> {@code starting_strategy} </td>
- * <td> {@code str} </td>
- * <td>
- * The source's starting strategy. Valid options are:
- * <ul>
- * <li>{@code earliest}: starts reading from the earliest snapshot</li>
- * <li>{@code latest}: starts reading from the latest snapshot</li>
- * </ul>
- * <p>Defaults to {@code earliest} for batch, and {@code latest} for streaming.
- * </td>
- * </tr>
- * </table>
+ * Please check the <a href="https://beam.apache.org/documentation/io/managed-io/">Managed IO
+ * configuration page</a>
  *
  * <h3>Beam Rows</h3>
  *
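Although the Javadoc table is removed by this commit, its option names remain the configuration keys for the Managed Iceberg transform. As a rough sketch built only from those keys (all values are illustrative placeholders, and the `hadoop` catalog setup is an assumption; the linked Managed IO page is the authoritative reference), a write configuration might look like:

```yaml
# Sketch of a Managed Iceberg write config using the option names from the
# removed table above; all values are illustrative placeholders.
table: "db.my_table"             # required: fully-qualified table identifier
catalog_name: "my_catalog"       # optional: defaults to apache-beam-<VERSION>
catalog_properties:              # catalog-specific; see the Iceberg catalog docs
  type: "hadoop"                 # assumed catalog implementation
  warehouse: "gs://my-bucket/warehouse"
triggering_frequency_seconds: 30 # required only for streaming writes
```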

sdks/java/managed/src/main/java/org/apache/beam/sdk/managed/Managed.java

Lines changed: 2 additions & 1 deletion
@@ -46,7 +46,8 @@
  * <h3>Available transforms</h3>
  *
  * <p>This API currently supports two operations: {@link Managed#read} and {@link Managed#write}.
- * Each one enumerates the available transforms in a {@code TRANSFORMS} map.
+ * Please check the <a href="https://beam.apache.org/documentation/io/managed-io/">Managed IO
+ * configuration page</a> to see available transforms and config options.
  *
  * <h3>Building a Managed turnkey transform</h3>
  *

sdks/python/apache_beam/transforms/managed.py

Lines changed: 5 additions & 9 deletions
@@ -23,6 +23,11 @@
 also replace the transform with something entirely different if it chooses to.
 By default, however, the specified transform will remain unchanged.
 
+Available transforms
+====================
+Please check the Managed IO configuration page:
+https://beam.apache.org/documentation/io/managed-io/
+
 Using Managed Transforms
 ========================
 Managed turnkey transforms have a defined configuration and can be built using
@@ -50,19 +55,10 @@
       beam.managed.KAFKA,
       config_url="path/to/config.yaml")
 
-Available transforms
-====================
-Available transforms are:
-
-- **Kafka Read and Write**
-- **Iceberg Read and Write**
 
 **Note:** inputs and outputs need to be PCollection(s) of Beam
 :py:class:`apache_beam.pvalue.Row` elements.
 
-**Note:** Today, all managed transforms are essentially cross-language
-transforms, and Java's ManagedSchemaTransform is used under the hood.
-
 Runner specific features
 ========================
 Google Cloud Dataflow supports additional management features for `managed`
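The docstring above passes `config_url="path/to/config.yaml"` without showing what that file contains. As a hedged sketch, the YAML file simply holds the transform's configuration keys as a flat mapping; the Kafka field names below are assumptions for illustration, so consult the Managed IO configuration page for the real schema:

```yaml
# Hypothetical contents of path/to/config.yaml for a managed Kafka read.
# Field names and values are illustrative assumptions, not the real schema.
bootstrap_servers: "localhost:9092"
topic: "my_topic"
format: "JSON"
```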

sdks/python/gen_managed_doc.py

Lines changed: 12 additions & 0 deletions
@@ -69,6 +69,18 @@
 its latest SDK version, automatically applying bug fixes and new features (no
 manual updates or user intervention required!)
 
+## Supported SDKs
+
+The Managed API is directly accessible through the
+[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/managed/Managed.html)
+and
+[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.managed.html)
+SDKs.
+
+Additionally, some SDKs use the Managed API internally. For example, the Iceberg connector
+used in [Beam YAML](https://beam.apache.org/releases/yamldoc/current/#writetoiceberg)
+and Beam SQL is invoked via the Managed API under the hood.
+
 """
 _MANAGED_RESOURCES_DIR = os.path.join(
     PROJECT_ROOT, 'sdks', 'java', 'managed', 'src', 'main', 'resources')

website/www/site/content/en/documentation/io/managed-io.md

Lines changed: 12 additions & 0 deletions
@@ -32,6 +32,18 @@ For example, the DataflowRunner can seamlessly upgrade a Managed transform to
 its latest SDK version, automatically applying bug fixes and new features (no
 manual updates or user intervention required!)
 
+## Supported SDKs
+
+The Managed API is directly accessible through the
+[Java](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/managed/Managed.html)
+and
+[Python](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.managed.html)
+SDKs.
+
+Additionally, some SDKs use the Managed API internally. For example, the Iceberg connector
+used in [Beam YAML](https://beam.apache.org/releases/yamldoc/current/#writetoiceberg)
+and Beam SQL is invoked via the Managed API under the hood.
+
 ## Available Configurations
 
 <i>Note: required configuration fields are <strong>bolded</strong>.</i>
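The new "Supported SDKs" text notes that Beam YAML's Iceberg connector rides on the Managed API. A minimal Beam YAML pipeline using `WriteToIceberg` might look like the following sketch; the path, table, and catalog values are placeholders, and the catalog setup is an assumption, while the config keys mirror the Managed Iceberg options (`table`, `catalog_name`, `catalog_properties`):

```yaml
# Illustrative Beam YAML pipeline; all values are placeholders.
pipeline:
  transforms:
    - type: ReadFromCsv
      config:
        path: /path/to/input.csv
    - type: WriteToIceberg
      input: ReadFromCsv
      config:
        table: "db.my_table"
        catalog_name: "my_catalog"
        catalog_properties:
          type: "hadoop"
          warehouse: "gs://my-bucket/warehouse"
```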
