
Commit 1b20d23

docs: update generated documentation (#3264)
Co-authored-by: liferoad <7833268+liferoad@users.noreply.github.com>
1 parent 9a0bbda commit 1b20d23

5 files changed: +47 additions, −12 deletions

v2/bigquery-to-bigtable/README_BigQuery_to_Bigtable.md

Lines changed: 7 additions & 2 deletions
@@ -40,6 +40,7 @@ on [Metadata Annotations](https://github.com/GoogleCloudPlatform/DataflowTemplat
 * **bigtableBulkWriteLatencyTargetMs**: The latency target of Bigtable in milliseconds for latency-based throttling.
 * **bigtableBulkWriteMaxRowKeyCount**: The maximum number of row keys in a Bigtable batch write operation.
 * **bigtableBulkWriteMaxRequestSizeBytes**: The maximum bytes to include per Bigtable batch write operation.
+* **bigtableBulkWriteFlowControl**: When set to true, enables bulk write flow control, which uses the server's signal to throttle the writes. Defaults to: false.

@@ -155,6 +156,7 @@ export BIGTABLE_WRITE_PROJECT_ID=<bigtableWriteProjectId>
 export BIGTABLE_BULK_WRITE_LATENCY_TARGET_MS=<bigtableBulkWriteLatencyTargetMs>
 export BIGTABLE_BULK_WRITE_MAX_ROW_KEY_COUNT=<bigtableBulkWriteMaxRowKeyCount>
 export BIGTABLE_BULK_WRITE_MAX_REQUEST_SIZE_BYTES=<bigtableBulkWriteMaxRequestSizeBytes>
+export BIGTABLE_BULK_WRITE_FLOW_CONTROL=false

 gcloud dataflow flex-template run "bigquery-to-bigtable-job" \
 --project "$PROJECT" \
@@ -180,7 +182,8 @@ gcloud dataflow flex-template run "bigquery-to-bigtable-job" \
 --parameters "bigtableWriteProjectId=$BIGTABLE_WRITE_PROJECT_ID" \
 --parameters "bigtableBulkWriteLatencyTargetMs=$BIGTABLE_BULK_WRITE_LATENCY_TARGET_MS" \
 --parameters "bigtableBulkWriteMaxRowKeyCount=$BIGTABLE_BULK_WRITE_MAX_ROW_KEY_COUNT" \
---parameters "bigtableBulkWriteMaxRequestSizeBytes=$BIGTABLE_BULK_WRITE_MAX_REQUEST_SIZE_BYTES"
+--parameters "bigtableBulkWriteMaxRequestSizeBytes=$BIGTABLE_BULK_WRITE_MAX_REQUEST_SIZE_BYTES" \
+--parameters "bigtableBulkWriteFlowControl=$BIGTABLE_BULK_WRITE_FLOW_CONTROL"
 ```

 For more information about the command, please check:
@@ -222,6 +225,7 @@ export BIGTABLE_WRITE_PROJECT_ID=<bigtableWriteProjectId>
 export BIGTABLE_BULK_WRITE_LATENCY_TARGET_MS=<bigtableBulkWriteLatencyTargetMs>
 export BIGTABLE_BULK_WRITE_MAX_ROW_KEY_COUNT=<bigtableBulkWriteMaxRowKeyCount>
 export BIGTABLE_BULK_WRITE_MAX_REQUEST_SIZE_BYTES=<bigtableBulkWriteMaxRequestSizeBytes>
+export BIGTABLE_BULK_WRITE_FLOW_CONTROL=false

 mvn clean package -PtemplatesRun \
 -DskipTests \
@@ -230,7 +234,7 @@ mvn clean package -PtemplatesRun \
 -Dregion="$REGION" \
 -DjobName="bigquery-to-bigtable-job" \
 -DtemplateName="BigQuery_to_Bigtable" \
--Dparameters="readIdColumn=$READ_ID_COLUMN,timestampColumn=$TIMESTAMP_COLUMN,skipNullValues=$SKIP_NULL_VALUES,inputTableSpec=$INPUT_TABLE_SPEC,outputDeadletterTable=$OUTPUT_DEADLETTER_TABLE,query=$QUERY,useLegacySql=$USE_LEGACY_SQL,queryLocation=$QUERY_LOCATION,queryTempDataset=$QUERY_TEMP_DATASET,KMSEncryptionKey=$KMSENCRYPTION_KEY,bigtableRpcAttemptTimeoutMs=$BIGTABLE_RPC_ATTEMPT_TIMEOUT_MS,bigtableRpcTimeoutMs=$BIGTABLE_RPC_TIMEOUT_MS,bigtableAdditionalRetryCodes=$BIGTABLE_ADDITIONAL_RETRY_CODES,bigtableWriteInstanceId=$BIGTABLE_WRITE_INSTANCE_ID,bigtableWriteTableId=$BIGTABLE_WRITE_TABLE_ID,bigtableWriteColumnFamily=$BIGTABLE_WRITE_COLUMN_FAMILY,bigtableWriteAppProfile=$BIGTABLE_WRITE_APP_PROFILE,bigtableWriteProjectId=$BIGTABLE_WRITE_PROJECT_ID,bigtableBulkWriteLatencyTargetMs=$BIGTABLE_BULK_WRITE_LATENCY_TARGET_MS,bigtableBulkWriteMaxRowKeyCount=$BIGTABLE_BULK_WRITE_MAX_ROW_KEY_COUNT,bigtableBulkWriteMaxRequestSizeBytes=$BIGTABLE_BULK_WRITE_MAX_REQUEST_SIZE_BYTES" \
+-Dparameters="readIdColumn=$READ_ID_COLUMN,timestampColumn=$TIMESTAMP_COLUMN,skipNullValues=$SKIP_NULL_VALUES,inputTableSpec=$INPUT_TABLE_SPEC,outputDeadletterTable=$OUTPUT_DEADLETTER_TABLE,query=$QUERY,useLegacySql=$USE_LEGACY_SQL,queryLocation=$QUERY_LOCATION,queryTempDataset=$QUERY_TEMP_DATASET,KMSEncryptionKey=$KMSENCRYPTION_KEY,bigtableRpcAttemptTimeoutMs=$BIGTABLE_RPC_ATTEMPT_TIMEOUT_MS,bigtableRpcTimeoutMs=$BIGTABLE_RPC_TIMEOUT_MS,bigtableAdditionalRetryCodes=$BIGTABLE_ADDITIONAL_RETRY_CODES,bigtableWriteInstanceId=$BIGTABLE_WRITE_INSTANCE_ID,bigtableWriteTableId=$BIGTABLE_WRITE_TABLE_ID,bigtableWriteColumnFamily=$BIGTABLE_WRITE_COLUMN_FAMILY,bigtableWriteAppProfile=$BIGTABLE_WRITE_APP_PROFILE,bigtableWriteProjectId=$BIGTABLE_WRITE_PROJECT_ID,bigtableBulkWriteLatencyTargetMs=$BIGTABLE_BULK_WRITE_LATENCY_TARGET_MS,bigtableBulkWriteMaxRowKeyCount=$BIGTABLE_BULK_WRITE_MAX_ROW_KEY_COUNT,bigtableBulkWriteMaxRequestSizeBytes=$BIGTABLE_BULK_WRITE_MAX_REQUEST_SIZE_BYTES,bigtableBulkWriteFlowControl=$BIGTABLE_BULK_WRITE_FLOW_CONTROL" \
 -f v2/bigquery-to-bigtable
 ```

@@ -296,6 +300,7 @@ resource "google_dataflow_flex_template_job" "bigquery_to_bigtable" {
 # bigtableBulkWriteLatencyTargetMs = "<bigtableBulkWriteLatencyTargetMs>"
 # bigtableBulkWriteMaxRowKeyCount = "<bigtableBulkWriteMaxRowKeyCount>"
 # bigtableBulkWriteMaxRequestSizeBytes = "<bigtableBulkWriteMaxRequestSizeBytes>"
+# bigtableBulkWriteFlowControl = "false"
 }
 }
 ```
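
As a quick reference for the new flag, a minimal sketch of how it could be passed when launching the template follows. It condenses the full gcloud invocation shown in the diff above; the project, region, and template location values are illustrative placeholders, not values taken from this commit.

```bash
#!/usr/bin/env bash
# Minimal sketch: enable server-signal flow control for Bigtable bulk writes.
# PROJECT, REGION, and TEMPLATE_PATH are placeholders (not from this commit);
# the other required BigQuery/Bigtable parameters from the README still apply.
export PROJECT="my-project-id"
export REGION="us-central1"
export TEMPLATE_PATH="gs://my-bucket/templates/flex/BigQuery_to_Bigtable"
export BIGTABLE_BULK_WRITE_FLOW_CONTROL=true

gcloud dataflow flex-template run "bigquery-to-bigtable-job" \
  --project "$PROJECT" \
  --region "$REGION" \
  --template-file-gcs-location "$TEMPLATE_PATH" \
  --parameters "bigtableBulkWriteFlowControl=$BIGTABLE_BULK_WRITE_FLOW_CONTROL"
```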

v2/datastream-to-sql/README_Cloud_Datastream_to_SQL.md

Lines changed: 32 additions & 7 deletions
@@ -54,12 +54,17 @@ on [Metadata Annotations](https://github.com/GoogleCloudPlatform/DataflowTemplat
 * **databaseName**: The name of the SQL database to connect to. The default value is `postgres`.
 * **defaultCasing**: A toggle for table casing behavior. For example: LOWERCASE = mytable -> mytable, UPPERCASE = mytable -> MYTABLE, CAMEL = my_table -> myTable, SNAKE = myTable -> my_table. Defaults to: LOWERCASE.
 * **columnCasing**: A toggle for target column name casing. LOWERCASE (default): my_column -> my_column. UPPERCASE: my_column -> MY_COLUMN. CAMEL: my_column -> myColumn. SNAKE: myColumn -> my_column.
-* **schemaMap**: A map of key/values used to dictate schema name changes (ie. old_name:new_name,CaseError:case_error). Defaults to empty.
+* **schemaMap**: A map of key/values used to dictate schema and table name changes. Examples: Schema to schema (SCHEMA1:SCHEMA2), Table to table (SCHEMA1.table1:SCHEMA2.TABLE1), or multiple mappings using the pipe '|' delimiter (e.g. schema1.source:schema2.target|schema3.source:schema4.target). Defaults to empty.
 * **customConnectionString**: Optional connection string which will be used instead of the default database string.
 * **numThreads**: Determines key parallelism of Format to DML step, specifically, the value is passed into Reshuffle.withNumBuckets. Defaults to: 100.
 * **databaseLoginTimeout**: The timeout in seconds for database login attempts. This helps prevent connection hangs when multiple workers try to connect simultaneously.
-* **datastreamSourceType**: Override the source type detection for Datastream CDC data. When specified, this value will be used instead of deriving the source type from the read_method field. Valid values include 'mysql', 'postgresql', 'oracle', etc. This parameter is useful when the read_method field contains 'cdc' and the actual source type cannot be determined automatically.
 * **orderByIncludesIsDeleted**: Order by configurations for data should include prioritizing data which is not deleted. Defaults to: false.
+* **datastreamSourceType**: Override the source type detection for Datastream CDC data. When specified, this value will be used instead of deriving the source type from the read_method field. Valid values include 'mysql', 'postgresql', 'oracle', etc. This parameter is useful when the read_method field contains 'cdc' and the actual source type cannot be determined automatically.
+* **deadLetterQueueDirectory**: The path that Dataflow uses to write the dead-letter queue output. This path must not be in the same path as the Datastream file output. Defaults to `empty`.
+* **dlqRetryMinutes**: The number of minutes between DLQ Retries. Defaults to `10`.
+* **dlqMaxRetries**: The maximum number of times to retry a failed record from the DLQ before marking it as a permanent failure. Defaults to 5.
+* **schemaCacheRefreshMinutes**: The number of minutes to cache table schemas. Defaults to 1440 (24 hours).
+* **runMode**: This is the run mode type, whether regular or with retryDLQ. Defaults to: regular.

@@ -172,8 +177,13 @@ export SCHEMA_MAP=""
 export CUSTOM_CONNECTION_STRING=""
 export NUM_THREADS=100
 export DATABASE_LOGIN_TIMEOUT=<databaseLoginTimeout>
-export DATASTREAM_SOURCE_TYPE=<datastreamSourceType>
 export ORDER_BY_INCLUDES_IS_DELETED=false
+export DATASTREAM_SOURCE_TYPE=<datastreamSourceType>
+export DEAD_LETTER_QUEUE_DIRECTORY=""
+export DLQ_RETRY_MINUTES=10
+export DLQ_MAX_RETRIES=5
+export SCHEMA_CACHE_REFRESH_MINUTES=1440
+export RUN_MODE=regular

 gcloud dataflow flex-template run "cloud-datastream-to-sql-job" \
 --project "$PROJECT" \
@@ -197,8 +207,13 @@ gcloud dataflow flex-template run "cloud-datastream-to-sql-job" \
 --parameters "customConnectionString=$CUSTOM_CONNECTION_STRING" \
 --parameters "numThreads=$NUM_THREADS" \
 --parameters "databaseLoginTimeout=$DATABASE_LOGIN_TIMEOUT" \
+--parameters "orderByIncludesIsDeleted=$ORDER_BY_INCLUDES_IS_DELETED" \
 --parameters "datastreamSourceType=$DATASTREAM_SOURCE_TYPE" \
---parameters "orderByIncludesIsDeleted=$ORDER_BY_INCLUDES_IS_DELETED"
+--parameters "deadLetterQueueDirectory=$DEAD_LETTER_QUEUE_DIRECTORY" \
+--parameters "dlqRetryMinutes=$DLQ_RETRY_MINUTES" \
+--parameters "dlqMaxRetries=$DLQ_MAX_RETRIES" \
+--parameters "schemaCacheRefreshMinutes=$SCHEMA_CACHE_REFRESH_MINUTES" \
+--parameters "runMode=$RUN_MODE"
 ```

 For more information about the command, please check:
@@ -237,8 +252,13 @@ export SCHEMA_MAP=""
 export CUSTOM_CONNECTION_STRING=""
 export NUM_THREADS=100
 export DATABASE_LOGIN_TIMEOUT=<databaseLoginTimeout>
-export DATASTREAM_SOURCE_TYPE=<datastreamSourceType>
 export ORDER_BY_INCLUDES_IS_DELETED=false
+export DATASTREAM_SOURCE_TYPE=<datastreamSourceType>
+export DEAD_LETTER_QUEUE_DIRECTORY=""
+export DLQ_RETRY_MINUTES=10
+export DLQ_MAX_RETRIES=5
+export SCHEMA_CACHE_REFRESH_MINUTES=1440
+export RUN_MODE=regular

 mvn clean package -PtemplatesRun \
 -DskipTests \
@@ -247,7 +267,7 @@ mvn clean package -PtemplatesRun \
 -Dregion="$REGION" \
 -DjobName="cloud-datastream-to-sql-job" \
 -DtemplateName="Cloud_Datastream_to_SQL" \
--Dparameters="inputFilePattern=$INPUT_FILE_PATTERN,gcsPubSubSubscription=$GCS_PUB_SUB_SUBSCRIPTION,inputFileFormat=$INPUT_FILE_FORMAT,streamName=$STREAM_NAME,rfcStartDateTime=$RFC_START_DATE_TIME,dataStreamRootUrl=$DATA_STREAM_ROOT_URL,databaseType=$DATABASE_TYPE,databaseHost=$DATABASE_HOST,databasePort=$DATABASE_PORT,databaseUser=$DATABASE_USER,databasePassword=$DATABASE_PASSWORD,databaseName=$DATABASE_NAME,defaultCasing=$DEFAULT_CASING,columnCasing=$COLUMN_CASING,schemaMap=$SCHEMA_MAP,customConnectionString=$CUSTOM_CONNECTION_STRING,numThreads=$NUM_THREADS,databaseLoginTimeout=$DATABASE_LOGIN_TIMEOUT,datastreamSourceType=$DATASTREAM_SOURCE_TYPE,orderByIncludesIsDeleted=$ORDER_BY_INCLUDES_IS_DELETED" \
+-Dparameters="inputFilePattern=$INPUT_FILE_PATTERN,gcsPubSubSubscription=$GCS_PUB_SUB_SUBSCRIPTION,inputFileFormat=$INPUT_FILE_FORMAT,streamName=$STREAM_NAME,rfcStartDateTime=$RFC_START_DATE_TIME,dataStreamRootUrl=$DATA_STREAM_ROOT_URL,databaseType=$DATABASE_TYPE,databaseHost=$DATABASE_HOST,databasePort=$DATABASE_PORT,databaseUser=$DATABASE_USER,databasePassword=$DATABASE_PASSWORD,databaseName=$DATABASE_NAME,defaultCasing=$DEFAULT_CASING,columnCasing=$COLUMN_CASING,schemaMap=$SCHEMA_MAP,customConnectionString=$CUSTOM_CONNECTION_STRING,numThreads=$NUM_THREADS,databaseLoginTimeout=$DATABASE_LOGIN_TIMEOUT,orderByIncludesIsDeleted=$ORDER_BY_INCLUDES_IS_DELETED,datastreamSourceType=$DATASTREAM_SOURCE_TYPE,deadLetterQueueDirectory=$DEAD_LETTER_QUEUE_DIRECTORY,dlqRetryMinutes=$DLQ_RETRY_MINUTES,dlqMaxRetries=$DLQ_MAX_RETRIES,schemaCacheRefreshMinutes=$SCHEMA_CACHE_REFRESH_MINUTES,runMode=$RUN_MODE" \
 -f v2/datastream-to-sql
 ```

@@ -310,8 +330,13 @@ resource "google_dataflow_flex_template_job" "cloud_datastream_to_sql" {
 # customConnectionString = ""
 # numThreads = "100"
 # databaseLoginTimeout = "<databaseLoginTimeout>"
-# datastreamSourceType = "<datastreamSourceType>"
 # orderByIncludesIsDeleted = "false"
+# datastreamSourceType = "<datastreamSourceType>"
+# deadLetterQueueDirectory = ""
+# dlqRetryMinutes = "10"
+# dlqMaxRetries = "5"
+# schemaCacheRefreshMinutes = "1440"
+# runMode = "regular"
 }
 }
 ```
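
For the new DLQ, schema-cache, and run-mode options, a small sketch of one consistent set of values may be useful. The bucket path and schema names are illustrative placeholders, not values from this commit; these exports simply feed the corresponding --parameters flags added in the gcloud command above.

```bash
#!/usr/bin/env bash
# Hypothetical settings for the newly documented Datastream-to-SQL options.
# The DLQ bucket and schema names are placeholders, not values from this commit.
export SCHEMA_MAP="schema1.source:schema2.target|schema3.source:schema4.target"
export DEAD_LETTER_QUEUE_DIRECTORY="gs://my-bucket/datastream-dlq/"  # must not overlap the Datastream output path
export DLQ_RETRY_MINUTES=10               # minutes between DLQ retries
export DLQ_MAX_RETRIES=5                  # retries before a record is marked a permanent failure
export SCHEMA_CACHE_REFRESH_MINUTES=1440  # cache table schemas for 24 hours
export RUN_MODE=regular                   # the parameter description also mentions a retryDLQ mode

# Each export maps onto a --parameters flag from the gcloud command above, e.g.:
#   --parameters "deadLetterQueueDirectory=$DEAD_LETTER_QUEUE_DIRECTORY" \
#   --parameters "runMode=$RUN_MODE"
```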

v2/googlecloud-to-googlecloud/README_Stream_GCS_Text_to_BigQuery_Flex.md

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ on [Metadata Annotations](https://github.com/GoogleCloudPlatform/DataflowTemplat

 * **outputDeadletterTable**: Table for messages that failed to reach the output table. If a table doesn't exist, it is created during pipeline execution. If not specified, `<outputTableSpec>_error_records` is used. For example, `<PROJECT_ID>:<DATASET_NAME>.<TABLE_NAME>`.
 * **useStorageWriteApiAtLeastOnce**: This parameter takes effect only if `Use BigQuery Storage Write API` is enabled. If enabled the at-least-once semantics will be used for Storage Write API, otherwise exactly-once semantics will be used. Defaults to: false.
-* **useStorageWriteApi**: If `true`, the pipeline uses the BigQuery Storage Write API (https://cloud.google.com/bigquery/docs/write-api). The default value is `false`. For more information, see Using the Storage Write API (https://beam.apache.org/documentation/io/built-in/google-bigquery/#storage-write-api).
+* **useStorageWriteApi**: If true, the pipeline uses the BigQuery Storage Write API (https://cloud.google.com/bigquery/docs/write-api). The default value is `false`. For more information, see Using the Storage Write API (https://beam.apache.org/documentation/io/built-in/google-bigquery/#storage-write-api).
 * **numStorageWriteApiStreams**: When using the Storage Write API, specifies the number of write streams. If `useStorageWriteApi` is `true` and `useStorageWriteApiAtLeastOnce` is `false`, then you must set this parameter. Defaults to: 0.
 * **storageWriteApiTriggeringFrequencySec**: When using the Storage Write API, specifies the triggering frequency, in seconds. If `useStorageWriteApi` is `true` and `useStorageWriteApiAtLeastOnce` is `false`, then you must set this parameter.
 * **pythonExternalTextTransformGcsPath**: The Cloud Storage path pattern for the Python code containing your user-defined functions. For example, `gs://your-bucket/your-function.py`.

v2/googlecloud-to-googlecloud/README_Stream_GCS_Text_to_BigQuery_Xlang.md

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ on [Metadata Annotations](https://github.com/GoogleCloudPlatform/DataflowTemplat

 * **outputDeadletterTable**: Table for messages that failed to reach the output table. If a table doesn't exist, it is created during pipeline execution. If not specified, `<outputTableSpec>_error_records` is used. For example, `<PROJECT_ID>:<DATASET_NAME>.<TABLE_NAME>`.
 * **useStorageWriteApiAtLeastOnce**: This parameter takes effect only if `Use BigQuery Storage Write API` is enabled. If enabled the at-least-once semantics will be used for Storage Write API, otherwise exactly-once semantics will be used. Defaults to: false.
-* **useStorageWriteApi**: If `true`, the pipeline uses the BigQuery Storage Write API (https://cloud.google.com/bigquery/docs/write-api). The default value is `false`. For more information, see Using the Storage Write API (https://beam.apache.org/documentation/io/built-in/google-bigquery/#storage-write-api).
+* **useStorageWriteApi**: If true, the pipeline uses the BigQuery Storage Write API (https://cloud.google.com/bigquery/docs/write-api). The default value is `false`. For more information, see Using the Storage Write API (https://beam.apache.org/documentation/io/built-in/google-bigquery/#storage-write-api).
 * **numStorageWriteApiStreams**: When using the Storage Write API, specifies the number of write streams. If `useStorageWriteApi` is `true` and `useStorageWriteApiAtLeastOnce` is `false`, then you must set this parameter. Defaults to: 0.
 * **storageWriteApiTriggeringFrequencySec**: When using the Storage Write API, specifies the triggering frequency, in seconds. If `useStorageWriteApi` is `true` and `useStorageWriteApiAtLeastOnce` is `false`, then you must set this parameter.
 * **pythonExternalTextTransformGcsPath**: The Cloud Storage path pattern for the Python code containing your user-defined functions. For example, `gs://your-bucket/your-function.py`.
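
Since the Storage Write API parameters described in both READMEs above interact (the stream count and triggering frequency are only mandatory for exactly-once mode), the following sketch shows one consistent combination. The shell variable names and numeric values are illustrative assumptions, not defaults from this commit.

```bash
#!/usr/bin/env bash
# Illustrative combination of the Storage Write API parameters described above.
# Variable names, the stream count (3), and the triggering frequency (5s) are assumptions.
export USE_STORAGE_WRITE_API=true
export USE_STORAGE_WRITE_API_AT_LEAST_ONCE=false
# With useStorageWriteApi=true and useStorageWriteApiAtLeastOnce=false,
# both of the following must also be set:
export NUM_STORAGE_WRITE_API_STREAMS=3
export STORAGE_WRITE_API_TRIGGERING_FREQUENCY_SEC=5
```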
