Commit 83c9b2f

Merge pull request #2904 from ClickHouse/ks/object-storage-clickpieps
Updates Object Storage ClickPipe docs
2 parents 8143856 + 012c80a commit 83c9b2f

1 file changed

+14
-7
lines changed

docs/en/integrations/data-ingestion/clickpipes/object-storage.md

Lines changed: 14 additions & 7 deletions
@@ -7,6 +7,9 @@ import S3SVG from "../../images/logos/amazon_s3_logo.svg";
 import GCSSVG from "../../images/logos/gcs.svg";
 
 # Integrating Object Storage with ClickHouse Cloud
+Object Storage ClickPipes provide a simple and resilient way to ingest data from Amazon S3 and Google Cloud Storage into ClickHouse Cloud. Both one-time and continuous ingestion are supported with exactly-once semantics.
+
+
 ## Prerequisite
 You have familiarized yourself with the [ClickPipes intro](./index.md).
 
@@ -77,27 +80,31 @@ You can also map [virtual columns](../../sql-reference/table-functions/s3#virtua
 
 More connectors will get added to ClickPipes; you can find out more by [contacting us](https://clickhouse.com/company/contact?loc=clickpipes).
 
-## Supported data formats
+## Supported Data Formats
 
 The supported formats are:
 - [JSON](../../../interfaces/formats.md/#json)
 - [CSV](../../../interfaces/formats.md/#csv)
 - [Parquet](../../../interfaces/formats.md/#parquet)
 
-## Scaling
+## Exactly-Once Semantics
 
-Object Storage ClickPipes are scaled based on the minimum ClickHouse service size determined by the [configured vertical autoscaling settings](/docs/en/manage/scaling#configuring-vertical-auto-scaling). The size of the ClickPipe is determined when the pipe is created. Subsequent changes to the ClickHouse service settings will not affect the ClickPipe size.
+Various types of failures can occur when ingesting large datasets, which can result in partial inserts or duplicate data. Object Storage ClickPipes are resilient to insert failures and provide exactly-once semantics. This is accomplished by using temporary "staging" tables. Data is first inserted into the staging tables. If something goes wrong with this insert, the staging table can be truncated and the insert can be retried from a clean state. Only when an insert has completed successfully are the partitions in the staging table moved to the target table. To read more about this strategy, check out [this blog post](https://clickhouse.com/blog/supercharge-your-clickhouse-data-loads-part3).
 
-To increase the throughput on large ingest jobs, we recommend scaling the ClickHouse service before creating the ClickPipe.
+### View Support
+Materialized views on the target table are also supported. ClickPipes will create staging tables not only for the target table, but also for any dependent materialized view.
 
-## Materialized Views
+We do not create staging tables for non-materialized views. This means that if you have a target table with one or more downstream materialized views, those materialized views should avoid selecting data via a view from the target table. Otherwise, you may find that you are missing data in the materialized view.
 
-Object Storage ClickPipes with materialized views require `Full access` permissions to be selected when created. If this is not possible, ensure that the role used by the pipe can create tables and materialized views in the destination database.
+## Scaling
+
+Object Storage ClickPipes are scaled based on the minimum ClickHouse service size determined by the [configured vertical autoscaling settings](/docs/en/manage/scaling#configuring-vertical-auto-scaling). The size of the ClickPipe is determined when the pipe is created. Subsequent changes to the ClickHouse service settings will not affect the ClickPipe size.
 
-Materialized views created while an Object Storage ClickPipe is running will not be populated. Stopping and restarting the pipe will cause the pipe to pick up the materialized views and start populating them. See [Limitations](#limitations) below.
+To increase the throughput on large ingest jobs, we recommend scaling the ClickHouse service before creating the ClickPipe.
 
 ## Limitations
 - Any changes to the destination table, its materialized views (including cascading materialized views), or the materialized view's target tables won't be picked up automatically by the pipe and can result in errors. You must stop the pipe, make the necessary modifications, and then restart the pipe for the changes to be picked up and avoid errors and duplicate data due to retries.
+- There are limitations on the types of views that are supported. Please read the section on [exactly-once semantics](#exactly-once-semantics) and [view support](#view-support) for more information.
 - Role authentication is not available for S3 ClickPipes for ClickHouse Cloud instances deployed into GCP or Azure. It is only supported for AWS ClickHouse Cloud instances.
 - ClickPipes will only attempt to ingest objects that are 10GB or smaller in size. If a file is greater than 10GB, an error will be appended to the ClickPipes dedicated error table.
 - S3 / GCS ClickPipes **do not** share a listing syntax with the [S3 Table Function](https://clickhouse.com/docs/en/sql-reference/table-functions/file#globs_in_path).
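
The staging-table strategy described under Exactly-Once Semantics can be illustrated with a minimal Python sketch. This is not the ClickPipes implementation: plain in-memory lists stand in for the staging and target tables, and every name here (`load_exactly_once`, `make_flaky_insert`, `FlakyInsertError`) is hypothetical and chosen only for illustration. It shows the general pattern of writing to staging, truncating on failure, and moving data to the target only after a complete insert.

```python
class FlakyInsertError(Exception):
    """Simulated failure partway through an insert (hypothetical)."""


def load_exactly_once(rows, target, insert_into_staging, max_attempts=3):
    """Load rows into target via a staging list, retrying from a clean state."""
    staging = []  # stand-in for a temporary staging table
    for attempt in range(1, max_attempts + 1):
        try:
            insert_into_staging(staging, rows)
        except FlakyInsertError:
            staging.clear()  # "truncate" staging: discard the partial insert
            continue
        target.extend(staging)  # "move partitions": only after a full insert
        staging.clear()
        return attempt
    raise RuntimeError("insert failed after %d attempts" % max_attempts)


def make_flaky_insert(failures):
    """Return an insert function that fails `failures` times after a partial write."""
    state = {"remaining": failures}

    def insert(staging, rows):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            staging.extend(rows[: len(rows) // 2])  # half the rows land, then failure
            raise FlakyInsertError("connection dropped")
        staging.extend(rows)

    return insert


target_table = []
attempts = load_exactly_once([1, 2, 3, 4], target_table, make_flaky_insert(2))
print(attempts, target_table)  # prints: 3 [1, 2, 3, 4]
```

Because the target is only touched after a full insert into staging, the two simulated partial writes leave no partial or duplicate rows behind; if every attempt fails, the target stays unchanged.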
