
Commit 35948d4

Merge branch 'main' of https://github.com/ClickHouse/clickhouse-docs into value-fixes-1

2 parents: dc1c8e6 + 80c32f7

File tree: 22 files changed, +231 -22 lines


.vale.ini

Lines changed: 14 additions & 2 deletions

```diff
@@ -1,5 +1,17 @@
 StylesPath = styles
-MinAlertLevel = warning
 
-[*.{md}]
+# Exclude with more specific patterns
+# [!docs/engines/**/*.md]
+# [!docs/development/**/*.md]
+# [!docs/interfaces/**/*.md]
+# [!docs/operations/**/*.md]
+# [!docs/sql-reference/**/*.md]
+[!docs/whats-new/**/*.md]
+[!docs/releases/**/*.md]
+
+# Only check .md files (not .mdx)
+[docs/**/*.md]
 BasedOnStyles = ClickHouse
+
+ClickHouse.SentenceLength = NO
+ClickHouse.FutureTense = NO
```
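The include/exclude intent of the `.vale.ini` globs above can be sketched as follows. This approximates Vale's glob matching with Python's `fnmatch` for illustration only; Vale's own glob engine is authoritative, and the simplified patterns here are assumptions, not Vale syntax.

```python
# Sketch: which files the config above intends to lint.
# Approximation only -- Vale's glob engine, not fnmatch, decides for real.
from fnmatch import fnmatch

EXCLUDED = ["docs/whats-new/*", "docs/releases/*"]

def is_linted(path: str) -> bool:
    """True if the path falls under [docs/**/*.md] and is not excluded."""
    if not (path.startswith("docs/") and path.endswith(".md")):
        return False  # the section only covers .md files under docs/ (not .mdx)
    return not any(fnmatch(path, pattern) for pattern in EXCLUDED)

print(is_linted("docs/operations/settings.md"))  # True: linted
print(is_linted("docs/whats-new/changelog.md"))  # False: excluded
print(is_linted("docs/guides/intro.mdx"))        # False: .mdx is skipped
```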

docs/cloud/reference/05_supported-regions.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -25,6 +25,7 @@ import EnterprisePlanFeatureBadge from '@theme/badges/EnterprisePlanFeatureBadge
 - us-east-1 (N. Virginia)
 - us-east-2 (Ohio)
 - us-west-2 (Oregon)
+- il-central-1 (Israel, Tel Aviv)
 
 **Private Region:**
 - ca-central-1 (Canada)
```

docs/integrations/data-ingestion/clickpipes/aws-privatelink.md

Lines changed: 9 additions & 6 deletions

```diff
@@ -33,21 +33,25 @@ data source types:
 - Kafka
 - Postgres
 - MySQL
+- MongoDB
 
 ## Supported AWS PrivateLink endpoint types {#aws-privatelink-endpoint-types}
 
 ClickPipes reverse private endpoint can be configured with one of the following AWS PrivateLink approaches:
 
-- [VPC resource](https://docs.aws.amazon.com/vpc/latest/privatelink/privatelink-access-resources.html)
-- [MSK multi-VPC connectivity for MSK ClickPipe](https://docs.aws.amazon.com/msk/latest/developerguide/aws-access-mult-vpc.html)
-- [VPC endpoint service](https://docs.aws.amazon.com/vpc/latest/privatelink/privatelink-share-your-services.html)
+- [VPC resource](#vpc-resource)
+- [MSK multi-VPC connectivity for MSK ClickPipe](#msk-multi-vpc)
+- [VPC endpoint service](#vpc-endpoint-service)
 
 ### VPC resource {#vpc-resource}
 
+:::info
+Cross-region is not supported.
+:::
+
 Your VPC resources can be accessed in ClickPipes using [PrivateLink](https://docs.aws.amazon.com/vpc/latest/privatelink/privatelink-access-resources.html). This approach doesn't require setting up a load balancer in front of your data source.
 
 Resource configuration can be targeted with a specific host or RDS cluster ARN.
-Cross-region is not supported.
 
 It's the preferred choice for Postgres CDC ingesting data from an RDS cluster.
 
@@ -171,8 +175,7 @@ Follow our [MSK setup guide for ClickPipes](/knowledgebase/aws-privatelink-setup
 It requires setting up a NLB (Network Load Balancer) in front of your data source
 and configuring the VPC endpoint service to use the NLB.
 
-VPC endpoint service can be [configured with a private DNS](https://docs.aws.amazon.com/vpc/latest/privatelink/manage-dns-names.html),
-that will be accessible in a ClickPipes VPC.
+VPC endpoint service can be [configured with a private DNS](https://docs.aws.amazon.com/vpc/latest/privatelink/manage-dns-names.html), that will be accessible in a ClickPipes VPC.
 
 It's a preferred choice for:
```

docs/integrations/data-ingestion/clickpipes/mongodb/faq.md

Lines changed: 18 additions & 1 deletion

```diff
@@ -52,7 +52,7 @@ For more examples, see our [Working with JSON guide](./quickstart).
 
 ### How do I handle `resume of change stream was not possible, as the resume point may no longer be in the oplog.` error? {#resume-point-may-no-longer-be-in-the-oplog-error}
 
-This error typically occurs when the oplog is truncated and ClickPipe is unable to resume the change stream at the expected point. To resolve this issue, [resync the ClickPipe](./resync.md). To avoid this issue from recurring, we recommend [increasing the oplog retention period](./source/atlas#enable-oplog-retention) (or [here](./source/generic#enable-oplog-retention) if you are on a self-managed MongoDB).
+This error typically occurs when the oplog is truncated and ClickPipe is unable to resume the change stream at the expected point. To resolve this issue, [resync the ClickPipe](./resync.md). To avoid this issue from recurring, we recommend increasing the oplog retention period. See instructions for [MongoDB Atlas](./source/atlas#enable-oplog-retention), [self-managed MongoDB](./source/generic#enable-oplog-retention), or [Amazon DocumentDB](./source/documentdb#configure-change-stream-log-retention).
 
 ### How is replication managed? {#how-is-replication-managed}
 
@@ -63,4 +63,21 @@ We use MongoDB's native Change Streams API to track changes in the database. Cha
 Which read preference to use depends on your specific use case. If you want to minimize the load on your primary node, we recommend using `secondaryPreferred` read preference. If you want to optimize ingestion latency, we recommend using `primaryPreferred` read preference. For more details, see [MongoDB documentation](https://www.mongodb.com/docs/manual/core/read-preference/#read-preference-modes-1).
 
 ### Does the MongoDB ClickPipe support Sharded Cluster? {#does-the-mongodb-clickpipe-support-sharded-cluster}
+
 Yes, the MongoDB ClickPipe supports both Replica Set and Sharded Cluster.
+
+### Does MongoDB ClickPipe support Amazon DocumentDB? {#documentdb-support}
+
+Yes, MongoDB ClickPipe supports Amazon DocumentDB 5.0. See [Amazon DocumentDB source setup guide](./source/documentdb.md) for details.
+
+### Does MongoDB ClickPipe support PrivateLink? {#privatelink-support}
+
+We support PrivateLink for MongoDB (and DocumentDB) cluster in AWS only.
+
+Note that unlike single-node relational database, MongoDB client requires successful replica set discovery to be able to respect the configured `ReadPreference`. This requires setting up PrivateLink with all the nodes in the cluster so the MongoDB client can successfully establish replica set connection, as well as redirect to another node when the connected node goes down.
+
+If you prefer to connect to a single node in your cluster, you can skip replica set discovery by specifying `/?directConnection=true` in the connection string during ClickPipes setup. The PrivateLink setup in this case will be similar to a single-node relational database, and is the simplest option for PrivateLink support.
+
+For replica set connection, you can set up PrivateLink for MongoDB with either VPC Resource or VPC Endpoint Service. If you go with VPC Resource, you would need to create a `GROUP` resource configuration, plus a `CHILD` resource configuration for each node in the cluster. If you go with VPC Endpoint Service, you would need to create a separate Endpoint Service (and a separate NLB) for each node in the cluster.
+
+See [AWS PrivateLink for ClickPipes](../aws-privatelink.md) documentation for more details. Please reach out to ClickHouse support for assistance.
```
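For the `directConnection=true` option described in the new FAQ answer above, the connection string would take roughly this shape. The host, port, and credentials below are placeholders for illustration, not values from this changeset:

```
mongodb://<user>:<password>@<node-host>:27017/?directConnection=true
```

With `directConnection=true` the MongoDB client skips replica set discovery and talks only to the named node, which is why a single PrivateLink endpoint suffices in that mode.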

docs/integrations/data-ingestion/clickpipes/mongodb/index.md

Lines changed: 19 additions & 0 deletions

```diff
@@ -15,6 +15,7 @@ import mongodb_connection_details from '@site/static/images/integrations/data-in
 import select_destination_db from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/select-destination-db.png'
 import ch_permissions from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/ch-permissions.jpg'
 import Image from '@theme/IdealImage';
+import ssh_tunnel from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/ssh-tunnel.jpg'
 
 # Ingesting data from MongoDB to ClickHouse (using CDC)
 
@@ -38,6 +39,8 @@ To get started, you first need to ensure that your MongoDB database is correctly
 
 2. [Generic MongoDB](./mongodb/source/generic)
 
+3. [Amazon DocumentDB](./mongodb/source/documentdb)
+
 Once your source MongoDB database is set up, you can continue creating your ClickPipe.
 
 ## Create your ClickPipe {#create-your-clickpipe}
@@ -67,6 +70,22 @@ Make sure you are logged in to your ClickHouse Cloud account. If you don't have
 
 <Image img={mongodb_connection_details} alt="Fill in connection details" size="lg" border/>
 
+#### (Optional) Set up SSH Tunneling {#optional-set-up-ssh-tunneling}
+
+You can specify SSH tunneling details if your source MongoDB database is not publicly accessible.
+
+1. Enable the "Use SSH Tunnelling" toggle.
+2. Fill in the SSH connection details.
+
+<Image img={ssh_tunnel} alt="SSH tunneling" size="lg" border/>
+
+3. To use Key-based authentication, click on "Revoke and generate key pair" to generate a new key pair and copy the generated public key to your SSH server under `~/.ssh/authorized_keys`.
+4. Click on "Verify Connection" to verify the connection.
+
+:::note
+Make sure to whitelist [ClickPipes IP addresses](../clickpipes#list-of-static-ips) in your firewall rules for the SSH bastion host so that ClickPipes can establish the SSH tunnel.
+:::
+
 Once the connection details are filled in, click `Next`.
 
 #### Configure advanced settings {#advanced-settings}
```
docs/integrations/data-ingestion/clickpipes/mongodb/source/documentdb.md (new file; path inferred from the slug and sibling files)

Lines changed: 72 additions & 0 deletions

````markdown
---
sidebar_label: 'Amazon DocumentDB'
description: 'Step-by-step guide on how to set up Amazon DocumentDB as a source for ClickPipes'
slug: /integrations/clickpipes/mongodb/source/documentdb
title: 'Amazon DocumentDB source setup guide'
doc_type: 'guide'
keywords: ['clickpipes', 'mongodb', 'documentdb', 'cdc', 'data ingestion', 'real-time sync']
---

import docdb_select_parameter_group from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/docdb-select-parameter-group.png'
import docdb_modify_parameter_group from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/docdb-modify-parameter-group.png'
import docdb_apply_parameter_group from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/docdb-apply-parameter-group.png'
import docdb_parameter_group_status from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/docdb-parameter-group-status.png'
import Image from '@theme/IdealImage';

# Amazon DocumentDB source setup guide

## Supported DocumentDB versions {#supported-documentdb-versions}

ClickPipes supports DocumentDB version 5.0.

## Configure change stream log retention {#configure-change-stream-log-retention}

By default, Amazon DocumentDB has a 3-hour change stream log retention period, while initial load may take much longer depending on existing data volume in your DocumentDB. We recommend setting the change stream log retention to 72 hours or longer to ensure that it is not truncated before the initial snapshot is completed.

### Update change stream log retention via AWS Console {#update-change-stream-log-retention-via-aws-console}

1. Click `Parameter groups` in the left panel, find the parameter group used by your DocumentDB cluster (if you are using the default parameter group, you will need to create a new parameter group first in order to modify it).
<Image img={docdb_select_parameter_group} alt="Select parameter group" size="lg" border/>

2. Search for `change_stream_log_retention_duration`, select and edit it to `259200` (72 hours)
<Image img={docdb_modify_parameter_group} alt="Modify parameter group" size="lg" border/>

3. Click `Apply Changes` to apply the modified parameter group to your DocumentDB cluster immediately. You should see the status of the parameter group transition to `applying`, and then to `in-sync` when the change is applied.
<Image img={docdb_apply_parameter_group} alt="Apply parameter group" size="lg" border/>

<Image img={docdb_parameter_group_status} alt="Parameter group status" size="lg" border/>

### Update change stream log retention via AWS CLI {#update-change-stream-log-retention-via-aws-cli}

Alternatively, you can configure this via AWS CLI.

To check the current change stream log retention period:
```shell
aws docdb describe-db-cluster-parameters --db-cluster-parameter-group-name <PARAMETER_GROUP_NAME> --query "Parameters[?ParameterName=='change_stream_log_retention_duration'].{Name:ParameterName,Value:ParameterValue}"
```

To set the change stream log retention period to 72 hours:
```shell
aws docdb modify-db-cluster-parameter-group --db-cluster-parameter-group-name <PARAMETER_GROUP_NAME> --parameters "ParameterName=change_stream_log_retention_duration,ParameterValue=259200,ApplyMethod=immediate"
```

## Configure a database user {#configure-database-user}

Connect to your DocumentDB cluster as an admin user and execute the following command to create a database user for MongoDB CDC ClickPipes:

```javascript
db.getSiblingDB("admin").createUser({
  user: "clickpipes_user",
  pwd: "some_secure_password",
  roles: ["readAnyDatabase", "clusterMonitor"],
})
```

:::note
Make sure to replace `clickpipes_user` and `some_secure_password` with your desired username and password.
:::

## What's next? {#whats-next}

You can now [create your ClickPipe](../index.md) and start ingesting data from your DocumentDB instance into ClickHouse Cloud.
Make sure to note down the connection details you used while setting up your DocumentDB cluster as you will need them during the ClickPipe creation process.
````
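The `259200` that the new guide sets for `change_stream_log_retention_duration` is simply the recommended 72-hour window expressed in seconds; a quick sanity check of the arithmetic:

```python
# change_stream_log_retention_duration is specified in seconds;
# 72 hours is the retention window the guide recommends.
hours = 72
retention_seconds = hours * 60 * 60
print(retention_seconds)  # 259200
```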

knowledgebase/exception-too-many-parts.mdx

Lines changed: 1 addition & 1 deletion

```diff
@@ -9,7 +9,7 @@ keywords: ['Too many parts']
 {frontMatter.description}
 {/* truncate */}
 
-## DB::Exception: Too many parts (600). Merges are processing significantly slower than inserts {#dbexception-too-many-parts-600-merges-are-processing-significantly-slower-than-inserts}
+## DB::Exception: Too many parts (Error: 252). Merges are processing significantly slower than inserts {#dbexception-too-many-parts-252-merges-are-processing-significantly-slower-than-inserts}
 
 You reached the `parts_to_throw_insert` setting on a MergeTree table.
 
```
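As context for the error the heading above describes: the exception fires when the number of active parts in a partition exceeds `parts_to_throw_insert`. A sketch of a diagnostic query to see where parts are piling up (the table name is a placeholder):

```sql
-- Count active parts per partition for a given table
SELECT database, table, partition_id, count() AS active_parts
FROM system.parts
WHERE active AND table = 'your_table'
GROUP BY database, table, partition_id
ORDER BY active_parts DESC
```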
scripts/aspell-ignore/en/aspell-dict.txt

Lines changed: 1 addition & 0 deletions

```diff
@@ -367,6 +367,7 @@ DistributedFilesToInsert
 DistributedProductMode
 DistributedSend
 DockerHub
+DocumentDB
 Doron
 DoubleDelta
 Doxygen
```

scripts/settings/autogenerate-settings.sh

Lines changed: 3 additions & 0 deletions

```diff
@@ -249,6 +249,7 @@ if [ -f "$FUNCTION_SQL_FILE" ]; then
 "Conditional"
 "Distance"
 "Dates and Times"
+"Dictionary"
 "Encoding"
 "Financial"
 "Null"
@@ -369,6 +370,7 @@ insert_src_files=(
 "conditional-functions.md"
 "distance-functions.md"
 "dates_and_times-functions.md"
+"dictionary-functions.md"
 "encoding-functions.md"
 "financial-functions.md"
 "null-functions.md"
@@ -405,6 +407,7 @@ insert_dest_files=(
 "docs/sql-reference/functions/conditional-functions.md"
 "docs/sql-reference/functions/distance-functions.md"
 "docs/sql-reference/functions/date-time-functions.md"
+"docs/sql-reference/functions/ext-dict-functions.md"
 "docs/sql-reference/functions/encoding-functions.md"
 "docs/sql-reference/functions/financial-functions.md"
 "docs/sql-reference/functions/functions-for-nulls.md"
```

scripts/settings/format-settings.sql

Lines changed: 23 additions & 8 deletions

```diff
@@ -6,16 +6,31 @@ WITH
         FROM file(cpp_file, LineAsString)
         WHERE match(line, '^\\s*DECLARE\\(')
     ),
+    setting_aliases AS
+    (
+        SELECT
+            alias_for,
+            groupArray(name) AS aliases
+        FROM system.settings
+        WHERE alias_for != ''
+          AND alias_for IN settings_from_cpp
+        GROUP BY alias_for
+    ),
     main_content AS
     (
-        SELECT format('## {} {} {} \n\n{}\n\n{}\n\n',
-            name,
-            '{#'||name||'}',
-            multiIf(tier == 'Experimental', '<ExperimentalBadge/>', tier == 'Beta', '<BetaBadge/>', ''),
-            if(type != '' AND default != '', format('<SettingsInfoBlock type="{}" default_value="{}" />', type, default), ''),
-            trim(BOTH '\\n' FROM description))
-        FROM system.settings WHERE name IN settings_from_cpp
-        ORDER BY name
+        SELECT format('## {} {} {} \n\n{}{}\n\n{}\n\n',
+            s.name,
+            '{#'||s.name||'}',
+            multiIf(s.tier == 'Experimental', '<ExperimentalBadge/>', s.tier == 'Beta', '<BetaBadge/>', ''),
+            if(sa.aliases IS NOT NULL AND length(sa.aliases) > 0,
+               '**Aliases**: ' || arrayStringConcat(arrayMap(x -> '`' || x || '`', sa.aliases), ', ') || '\n\n',
+               ''),
+            if(s.type != '' AND s.default != '', format('<SettingsInfoBlock type="{}" default_value="{}" />', s.type, s.default), ''),
+            trim(BOTH '\\n' FROM s.description))
+        FROM system.settings s
+        LEFT JOIN setting_aliases sa ON s.name = sa.alias_for
+        WHERE s.name IN settings_from_cpp
+        ORDER BY s.name
     ),
     '' ||
     '---
```
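The `setting_aliases` CTE added in this diff hinges on the `alias_for` column of `system.settings`. Stripped of the surrounding Markdown formatting, the lookup it performs is roughly this sketch (runnable against a ClickHouse server):

```sql
-- Group every aliased setting name under the canonical setting it points at
SELECT
    alias_for,
    groupArray(name) AS aliases
FROM system.settings
WHERE alias_for != ''
GROUP BY alias_for
```

The generator then LEFT JOINs this per-setting alias list back onto `system.settings` so each doc entry can print an `**Aliases**:` line only when aliases exist.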
