Commit e157c00

Merge pull request #4715 from ClickHouse/docdb
Add setup instruction for DocumentDB support
2 parents 910b0d7 + 77bfa5b commit e157c00

11 files changed, +120 -7 lines changed

docs/integrations/data-ingestion/clickpipes/aws-privatelink.md

Lines changed: 9 additions & 6 deletions
@@ -33,21 +33,25 @@ data source types:
 - Kafka
 - Postgres
 - MySQL
+- MongoDB

 ## Supported AWS PrivateLink endpoint types {#aws-privatelink-endpoint-types}

 ClickPipes reverse private endpoint can be configured with one of the following AWS PrivateLink approaches:

-- [VPC resource](https://docs.aws.amazon.com/vpc/latest/privatelink/privatelink-access-resources.html)
-- [MSK multi-VPC connectivity for MSK ClickPipe](https://docs.aws.amazon.com/msk/latest/developerguide/aws-access-mult-vpc.html)
-- [VPC endpoint service](https://docs.aws.amazon.com/vpc/latest/privatelink/privatelink-share-your-services.html)
+- [VPC resource](#vpc-resource)
+- [MSK multi-VPC connectivity for MSK ClickPipe](#msk-multi-vpc)
+- [VPC endpoint service](#vpc-endpoint-service)

 ### VPC resource {#vpc-resource}

+:::info
+Cross-region is not supported.
+:::
+
 Your VPC resources can be accessed in ClickPipes using [PrivateLink](https://docs.aws.amazon.com/vpc/latest/privatelink/privatelink-access-resources.html). This approach doesn't require setting up a load balancer in front of your data source.

 Resource configuration can be targeted with a specific host or RDS cluster ARN.
-Cross-region is not supported.

 It's the preferred choice for Postgres CDC ingesting data from an RDS cluster.

@@ -171,8 +175,7 @@ Follow our [MSK setup guide for ClickPipes](/knowledgebase/aws-privatelink-setup
 It requires setting up a NLB (Network Load Balancer) in front of your data source
 and configuring the VPC endpoint service to use the NLB.

-VPC endpoint service can be [configured with a private DNS](https://docs.aws.amazon.com/vpc/latest/privatelink/manage-dns-names.html),
-that will be accessible in a ClickPipes VPC.
+VPC endpoint service can be [configured with a private DNS](https://docs.aws.amazon.com/vpc/latest/privatelink/manage-dns-names.html), that will be accessible in a ClickPipes VPC.

 It's a preferred choice for:

docs/integrations/data-ingestion/clickpipes/mongodb/faq.md

Lines changed: 18 additions & 1 deletion
@@ -52,7 +52,7 @@ For more examples, see our [Working with JSON guide](./quickstart).
 
 ### How do I handle `resume of change stream was not possible, as the resume point may no longer be in the oplog.` error? {#resume-point-may-no-longer-be-in-the-oplog-error}

-This error typically occurs when the oplog is truncated and ClickPipe is unable to resume the change stream at the expected point. To resolve this issue, [resync the ClickPipe](./resync.md). To avoid this issue from recurring, we recommend [increasing the oplog retention period](./source/atlas#enable-oplog-retention) (or [here](./source/generic#enable-oplog-retention) if you are on a self-managed MongoDB).
+This error typically occurs when the oplog is truncated and ClickPipe is unable to resume the change stream at the expected point. To resolve this issue, [resync the ClickPipe](./resync.md). To avoid this issue from recurring, we recommend increasing the oplog retention period. See instructions for [MongoDB Atlas](./source/atlas#enable-oplog-retention), [self-managed MongoDB](./source/generic#enable-oplog-retention), or [Amazon DocumentDB](./source/documentdb#configure-change-stream-log-retention).

 ### How is replication managed? {#how-is-replication-managed}

@@ -63,4 +63,21 @@ We use MongoDB's native Change Streams API to track changes in the database. Cha
 Which read preference to use depends on your specific use case. If you want to minimize the load on your primary node, we recommend using `secondaryPreferred` read preference. If you want to optimize ingestion latency, we recommend using `primaryPreferred` read preference. For more details, see [MongoDB documentation](https://www.mongodb.com/docs/manual/core/read-preference/#read-preference-modes-1).

 ### Does the MongoDB ClickPipe support Sharded Cluster? {#does-the-mongodb-clickpipe-support-sharded-cluster}
+
 Yes, the MongoDB ClickPipe supports both Replica Set and Sharded Cluster.
+
+### Does MongoDB ClickPipe support Amazon DocumentDB? {#documentdb-support}
+
+Yes, MongoDB ClickPipe supports Amazon DocumentDB 5.0. See the [Amazon DocumentDB source setup guide](./source/documentdb.md) for details.
+
+### Does MongoDB ClickPipe support PrivateLink? {#privatelink-support}
+
+We support PrivateLink for MongoDB (and DocumentDB) clusters in AWS only.
+
+Note that, unlike a single-node relational database, the MongoDB client requires successful replica set discovery to respect the configured `ReadPreference`. This requires setting up PrivateLink with all the nodes in the cluster so that the MongoDB client can establish a replica set connection and redirect to another node when the connected node goes down.
+
+If you prefer to connect to a single node in your cluster, you can skip replica set discovery by specifying `/?directConnection=true` in the connection string during ClickPipes setup. The PrivateLink setup in this case is similar to that of a single-node relational database, and is the simplest option for PrivateLink support.
+
+For a replica set connection, you can set up PrivateLink for MongoDB with either VPC Resource or VPC Endpoint Service. If you go with VPC Resource, you need to create a `GROUP` resource configuration, plus a `CHILD` resource configuration for each node in the cluster. If you go with VPC Endpoint Service, you need to create a separate Endpoint Service (and a separate NLB) for each node in the cluster.
+
+See the [AWS PrivateLink for ClickPipes](../aws-privatelink.md) documentation for more details. Please reach out to ClickHouse support for assistance.
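
The `directConnection=true` option mentioned in the FAQ text above is an ordinary connection-string query parameter. As a rough illustration only, the sketch below appends it to an existing connection string while keeping any other options. The helper name `withDirectConnection`, the host name, and the credentials are hypothetical placeholders, not part of ClickPipes or the MongoDB driver.

```javascript
// Minimal sketch: force a single-node (direct) connection by setting
// directConnection=true on a MongoDB-style connection string.
// `withDirectConnection` is a hypothetical helper, not a driver API.
function withDirectConnection(uri) {
  // Split off the existing query string, if any.
  const [base, query = ""] = uri.split("?", 2);

  // Preserve existing options and set (or overwrite) directConnection.
  const params = new URLSearchParams(query);
  params.set("directConnection", "true");

  // Options must follow a "/" after the host list; add one if it is missing.
  const hasPathSlash = base.includes("/", base.indexOf("://") + 3);
  const prefix = hasPathSlash ? base : base + "/";

  return `${prefix}?${params.toString()}`;
}

// Example with placeholder host and credentials (assumptions for illustration).
console.log(
  withDirectConnection("mongodb://clickpipes_user:secret@node-1.example.internal:27017/?tls=true")
);
```

The helper assumes that special characters in the user name and password are already percent-encoded, as connection strings require.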

docs/integrations/data-ingestion/clickpipes/mongodb/index.md

Lines changed: 19 additions & 0 deletions
@@ -15,6 +15,7 @@ import mongodb_connection_details from '@site/static/images/integrations/data-in
 import select_destination_db from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/select-destination-db.png'
 import ch_permissions from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/ch-permissions.jpg'
 import Image from '@theme/IdealImage';
+import ssh_tunnel from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/ssh-tunnel.jpg'

 # Ingesting data from MongoDB to ClickHouse (using CDC)

@@ -38,6 +39,8 @@ To get started, you first need to ensure that your MongoDB database is correctly
 
 2. [Generic MongoDB](./mongodb/source/generic)

+3. [Amazon DocumentDB](./mongodb/source/documentdb)
+
 Once your source MongoDB database is set up, you can continue creating your ClickPipe.

 ## Create your ClickPipe {#create-your-clickpipe}
@@ -67,6 +70,22 @@ Make sure you are logged in to your ClickHouse Cloud account. If you don't have
 
 <Image img={mongodb_connection_details} alt="Fill in connection details" size="lg" border/>

+#### (Optional) Set up SSH Tunneling {#optional-set-up-ssh-tunneling}
+
+You can specify SSH tunneling details if your source MongoDB database is not publicly accessible.
+
+1. Enable the "Use SSH Tunnelling" toggle.
+2. Fill in the SSH connection details.
+
+<Image img={ssh_tunnel} alt="SSH tunneling" size="lg" border/>
+
+3. To use Key-based authentication, click on "Revoke and generate key pair" to generate a new key pair and copy the generated public key to your SSH server under `~/.ssh/authorized_keys`.
+4. Click on "Verify Connection" to verify the connection.
+
+:::note
+Make sure to whitelist [ClickPipes IP addresses](../clickpipes#list-of-static-ips) in your firewall rules for the SSH bastion host so that ClickPipes can establish the SSH tunnel.
+:::
+
 Once the connection details are filled in, click `Next`.

 #### Configure advanced settings {#advanced-settings}

docs/integrations/data-ingestion/clickpipes/mongodb/source/documentdb.md

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
+---
+sidebar_label: 'Amazon DocumentDB'
+description: 'Step-by-step guide on how to set up Amazon DocumentDB as a source for ClickPipes'
+slug: /integrations/clickpipes/mongodb/source/documentdb
+title: 'Amazon DocumentDB source setup guide'
+doc_type: 'guide'
+keywords: ['clickpipes', 'mongodb', 'documentdb', 'cdc', 'data ingestion', 'real-time sync']
+---
+
+import docdb_select_parameter_group from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/docdb-select-parameter-group.png'
+import docdb_modify_parameter_group from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/docdb-modify-parameter-group.png'
+import docdb_apply_parameter_group from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/docdb-apply-parameter-group.png'
+import docdb_parameter_group_status from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/docdb-parameter-group-status.png'
+import Image from '@theme/IdealImage';
+
+# Amazon DocumentDB source setup guide
+
+## Supported DocumentDB versions {#supported-documentdb-versions}
+
+ClickPipes supports DocumentDB version 5.0.
+
+## Configure change stream log retention {#configure-change-stream-log-retention}
+
+By default, Amazon DocumentDB has a 3-hour change stream log retention period, while the initial load may take much longer, depending on the existing data volume in your DocumentDB cluster. We recommend setting the change stream log retention to 72 hours or longer to ensure that the log is not truncated before the initial snapshot is completed.
+
+### Update change stream log retention via AWS Console {#update-change-stream-log-retention-via-aws-console}
+
+1. Click `Parameter groups` in the left panel and find the parameter group used by your DocumentDB cluster. If you are using the default parameter group, you will need to create a new parameter group first in order to modify it.
+<Image img={docdb_select_parameter_group} alt="Select parameter group" size="lg" border/>
+
+2. Search for `change_stream_log_retention_duration`, select it, and set it to `259200` (72 hours, in seconds).
+<Image img={docdb_modify_parameter_group} alt="Modify parameter group" size="lg" border/>
+
+3. Click `Apply Changes` to apply the modified parameter group to your DocumentDB cluster immediately. You should see the status of the parameter group transition to `applying`, and then to `in-sync` when the change is applied.
+<Image img={docdb_apply_parameter_group} alt="Apply parameter group" size="lg" border/>
+
+<Image img={docdb_parameter_group_status} alt="Parameter group status" size="lg" border/>
+
+### Update change stream log retention via AWS CLI {#update-change-stream-log-retention-via-aws-cli}
+
+Alternatively, you can configure this via the AWS CLI.
+
+To check the current change stream log retention period:
+```shell
+aws docdb describe-db-cluster-parameters --db-cluster-parameter-group-name <PARAMETER_GROUP_NAME> --query "Parameters[?ParameterName=='change_stream_log_retention_duration'].{Name:ParameterName,Value:ParameterValue}"
+```
+
+To set the change stream log retention period to 72 hours:
+```shell
+aws docdb modify-db-cluster-parameter-group --db-cluster-parameter-group-name <PARAMETER_GROUP_NAME> --parameters "ParameterName=change_stream_log_retention_duration,ParameterValue=259200,ApplyMethod=immediate"
+```
+
+## Configure a database user {#configure-database-user}
+
+Connect to your DocumentDB cluster as an admin user and execute the following command to create a database user for MongoDB CDC ClickPipes:
+
+```javascript
+db.getSiblingDB("admin").createUser({
+    user: "clickpipes_user",
+    pwd: "some_secure_password",
+    roles: ["readAnyDatabase", "clusterMonitor"],
+})
+```
+
+:::note
+Make sure to replace `clickpipes_user` and `some_secure_password` with your desired username and password.
+:::
+
+## What's next? {#whats-next}
+
+You can now [create your ClickPipe](../index.md) and start ingesting data from your DocumentDB instance into ClickHouse Cloud.
+Make sure to note down the connection details you used while setting up your DocumentDB cluster as you will need them during the ClickPipe creation process.
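
The retention value in the guide above is expressed in seconds (`259200` for 72 hours). For a different retention period, the arithmetic can be sketched as follows. The helper names `retentionSeconds` and `retentionParameter` are hypothetical and exist only for this illustration; the parameter name and the `--parameters` format are copied from the CLI command in the diff.

```javascript
// Minimal sketch: derive the change_stream_log_retention_duration value
// (in seconds) from a retention period in hours, and build the string that
// the guide passes to `--parameters`. Helper names are hypothetical.
const PARAMETER_NAME = "change_stream_log_retention_duration";

// Convert a retention period in whole hours to seconds.
function retentionSeconds(hours) {
  if (!Number.isInteger(hours) || hours <= 0) {
    throw new RangeError("hours must be a positive integer");
  }
  return hours * 60 * 60;
}

// Build the --parameters argument in the same shape as the guide's command.
function retentionParameter(hours) {
  return `ParameterName=${PARAMETER_NAME},ParameterValue=${retentionSeconds(hours)},ApplyMethod=immediate`;
}

console.log(retentionSeconds(3));    // the 3-hour default, in seconds
console.log(retentionParameter(72)); // the 72-hour setting recommended above
```

The sketch only formats the value; applying it still requires the `aws docdb modify-db-cluster-parameter-group` command shown in the guide.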

scripts/aspell-ignore/en/aspell-dict.txt

Lines changed: 1 addition & 0 deletions
@@ -367,6 +367,7 @@ DistributedFilesToInsert
 DistributedProductMode
 DistributedSend
 DockerHub
+DocumentDB
 Doron
 DoubleDelta
 Doxygen

sidebars.js

Lines changed: 1 addition & 0 deletions
@@ -696,6 +696,7 @@ const sidebars = {
     items: [
       "integrations/data-ingestion/clickpipes/mongodb/source/atlas",
       "integrations/data-ingestion/clickpipes/mongodb/source/generic",
+      "integrations/data-ingestion/clickpipes/mongodb/source/documentdb",
     ],
   },
 ],
