
Commit 41a4ab7

committed
mongo: add documentation for mongodb cdc connector
1 parent 9ace3eb commit 41a4ab7

15 files changed

+267
-0
lines changed

docs/integrations/data-ingestion/clickpipes/index.md

Lines changed: 2 additions & 0 deletions
@@ -17,6 +17,7 @@ import DOsvg from '@site/static/images/integrations/logos/digitalocean.svg';
1717
import ABSsvg from '@site/static/images/integrations/logos/azureblobstorage.svg';
1818
import Postgressvg from '@site/static/images/integrations/logos/postgresql.svg';
1919
import Mysqlsvg from '@site/static/images/integrations/logos/mysql.svg';
20+
import Mongodbsvg from '@site/static/images/integrations/logos/mongodb.svg';
2021
import redpanda_logo from '@site/static/images/integrations/logos/logo_redpanda.png';
2122
import clickpipes_stack from '@site/static/images/integrations/data-ingestion/clickpipes/clickpipes_stack.png';
2223
import cp_custom_role from '@site/static/images/integrations/data-ingestion/clickpipes/cp_custom_role.png';
@@ -47,6 +48,7 @@ import Image from '@theme/IdealImage';
4748
| [Amazon Kinesis](/integrations/clickpipes/kinesis) | <Amazonkinesis class="image" alt="Amazon Kinesis logo" style={{width: '3rem', height: 'auto'}}/> |Streaming| Stable | Configure ClickPipes and start ingesting streaming data from Amazon Kinesis into ClickHouse Cloud. |
4849
| [Postgres](/integrations/clickpipes/postgres) | <Postgressvg class="image" alt="Postgres logo" style={{width: '3rem', height: 'auto'}}/> |DBMS| Stable | Configure ClickPipes and start ingesting data from Postgres into ClickHouse Cloud. |
4950
| [MySQL](/integrations/clickpipes/mysql) | <Mysqlsvg class="image" alt="MySQL logo" style={{width: '3rem', height: 'auto'}}/> |DBMS| Private Beta | Configure ClickPipes and start ingesting data from MySQL into ClickHouse Cloud. |
51+
| [MongoDB](/integrations/clickpipes/mongodb) | <Mongodbsvg class="image" alt="MongoDB logo" style={{width: '3rem', height: 'auto'}}/> |DBMS| Private Preview | Configure ClickPipes and start ingesting data from MongoDB into ClickHouse Cloud. |
5052

5153
More connectors will be added to ClickPipes; you can find out more by [contacting us](https://clickhouse.com/company/contact?loc=clickpipes).
5254

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
1+
---
2+
title: 'ClickPipes for MongoDB: Supported data types'
3+
slug: /integrations/clickpipes/mongodb/datatypes
4+
description: 'Page describing MongoDB ClickPipe datatype mapping from MongoDB to ClickHouse'
5+
---
6+
7+
MongoDB stores data records as BSON documents. In ClickPipes, you can choose to ingest BSON documents into ClickHouse as either JSON or JSON String. The following table shows the supported BSON-to-JSON type mapping:
8+
| MongoDB BSON Type        | ClickHouse JSON Type                   | Notes                    |
| ------------------------ | -------------------------------------- | ------------------------ |
| ObjectId                 | String                                 |                          |
| String                   | String                                 |                          |
| 32-bit integer           | Int64                                  |                          |
| 64-bit integer           | Int64                                  |                          |
| Double                   | Float64                                |                          |
| Boolean                  | Bool                                   |                          |
| Date                     | String                                 | ISO 8601 format          |
| Regular Expression       | \{Options: String, Pattern: String\}   | MongoDB regex with fixed fields: Options (regex flags) and Pattern (regex pattern) |
| Timestamp                | \{T: Int64, I: Int64\}                 | MongoDB internal timestamp format with fixed fields: T (timestamp) and I (increment) |
| Decimal128               | String                                 |                          |
| Array                    | Array(Nullable(String))                |                          |
| Binary data              | Array(Nullable(Int64))                 | Array of byte values     |
| JavaScript               | String                                 |                          |
| Null                     | Null                                   |                          |
| Object                   | Dynamic                                | Each nested field is mapped recursively |
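
The mapping table can also be read as a simple lookup. The following is only an illustrative sketch of that table in plain JavaScript; `BSON_TO_CLICKHOUSE` and `clickHouseTypeFor` are hypothetical names introduced here and are not part of ClickPipes or of any MongoDB or ClickHouse API.

```javascript
// Illustrative lookup built from the mapping table in this page.
// Hypothetical names; this is not how ClickPipes is implemented.
const BSON_TO_CLICKHOUSE = new Map([
  ["ObjectId", "String"],
  ["String", "String"],
  ["32-bit integer", "Int64"],
  ["64-bit integer", "Int64"],
  ["Double", "Float64"],
  ["Boolean", "Bool"],
  ["Date", "String"], // serialized in ISO 8601 format
  ["Regular Expression", "{Options: String, Pattern: String}"],
  ["Timestamp", "{T: Int64, I: Int64}"],
  ["Decimal128", "String"],
  ["Array", "Array(Nullable(String))"],
  ["Binary data", "Array(Nullable(Int64))"], // array of byte values
  ["JavaScript", "String"],
  ["Null", "Null"],
  ["Object", "Dynamic"], // nested fields are mapped recursively
]);

// Return the ClickHouse JSON type listed for a BSON type name,
// or throw if the type does not appear in the table.
function clickHouseTypeFor(bsonType) {
  const mapped = BSON_TO_CLICKHOUSE.get(bsonType);
  if (mapped === undefined) {
    throw new Error(`BSON type not listed in the mapping table: ${bsonType}`);
  }
  return mapped;
}

console.log(clickHouseTypeFor("32-bit integer"));
console.log(clickHouseTypeFor("Date"));
```

Throwing on an unlisted type, rather than guessing a fallback, keeps the sketch faithful to the table: types that are not listed are simply not documented as supported here.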
Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
1+
---
2+
sidebar_label: 'Ingesting Data from MongoDB to ClickHouse'
3+
description: 'Describes how to seamlessly connect your MongoDB to ClickHouse Cloud.'
4+
slug: /integrations/clickpipes/mongodb
5+
title: 'Ingesting data from MongoDB to ClickHouse (using CDC)'
6+
---
7+
8+
import BetaBadge from '@theme/badges/BetaBadge';
9+
import cp_service from '@site/static/images/integrations/data-ingestion/clickpipes/cp_service.png';
10+
import cp_step0 from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step0.png';
11+
import mongodb_tile from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/mongodb-tile.png'
12+
import mongodb_connection_details from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/mongodb-connection-details.png'
13+
import ssh_tunnel from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/ssh-tunnel.jpg'
14+
import select_destination_db from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/select-destination-db.png'
15+
import ch_permissions from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/ch-permissions.jpg'
16+
import Image from '@theme/IdealImage';
17+
18+
# Ingesting data from MongoDB to ClickHouse (using CDC)
19+
20+
<BetaBadge/>
21+
22+
:::info
23+
Currently, ingesting data from MongoDB to ClickHouse Cloud via ClickPipes is in Private Preview.
24+
:::
25+
26+
You can use ClickPipes to ingest data from your MongoDB database into ClickHouse Cloud. The source MongoDB database can be hosted on-premises or in the cloud using services like Mongo Atlas.
27+
28+
## Prerequisites {#prerequisites}
29+
30+
To get started, you first need to ensure that your MongoDB database is correctly configured for replication. The configuration steps depend on how you're deploying MongoDB, so please follow the relevant guide below:
31+
32+
1. [Mongo Atlas](./mongodb/source/atlas)
33+
34+
2. [Generic MongoDB](./mongodb/source/generic)
35+
36+
Once your source MongoDB database is set up, you can continue creating your ClickPipe.
37+
38+
## Create your ClickPipe {#create-your-clickpipe}
39+
40+
Make sure you are logged in to your ClickHouse Cloud account. If you don't have an account yet, you can sign up [here](https://cloud.clickhouse.com/).
41+
42+
[//]: # ( TODO update image here)
43+
1. In the ClickHouse Cloud console, navigate to your ClickHouse Cloud Service.
44+
45+
<Image img={cp_service} alt="ClickPipes service" size="lg" border/>
46+
47+
2. Select the `Data Sources` button on the left-side menu and click on "Set up a ClickPipe".
48+
49+
<Image img={cp_step0} alt="Select imports" size="lg" border/>
50+
51+
3. Select the `MongoDB CDC` tile.
52+
53+
<Image img={mongodb_tile} alt="Select MongoDB" size="lg" border/>
54+
55+
### Add your source MongoDB database connection {#add-your-source-mongodb-database-connection}
56+
57+
4. Fill in the connection details for your source MongoDB database which you configured in the prerequisites step.
58+
59+
:::info
60+
Before you start adding your connection details, make sure that you have whitelisted the ClickPipes IP addresses in your firewall rules. On the following page you can find a [list of ClickPipes IP addresses](../index.md#list-of-static-ips).
61+
For more information, refer to the source MongoDB setup guides linked at [the top of this page](#prerequisites).
62+
:::
63+
64+
<Image img={mongodb_connection_details} alt="Fill in connection details" size="lg" border/>
65+
66+
#### (Optional) Set up SSH Tunneling {#optional-set-up-ssh-tunneling}
67+
68+
You can specify SSH tunneling details if your source MongoDB database is not publicly accessible.
69+
70+
1. Enable the "Use SSH Tunnelling" toggle.
71+
2. Fill in the SSH connection details.
72+
73+
<Image img={ssh_tunnel} alt="SSH tunneling" size="lg" border/>
74+
75+
3. To use key-based authentication, click on "Revoke and generate key pair" to generate a new key pair, then copy the generated public key to your SSH server under `~/.ssh/authorized_keys`.
76+
4. Click on "Verify Connection" to verify the connection.
77+
78+
:::note
79+
Make sure to whitelist [ClickPipes IP addresses](../clickpipes#list-of-static-ips) in your firewall rules for the SSH bastion host so that ClickPipes can establish the SSH tunnel.
80+
:::
81+
82+
Once the connection details are filled in, click `Next`.
83+
84+
#### Configure advanced settings {#advanced-settings}
85+
86+
You can configure the advanced settings if needed. A brief description of each setting is provided below:
87+
88+
- **Sync interval**: The interval at which ClickPipes polls the source database for changes. This affects the destination ClickHouse service; for cost-sensitive users, we recommend keeping it at a higher value (over `3600`).
89+
- **Pull batch size**: The number of rows to fetch in a single batch. This is a best-effort setting and may not be respected in all cases.
90+
- **Snapshot number of tables in parallel**: This is the number of tables that will be fetched in parallel during the initial snapshot. This is useful when you have a large number of tables and you want to control the number of tables fetched in parallel.
91+
92+
### Configure the tables {#configure-the-tables}
93+
94+
5. Here you can select the destination database for your ClickPipe. You can either select an existing database or create a new one.
95+
96+
<Image img={select_destination_db} alt="Select destination database" size="lg" border/>
97+
98+
6. You can select the tables you want to replicate from the source MongoDB database. While selecting the tables, you can also choose to rename the tables in the destination ClickHouse database.
99+
100+
### Review permissions and start the ClickPipe {#review-permissions-and-start-the-clickpipe}
101+
102+
7. Select the "Full access" role from the permissions dropdown and click "Complete Setup".
103+
104+
<Image img={ch_permissions} alt="Review permissions" size="lg" border/>
105+
106+
## What's next? {#whats-next}
107+
108+
Once you've set up your ClickPipe to replicate data from MongoDB to ClickHouse Cloud, you can focus on how to query and model your data for optimal performance.
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
1+
---
2+
sidebar_label: 'Mongo Atlas'
3+
description: 'Step-by-step guide on how to set up Mongo Atlas as a source for ClickPipes'
4+
slug: /integrations/clickpipes/mongodb/source/atlas
5+
title: 'Mongo Atlas source setup guide'
6+
---
7+
8+
import mongo_atlas_configuration from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/mongo-atlas-cluster-overview-configuration.png'
9+
import mngo_atlas_additional_settings from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/mongo-atlas-expand-additional-settings.png'
10+
import mongo_atlas_retention_hours from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/mongo-atlas-set-retention-hours.png'
11+
import mongo_atlas_add_user from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/mongo-atlas-add-new-database-user.png'
12+
import mongo_atlas_add_roles from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/mongo-atlas-database-user-privilege.png'
13+
import mongo_atlas_restrict_access from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/mongo-atlas-restrict-access.png'
14+
import Image from '@theme/IdealImage';
15+
16+
# Mongo Atlas source setup guide
17+
18+
## Configure oplog retention {#enable-oplog-retention}
19+
20+
A minimum oplog retention of 24 hours is required for replication. We recommend setting the oplog retention to 72 hours or longer to ensure that the oplog is not truncated before the initial snapshot is completed. To set the oplog retention via the UI:
21+
22+
1. Navigate to your cluster's `Overview` tab in the Atlas console and click on the `Configuration` tab.
23+
<Image img={mongo_atlas_configuration} alt="Navigate to cluster configuration" size="lg" border/>
24+
25+
2. Click `Additional Settings` and scroll down to `More Configuration Options`.
26+
<Image img={mngo_atlas_additional_settings} alt="Expand additional settings" size="lg" border/>
27+
28+
3. Click `More Configuration Options` and set the minimum oplog window to `72 hours` or longer.
29+
<Image img={mongo_atlas_retention_hours} alt="Set oplog retention hours" size="lg" border/>
30+
31+
4. Click `Review Changes` to review, and then `Apply Changes` to deploy the changes.
32+
33+
## Configure a database user {#configure-database-user}
34+
35+
Once you are logged in to your Atlas console, click `Database Access` under the Security tab in the left navigation bar. Click on "Add New Database User".
36+
37+
ClickPipes requires password authentication:
38+
39+
<Image img={mongo_atlas_add_user} alt="Add database user" size="lg" border/>
40+
41+
ClickPipes requires a user with the following roles:
42+
43+
- `readAnyDatabase`
44+
- `clusterMonitor`
45+
46+
You can find them in the `Specific Privileges` section:
47+
48+
<Image img={mongo_atlas_add_roles} alt="Configure user roles" size="lg" border/>
49+
50+
You can further restrict which cluster(s)/instance(s) the ClickPipes user is granted access to:
51+
52+
<Image img={mongo_atlas_restrict_access} alt="Restrict cluster/instance access" size="lg" border/>
53+
54+
## What's next? {#whats-next}
55+
56+
You can now [create your ClickPipe](../index.md) and start ingesting data from your MongoDB instance into ClickHouse Cloud.
57+
Make sure to note down the connection details you used while setting up your MongoDB instance as you will need them during the ClickPipe creation process.
Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
1+
---
2+
sidebar_label: 'Generic MongoDB'
3+
description: 'Set up any MongoDB instance as a source for ClickPipes'
4+
slug: /integrations/clickpipes/mongodb/source/generic
5+
title: 'Generic MongoDB source setup guide'
6+
---
7+
8+
# Generic MongoDB source setup guide
9+
10+
:::info
11+
12+
If you use MongoDB Atlas, please refer to the specific guide [here](./atlas).
13+
14+
:::
15+
16+
## Enable oplog retention {#enable-oplog-retention}
17+
18+
A minimum oplog retention of 24 hours is required for replication. We recommend setting the oplog retention to 72 hours or longer to ensure that the oplog is not truncated before the initial snapshot is completed.
19+
20+
You can check your current oplog retention by running the following command in the MongoDB shell (you must have the `clusterMonitor` role to run this command):
21+
```javascript
db.serverStatus().oplogTruncation.oplogMinRetentionHours
```
25+
26+
To set the oplog retention to 72 hours, run the following command as an admin user:
27+
```javascript
db.adminCommand({
    "replSetResizeOplog" : 1,
    "minRetentionHours": 72
})
```
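
The 24-hour requirement and the 72-hour recommendation above can be expressed as a small check on the value returned by the retention query. This is a minimal sketch in plain JavaScript; `classifyOplogRetention` and its return labels are hypothetical helpers introduced for illustration and are not part of the MongoDB shell or of ClickPipes.

```javascript
// Thresholds taken from the guidance in this guide.
const REQUIRED_HOURS = 24;
const RECOMMENDED_HOURS = 72;

// Classify an oplog minimum retention value (in hours).
// Hypothetical helper: the labels are illustrative only.
function classifyOplogRetention(hours) {
  if (typeof hours !== "number" || Number.isNaN(hours) || hours < 0) {
    throw new TypeError("retention must be a non-negative number of hours");
  }
  if (hours < REQUIRED_HOURS) return "insufficient";
  if (hours < RECOMMENDED_HOURS) return "meets-minimum";
  return "recommended";
}

console.log(classifyOplogRetention(12));
console.log(classifyOplogRetention(48));
console.log(classifyOplogRetention(72));
```

In practice, the input would be the number you read from `oplogMinRetentionHours`; a value below 24 means the retention should be raised before creating the ClickPipe.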
34+
35+
## Configure a database user {#configure-database-user}
36+
37+
Connect to your MongoDB instance as an admin user and execute the following command to create a user for MongoDB CDC ClickPipes:
38+
```javascript
use admin;
db.createUser({
    user: "clickpipes_user",
    pwd: "some_secure_password",
    roles: ["readAnyDatabase", "clusterMonitor"],
})
```
47+
48+
:::note
49+
50+
Make sure to replace `clickpipes_user` and `some_secure_password` with your desired username and password.
51+
52+
:::
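
If you keep the connection details as a standard MongoDB connection string, note that special characters in the username or password must be percent-encoded. The sketch below shows one way to assemble such a string in plain JavaScript. The helper name `buildMongoUri`, the host `mongo.example.internal`, and the sample password are hypothetical, and `authSource=admin` is assumed because the user above was created in the `admin` database. Whether the ClickPipes form expects a full URI or separate fields is not stated in this guide.

```javascript
// Build a standard mongodb:// connection string with percent-encoded credentials.
// Hypothetical helper for illustration; host and password below are placeholders.
function buildMongoUri({ user, password, host, port = 27017, authSource = "admin" }) {
  const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  return `mongodb://${credentials}@${host}:${port}/?authSource=${encodeURIComponent(authSource)}`;
}

const uri = buildMongoUri({
  user: "clickpipes_user",
  password: "p@ss:word/1", // placeholder containing characters that need encoding
  host: "mongo.example.internal", // placeholder host
});

console.log(uri);
```

Encoding the credentials avoids ambiguity when a password contains characters such as `@`, `:`, or `/`, which otherwise act as delimiters in the connection string.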
53+
54+
## What's next? {#whats-next}
55+
56+
You can now [create your ClickPipe](../index.md) and start ingesting data from your MongoDB instance into ClickHouse Cloud.
57+
Make sure to note down the connection details you used while setting up your MongoDB instance as you will need them during the ClickPipe creation process.

sidebars.js

Lines changed: 18 additions & 0 deletions
@@ -801,6 +801,24 @@ const sidebars = {
801801
"integrations/data-ingestion/clickpipes/mysql/datatypes"
802802
],
803803
},
804+
{
805+
type: "category",
806+
label: "ClickPipes for MongoDB",
807+
collapsed: true,
808+
collapsible: true,
809+
items: [
810+
"integrations/data-ingestion/clickpipes/mongodb/index",
811+
"integrations/data-ingestion/clickpipes/mongodb/datatypes",
812+
{
813+
type: "category",
814+
label: "Source",
815+
items: [
816+
"integrations/data-ingestion/clickpipes/mongodb/source/atlas",
817+
"integrations/data-ingestion/clickpipes/mongodb/source/generic",
818+
],
819+
},
820+
],
821+
}
804822
],
805823
},
806824
{
