Commit cb225fc

Merge pull request #4165 from ClickHouse/type-mapping
mongo: add documentation for private preview
2 parents 4cb9236 + 024c2ef commit cb225fc

19 files changed (+425, -0 lines changed)

docs/integrations/data-ingestion/clickpipes/index.md

Lines changed: 2 additions & 0 deletions
@@ -17,6 +17,7 @@ import DOsvg from '@site/static/images/integrations/logos/digitalocean.svg';
import ABSsvg from '@site/static/images/integrations/logos/azureblobstorage.svg';
import Postgressvg from '@site/static/images/integrations/logos/postgresql.svg';
import Mysqlsvg from '@site/static/images/integrations/logos/mysql.svg';
+import Mongodbsvg from '@site/static/images/integrations/logos/mongodb.svg';
import redpanda_logo from '@site/static/images/integrations/logos/logo_redpanda.png';
import clickpipes_stack from '@site/static/images/integrations/data-ingestion/clickpipes/clickpipes_stack.png';
import cp_custom_role from '@site/static/images/integrations/data-ingestion/clickpipes/cp_custom_role.png';
@@ -47,6 +48,7 @@ import Image from '@theme/IdealImage';
| [Amazon Kinesis](/integrations/clickpipes/kinesis) | <Amazonkinesis class="image" alt="Amazon Kinesis logo" style={{width: '3rem', height: 'auto'}}/> |Streaming| Stable | Configure ClickPipes and start ingesting streaming data from Amazon Kinesis into ClickHouse Cloud. |
| [Postgres](/integrations/clickpipes/postgres) | <Postgressvg class="image" alt="Postgres logo" style={{width: '3rem', height: 'auto'}}/> |DBMS| Stable | Configure ClickPipes and start ingesting data from Postgres into ClickHouse Cloud. |
| [MySQL](/integrations/clickpipes/mysql) | <Mysqlsvg class="image" alt="MySQL logo" style={{width: '3rem', height: 'auto'}}/> |DBMS| Private Beta | Configure ClickPipes and start ingesting data from MySQL into ClickHouse Cloud. |
+| [MongoDB](/integrations/clickpipes/mongodb) | <Mongodbsvg class="image" alt="MongoDB logo" style={{width: '3rem', height: 'auto'}}/> |DBMS| Private Preview | Configure ClickPipes and start ingesting data from MongoDB into ClickHouse Cloud. |

More connectors will be added to ClickPipes. You can find out more by [contacting us](https://clickhouse.com/company/contact?loc=clickpipes).

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
---
title: 'Adding specific tables to a ClickPipe'
description: 'Describes the steps needed to add specific tables to a ClickPipe.'
sidebar_label: 'Add Table'
slug: /integrations/clickpipes/mongodb/add_table
show_title: false
---

import Image from '@theme/IdealImage';
import add_table from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/add_table.png'

# Adding specific tables to a ClickPipe

There are scenarios where it is useful to add specific tables to a pipe. This becomes a common necessity as your transactional or analytical workload scales.

## Steps to add specific tables to a ClickPipe {#add-tables-steps}

To add tables to an existing pipe, follow these steps:
1. [Pause](./pause_and_resume.md) the pipe.
2. Click on **Edit Table settings**.
3. Locate your table by searching for it in the search bar.
4. Select the table by clicking its checkbox.
<br/>
<Image img={add_table} border size="md"/>

5. Click **Update**.
6. Upon a successful update, the pipe will show the statuses `Setup`, `Snapshot` and `Running`, in that order. The table's initial load can be tracked in the **Tables** tab.

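If you script around this procedure and record the statuses you observe while polling, a small check can confirm that they progressed in the order described in step 6. The sketch below is a hypothetical helper: only the `Setup`, `Snapshot`, `Running` sequence comes from this page, and how you obtain the statuses is left to you.

```python
from typing import Iterable, List, Sequence

# Order documented in step 6 above.
EXPECTED_AFTER_ADD_TABLE = ("Setup", "Snapshot", "Running")


def followed_expected_order(
    observed: Iterable[str],
    expected: Sequence[str] = EXPECTED_AFTER_ADD_TABLE,
) -> bool:
    """Check that polled statuses moved forward through `expected` and ended on its last entry."""
    # Collapse consecutive duplicates produced by repeated polling.
    collapsed: List[str] = []
    for status in observed:
        if not collapsed or collapsed[-1] != status:
            collapsed.append(status)

    position = -1
    for status in collapsed:
        if status not in expected:
            return False  # an unexpected status, e.g. an error state
        index = expected.index(status)
        if index <= position:
            return False  # the pipe moved backwards
        # A skipped status is tolerated: a poll can miss a short-lived state.
        position = index
    return position == len(expected) - 1
```

Tolerating a skipped status is a design choice for coarse polling intervals; drop it if you need to see every state.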
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
---
title: 'ClickPipes for MongoDB: Supported data types'
slug: /integrations/clickpipes/mongodb/datatypes
description: 'Page describing MongoDB ClickPipe datatype mapping from MongoDB to ClickHouse'
---

MongoDB stores data records as BSON documents. In ClickPipes, you can configure BSON documents to be ingested into ClickHouse as either JSON or JSON String. The following table shows the supported BSON to JSON type mapping:

| MongoDB BSON Type        | ClickHouse JSON Type                   | Notes                    |
| ------------------------ | -------------------------------------- | ------------------------ |
| ObjectId                 | String                                 |                          |
| String                   | String                                 |                          |
| 32-bit integer           | Int64                                  |                          |
| 64-bit integer           | Int64                                  |                          |
| Double                   | Float64                                |                          |
| Boolean                  | Bool                                   |                          |
| Date                     | String                                 | ISO 8601 format          |
| Regular Expression       | \{Options: String, Pattern: String\}   | MongoDB regex with fixed fields: Options (regex flags) and Pattern (regex pattern) |
| Timestamp                | \{T: Int64, I: Int64\}                 | MongoDB internal timestamp format with fixed fields: T (timestamp) and I (increment) |
| Decimal128               | String                                 |                          |
| Array                    | Array(Nullable(String))                |                          |
| Binary data              | Array(Nullable(Int64))                 | Array of byte values     |
| JavaScript               | String                                 |                          |
| Null                     | Null                                   |                          |
| Object                   | Dynamic                                | Each nested field is mapped recursively |

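To make the table more concrete, here is a minimal, standard-library-only Python sketch that renders a few decoded BSON values into the JSON shapes listed above. It is an illustration, not how ClickPipes is implemented: the `Timestamp` and `Regex` dataclasses are hypothetical stand-ins (not the `bson` package's classes), `Decimal` and `datetime` stand in for Decimal128 and Date, Python's `isoformat()` may not match the exact ISO 8601 string ClickPipes emits, and serializing non-string array elements as JSON text is an assumption.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
from typing import Any


@dataclass
class Timestamp:
    """Hypothetical stand-in for a BSON Timestamp: seconds plus an increment."""
    time: int
    inc: int


@dataclass
class Regex:
    """Hypothetical stand-in for a BSON regular expression."""
    pattern: str
    flags: str


def to_clickhouse_json(value: Any) -> Any:
    """Render a decoded BSON value in the JSON shape listed in the table."""
    if value is None:
        return None                                 # Null -> Null
    if isinstance(value, bool):                     # test bool before int
        return value                                # Boolean -> Bool
    if isinstance(value, (int, float)):
        return value                                # integers -> Int64, Double -> Float64
    if isinstance(value, Decimal):
        return str(value)                           # Decimal128 -> String
    if isinstance(value, datetime):
        return value.isoformat()                    # Date -> ISO 8601 String
    if isinstance(value, Timestamp):
        return {"T": value.time, "I": value.inc}    # Timestamp -> {T, I}
    if isinstance(value, Regex):
        return {"Options": value.flags, "Pattern": value.pattern}
    if isinstance(value, (bytes, bytearray)):
        return list(value)                          # Binary -> array of byte values
    if isinstance(value, list):
        # Array -> Array(Nullable(String)); non-string elements are assumed
        # to be serialized as JSON text.
        items = []
        for v in value:
            converted = to_clickhouse_json(v)
            if converted is None or isinstance(converted, str):
                items.append(converted)
            else:
                items.append(json.dumps(converted))
        return items
    if isinstance(value, dict):
        # Object -> each nested field is mapped recursively.
        return {k: to_clickhouse_json(v) for k, v in value.items()}
    return str(value)                               # ObjectId, String, JavaScript -> String


# Example document with a mix of types.
example = to_clickhouse_json({
    "qty": 3,
    "created": datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
    "ts": Timestamp(time=1704164645, inc=1),
    "raw": b"\x01\xff",
    "tags": ["a", 1, None],
})
```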
Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
---
sidebar_label: 'Ingesting Data from MongoDB to ClickHouse'
description: 'Describes how to seamlessly connect your MongoDB database to ClickHouse Cloud.'
slug: /integrations/clickpipes/mongodb
title: 'Ingesting data from MongoDB to ClickHouse (using CDC)'
---

import BetaBadge from '@theme/badges/BetaBadge';
import cp_service from '@site/static/images/integrations/data-ingestion/clickpipes/cp_service.png';
import cp_step0 from '@site/static/images/integrations/data-ingestion/clickpipes/cp_step0.png';
import mongodb_tile from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/mongodb-tile.png'
import mongodb_connection_details from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/mongodb-connection-details.png'
import ssh_tunnel from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/ssh-tunnel.jpg'
import select_destination_db from '@site/static/images/integrations/data-ingestion/clickpipes/mongodb/select-destination-db.png'
import ch_permissions from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/ch-permissions.jpg'
import Image from '@theme/IdealImage';

# Ingesting data from MongoDB to ClickHouse (using CDC)

<BetaBadge/>

:::info
Currently, ingesting data from MongoDB to ClickHouse Cloud via ClickPipes is in Private Preview.
:::

:::note
In the ClickHouse Cloud console and documentation, "table" and "collection" are used interchangeably for MongoDB.
:::

You can use ClickPipes to ingest data from your MongoDB database into ClickHouse Cloud. The source MongoDB database can be hosted on-premises or in the cloud using services like MongoDB Atlas.

## Prerequisites {#prerequisites}

To get started, you first need to ensure that your MongoDB database is correctly configured for replication. The configuration steps depend on how you're deploying MongoDB, so please follow the relevant guide below:

1. [MongoDB Atlas](./mongodb/source/atlas)

2. [Generic MongoDB](./mongodb/source/generic)

Once your source MongoDB database is set up, you can continue creating your ClickPipe.

## Create your ClickPipe {#create-your-clickpipe}

Make sure you are logged in to your ClickHouse Cloud account. If you don't have an account yet, you can sign up [here](https://cloud.clickhouse.com/).

1. In the ClickHouse Cloud console, navigate to your ClickHouse Cloud Service.

<Image img={cp_service} alt="ClickPipes service" size="lg" border/>

2. Select the `Data Sources` button on the left-side menu and click on "Set up a ClickPipe".

<Image img={cp_step0} alt="Select imports" size="lg" border/>

3. Select the `MongoDB CDC` tile.

<Image img={mongodb_tile} alt="Select MongoDB" size="lg" border/>

### Add your source MongoDB database connection {#add-your-source-mongodb-database-connection}

4. Fill in the connection details for the source MongoDB database that you configured in the prerequisites step.

:::info
Before you start adding your connection details, make sure that you have whitelisted the ClickPipes IP addresses in your firewall rules. You can find the full [list of ClickPipes IP addresses](../index.md#list-of-static-ips) in the ClickPipes documentation.
For more information, refer to the source MongoDB setup guides linked at [the top of this page](#prerequisites).
:::

<Image img={mongodb_connection_details} alt="Fill in connection details" size="lg" border/>

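If you manage firewall rules as code, a quick sanity check is to confirm that every ClickPipes address falls inside a range you have allowed. The sketch below uses Python's standard `ipaddress` module; `missing_from_allowlist` is a hypothetical helper, and the addresses and CIDR range are placeholders from the RFC 5737 documentation blocks, not real ClickPipes IPs, which you should take from the linked list.

```python
import ipaddress
from typing import Iterable, List


def missing_from_allowlist(ips: Iterable[str], allowed_cidrs: Iterable[str]) -> List[str]:
    """Return the IPs that are not covered by any allowed CIDR range."""
    networks = [ipaddress.ip_network(c, strict=False) for c in allowed_cidrs]
    missing = []
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        # Membership is simply False when IP versions differ (IPv4 vs IPv6).
        if not any(addr in net for net in networks):
            missing.append(ip)
    return missing


# Placeholder values (RFC 5737 documentation ranges), NOT real ClickPipes IPs.
CLICKPIPES_IPS = ["192.0.2.10", "198.51.100.7"]
FIREWALL_ALLOWLIST = ["192.0.2.0/24"]

print(missing_from_allowlist(CLICKPIPES_IPS, FIREWALL_ALLOWLIST))
```

Any address the function returns still needs a firewall rule before ClickPipes can connect.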
#### (Optional) Set up SSH Tunneling {#optional-set-up-ssh-tunneling}

You can specify SSH tunneling details if your source MongoDB database is not publicly accessible.

1. Enable the "Use SSH Tunnelling" toggle.
2. Fill in the SSH connection details.

<Image img={ssh_tunnel} alt="SSH tunneling" size="lg" border/>

3. To use key-based authentication, click on "Revoke and generate key pair" to generate a new key pair, then copy the generated public key to your SSH server under `~/.ssh/authorized_keys`.
4. Click on "Verify Connection" to verify the connection.

:::note
Make sure to whitelist [ClickPipes IP addresses](../clickpipes#list-of-static-ips) in your firewall rules for the SSH bastion host so that ClickPipes can establish the SSH tunnel.
:::

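For step 3, one way to install the generated public key on the bastion host is the Python sketch below, run as the SSH user that ClickPipes connects with. It is a hypothetical helper, not part of ClickPipes, and it assumes a POSIX host with OpenSSH's usual permission expectations (`0700` for `~/.ssh`, `0600` for `authorized_keys`). The key passed in would be the value copied from the ClickPipes UI.

```python
import os
from pathlib import Path
from typing import Optional


def add_authorized_key(public_key: str, home: Optional[Path] = None) -> bool:
    """Append a public key to <home>/.ssh/authorized_keys unless it is already there.

    Returns True if the key was added and False if it was already present.
    """
    key = public_key.strip()
    ssh_dir = (home or Path.home()) / ".ssh"
    ssh_dir.mkdir(mode=0o700, exist_ok=True)
    os.chmod(ssh_dir, 0o700)  # mkdir's mode is filtered by umask, so set it explicitly

    auth_file = ssh_dir / "authorized_keys"
    text = auth_file.read_text(encoding="utf-8") if auth_file.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False  # idempotent: never duplicate an entry

    # Keep one key per line even if the file lacks a trailing newline.
    prefix = "" if not text or text.endswith("\n") else "\n"
    with open(auth_file, "a", encoding="utf-8") as f:
        f.write(prefix + key + "\n")
    os.chmod(auth_file, 0o600)  # sshd's StrictModes check rejects loosely permissioned files
    return True
```

Because the helper is idempotent, rerunning it after "Revoke and generate key pair" only appends the new key; remove the revoked key's line yourself.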
Once the connection details are filled in, click `Next`.

#### Configure advanced settings {#advanced-settings}

You can configure the advanced settings if needed. A brief description of each setting is provided below:

- **Sync interval**: The interval at which ClickPipes polls the source database for changes. This setting affects the destination ClickHouse service: for cost-sensitive users, we recommend keeping it at a higher value (over `3600`).
- **Pull batch size**: The number of rows to fetch in a single batch. This is a best-effort setting and may not be respected in all cases.
- **Snapshot number of tables in parallel**: The number of tables fetched in parallel during the initial snapshot. This is useful when you have a large number of tables and want to control how many are fetched at once.

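To build intuition for these three settings, the Python sketch below models each one in isolation. It is a conceptual illustration, not ClickPipes' implementation: it assumes the sync interval is expressed in seconds, and `snapshot_table` is a hypothetical callable standing in for copying one collection.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

SECONDS_PER_DAY = 24 * 60 * 60


def syncs_per_day(sync_interval_s: int) -> float:
    """Sync cycles per day implied by an interval (assumed to be in seconds)."""
    return SECONDS_PER_DAY / sync_interval_s


def batches(docs: Iterable[T], pull_batch_size: int) -> Iterator[List[T]]:
    """Split a stream of changes into batches of at most pull_batch_size items."""
    if pull_batch_size <= 0:
        raise ValueError("pull_batch_size must be positive")
    it = iter(docs)
    while batch := list(islice(it, pull_batch_size)):
        yield batch


def snapshot_all(
    tables: List[str], snapshot_table: Callable[[str], R], parallelism: int
) -> List[R]:
    """Run the initial snapshot with at most `parallelism` tables in flight."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(snapshot_table, tables))
```

For example, an interval of `3600` gives `syncs_per_day(3600)`, or 24 cycles per day, compared with 1440 at `60`, which is why a higher value keeps the destination service busy less often.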

### Configure the tables {#configure-the-tables}

5. Select the destination database for your ClickPipe. You can either select an existing database or create a new one.

<Image img={select_destination_db} alt="Select destination database" size="lg" border/>

6. Select the tables you want to replicate from the source MongoDB database. While selecting the tables, you can also choose to rename them in the destination ClickHouse database.

### Review permissions and start the ClickPipe {#review-permissions-and-start-the-clickpipe}

7. Select the "Full access" role from the permissions dropdown and click "Complete Setup".

<Image img={ch_permissions} alt="Review permissions" size="lg" border/>

## What's next? {#whats-next}

Once you've set up your ClickPipe to replicate data from MongoDB to ClickHouse Cloud, you can focus on how to query and model your data for optimal performance.

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
---
title: 'Pausing and Resuming a MongoDB ClickPipe'
description: 'Pausing and Resuming a MongoDB ClickPipe'
sidebar_label: 'Pause Table'
slug: /integrations/clickpipes/mongodb/pause_and_resume
---

import Image from '@theme/IdealImage';
import pause_button from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/pause_button.png'
import pause_dialog from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/pause_dialog.png'
import pause_status from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/pause_status.png'
import resume_button from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/resume_button.png'
import resume_dialog from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/resume_dialog.png'

There are scenarios where it is useful to pause a MongoDB ClickPipe. For example, you may want to run analytics on existing data while it is in a static state, or you might be performing upgrades on MongoDB. Here is how you can pause and resume a MongoDB ClickPipe.

## Steps to pause a MongoDB ClickPipe {#pause-clickpipe-steps}

1. In the Data Sources tab, click on the MongoDB ClickPipe you wish to pause.
2. Head over to the **Settings** tab.
3. Click on the **Pause** button.

<Image img={pause_button} border size="md"/>

4. A dialog box should appear for confirmation. Click on Pause again.

<Image img={pause_dialog} border size="md"/>

5. Head over to the **Metrics** tab.
6. Wait for the status of the pipe to be **Paused**.

<Image img={pause_status} border size="md"/>

## Steps to resume a MongoDB ClickPipe {#resume-clickpipe-steps}

1. In the Data Sources tab, click on the MongoDB ClickPipe you wish to resume. The status of the pipe should initially be **Paused**.
2. Head over to the **Settings** tab.
3. Click on the **Resume** button.

<Image img={resume_button} border size="md"/>

4. A dialog box should appear for confirmation. Click on Resume again.

<Image img={resume_dialog} border size="md"/>

5. Head over to the **Metrics** tab.
6. Wait for the status of the pipe to be **Running**.

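Both procedures end by waiting for a target status (**Paused** or **Running**). If you automate that wait, a generic polling loop such as the sketch below works with any status source. `get_status` is a hypothetical callable you would supply yourself; it is not a ClickPipes function, and the timeout and poll interval defaults are arbitrary.

```python
import time
from typing import Callable


def wait_for_status(
    get_status: Callable[[], str],
    target: str,
    timeout_s: float = 600.0,
    poll_interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it returns `target`; raise TimeoutError otherwise."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == target:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"last status {status!r}, expected {target!r}")
        sleep(poll_interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; in normal use the defaults apply.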
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
---
title: 'Removing specific tables from a ClickPipe'
description: 'Removing specific tables from a ClickPipe'
sidebar_label: 'Remove Table'
slug: /integrations/clickpipes/mongodb/removing_tables
---

import Image from '@theme/IdealImage';
import remove_table from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/remove_table.png'

In some cases, it makes sense to exclude specific tables from a MongoDB ClickPipe. For example, if a table isn't needed for your analytics workload, skipping it can reduce storage and replication costs in ClickHouse.

## Steps to remove specific tables {#remove-tables-steps}

To remove a table from the pipe, follow these steps:

1. [Pause](./pause_and_resume.md) the pipe.
2. Click on **Edit Table Settings**.
3. Locate your table by searching for it in the search bar.
4. Deselect the table by clicking its selected checkbox.
<br/>

<Image img={remove_table} border size="md"/>

5. Click **Update**.
6. After a successful update, the status in the **Metrics** tab will be **Running**. This table will no longer be replicated by this ClickPipe.
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
---
title: 'Resyncing a Database ClickPipe'
description: 'Doc for resyncing a database ClickPipe'
slug: /integrations/clickpipes/mongodb/resync
sidebar_label: 'Resync ClickPipe'
---

import resync_button from '@site/static/images/integrations/data-ingestion/clickpipes/postgres/resync_button.png'
import Image from '@theme/IdealImage';

### What does Resync do? {#what-mongodb-resync-do}

Resync involves the following operations, in order:

1. The existing ClickPipe is dropped, and a new "resync" ClickPipe is kicked off. As a result, changes to source table structures will be picked up when you resync.
2. The resync ClickPipe creates (or replaces) a new set of destination tables with the same names as the original tables, plus a `_resync` suffix.
3. Initial load is performed on the `_resync` tables.
4. The `_resync` tables are then swapped with the original tables. Soft-deleted rows are transferred from the original tables to the `_resync` tables before the swap.

All the settings of the original ClickPipe are retained in the resync ClickPipe. The statistics of the original ClickPipe are cleared in the UI.

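The naming and swap in steps 2 and 4 can be sketched as below. This is an illustration under assumptions: the helpers are hypothetical, the swap is expressed with ClickHouse's `EXCHANGE TABLES` statement (supported for databases using the Atomic engine), which may not be how ClickPipes performs it internally, and the soft-delete transfer from step 4 is omitted.

```python
from typing import List

RESYNC_SUFFIX = "_resync"


def quote_ident(name: str) -> str:
    """Backtick-quote a ClickHouse identifier (names containing backticks are rejected)."""
    if "`" in name:
        raise ValueError(f"unsupported identifier: {name!r}")
    return f"`{name}`"


def resync_table_name(table: str) -> str:
    """Name of the staging table that the resync loads into."""
    return table + RESYNC_SUFFIX


def swap_statements(database: str, tables: List[str]) -> List[str]:
    """One EXCHANGE TABLES statement per table, swapping in its _resync copy."""
    db = quote_ident(database)
    return [
        f"EXCHANGE TABLES {db}.{quote_ident(resync_table_name(t))} AND {db}.{quote_ident(t)}"
        for t in tables
    ]


for stmt in swap_statements("mongo_db", ["orders", "customers"]):
    print(stmt)
```

After such an exchange, the original name holds the freshly loaded data and the `_resync` name holds the previous contents, so an atomic swap avoids a window where the table is missing.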

### Use cases for resyncing a ClickPipe {#use-cases-mongodb-resync}

Here are a few scenarios:

1. You may need to perform major schema changes on the source tables that would break the existing ClickPipe and require a restart. You can simply click Resync after performing the changes.
2. Specifically for ClickHouse, you may need to change the ORDER BY keys on the target tables. You can resync to re-populate the data into new tables with the right sorting key.

:::note
You can resync multiple times; however, please account for the load on the source database each time you resync, since every resync involves an initial load with parallel threads.
:::

### Resync ClickPipe Guide {#guide-mongodb-resync}

1. In the Data Sources tab, click on the MongoDB ClickPipe you wish to resync.
2. Head over to the **Settings** tab.
3. Click on the **Resync** button.

<Image img={resync_button} border size="md"/>

4. A dialog box should appear for confirmation. Click on Resync again.
5. Head over to the **Metrics** tab.
6. Wait for the status of the pipe to be **Setup** or **Snapshot**.
7. The initial load of the resync can be monitored in the **Tables** tab, in the **Initial Load Stats** section.
