Update the Airbyte-Couchbase integration tutorial with enhanced tags, improved descriptions, and clarified sync modes. Adjust examples for better accuracy and add notes on best practices for data ingestion and synchronization.
tutorial/markdown/connectors/airbyte/airbyte-couchbase-integration.md
30 additions & 83 deletions
@@ -13,9 +13,9 @@ technology:
   - query
 tags:
   - Airbyte
-  - Data Integration
-  - ETL
   - Connector
+  - Data Ingestion
+  - Best Practices
 sdk_language:
   - python
 length: 35 Mins
@@ -28,7 +28,9 @@ Airbyte is an open-source data integration platform that enables you to move dat
 - **Cross-bucket replication**: Sync data between buckets within the same or different Couchbase clusters
 - **Analytics pipelines**: Extract data from Couchbase to data warehouses or analytics platforms
 - **Data ingestion**: Load data from SaaS applications, databases, or APIs into Couchbase
-- **Change data capture**: Track and replicate document changes in near real-time
+- **Change data capture**: Track and replicate document changes with periodic syncs
+
+> **Note**: Airbyte is designed for batch/periodic data synchronization (typically 5-60 minute intervals), not sub-second real-time change tracking. For true real-time CDC, consider Couchbase's built-in XDCR or Eventing services.

 This tutorial will guide you through setting up Airbyte with Couchbase Capella (cloud-hosted) as both source and destination, covering configuration, sync modes, common patterns, and best practices.

@@ -94,6 +96,8 @@ This tutorial assumes you have:

 The Couchbase source connector allows Airbyte to extract data from your Couchbase buckets. It automatically discovers all collections within a bucket and creates individual streams for each.

+> **What is a stream?** In Airbyte, a stream represents a single data source (in this case, a Couchbase collection) that can be synced to a destination. Each stream has its own schema, sync mode, and cursor configuration. Learn more in [Airbyte's documentation](https://docs.airbyte.com/understanding-airbyte/connections/).
+
 ### Step 1: Prepare Your Couchbase Source

 #### Create a Database User
@@ -176,7 +180,7 @@ Example streams from a `travel-sample` bucket:
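The source connector description in this diff says each collection in a bucket becomes its own stream with its own sync mode and cursor. The following is a minimal Python sketch of that mapping, not the connector's code: the `Stream` class, the `discover_streams` helper, and the `bucket.scope.collection` naming scheme are all hypothetical, chosen only to illustrate the one-stream-per-collection idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    """Illustrative model: one Airbyte stream per Couchbase collection."""
    bucket: str
    scope: str
    collection: str
    sync_mode: str = "full_refresh"            # or "incremental"
    cursor_field: str = "_ab_cdc_updated_at"   # cursor column used by the tutorial's queries

    @property
    def name(self) -> str:
        # Assumed naming scheme; the connector's real stream names may differ.
        return f"{self.bucket}.{self.scope}.{self.collection}"


def discover_streams(bucket: str, collections: dict[str, list[str]]) -> list[Stream]:
    """Expand {scope: [collection, ...]} into one Stream per collection."""
    return [
        Stream(bucket, scope, collection)
        for scope, names in collections.items()
        for collection in names
    ]


streams = discover_streams("travel-sample", {"inventory": ["airline", "airport", "hotel"]})
```

Keeping the sync mode and cursor field on each stream mirrors the point in the added note that these settings are configured per stream rather than per bucket.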
@@ -194,8 +198,8 @@ Syncs all documents from the collection every time.
 ```sql
 SELECT META().id as _id,
        TO_NUMBER(meta().xattrs.$document.last_modified) as _ab_cdc_updated_at,
-       *
-FROM `bucket`.`scope`.`collection`
+       c AS `bucket`
+FROM `bucket`.`scope`.`collection` AS c
 ```

 **When to use**:
@@ -217,8 +221,8 @@ Syncs only new or modified documents since the last sync.
 ```sql
 SELECT META().id as _id,
        TO_NUMBER(meta().xattrs.$document.last_modified) as _ab_cdc_updated_at,
-       *
-FROM `bucket`.`scope`.`collection`
+       c AS `bucket`
+FROM `bucket`.`scope`.`collection` AS c
 WHERE TO_NUMBER(meta().xattrs.$document.last_modified) > {last_cursor_value}
 ORDER BY TO_NUMBER(meta().xattrs.$document.last_modified) ASC
 ```
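The two query hunks above differ only in the `WHERE`/`ORDER BY` clause driven by `{last_cursor_value}`. As a rough sketch of how that placeholder might be filled and advanced between syncs, the Python below builds either query as a string and computes the next cursor from a batch of rows. The helper names `build_query` and `next_cursor` are hypothetical, the connector's actual implementation is not shown in this diff, and no Couchbase SDK call is made here.

```python
from typing import Iterable, Optional

# Cursor expression copied from the tutorial's queries.
CURSOR_EXPR = "TO_NUMBER(meta().xattrs.$document.last_modified)"


def build_query(bucket: str, scope: str, collection: str,
                last_cursor_value: Optional[int] = None) -> str:
    """Return the full-refresh query, or the incremental one when a cursor is given."""
    keyspace = f"`{bucket}`.`{scope}`.`{collection}`"
    query = (
        f"SELECT META().id AS _id, {CURSOR_EXPR} AS _ab_cdc_updated_at, c AS `{bucket}` "
        f"FROM {keyspace} AS c"
    )
    if last_cursor_value is not None:
        # int() guards the interpolated value; a real client would more likely
        # pass it as a query parameter instead of formatting it into the string.
        query += (
            f" WHERE {CURSOR_EXPR} > {int(last_cursor_value)}"
            f" ORDER BY {CURSOR_EXPR} ASC"
        )
    return query


def next_cursor(rows: Iterable[dict], last_cursor_value: Optional[int]) -> Optional[int]:
    """Advance the cursor to the largest _ab_cdc_updated_at seen so far."""
    candidates = [row["_ab_cdc_updated_at"] for row in rows]
    if last_cursor_value is not None:
        candidates.append(last_cursor_value)
    return max(candidates) if candidates else None
```

Calling `build_query` without a cursor corresponds to the full-refresh hunk; passing the value returned by `next_cursor` from the previous run corresponds to the incremental hunk. If a sync returns no rows, the cursor stays where it was.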
@@ -257,7 +261,7 @@ The Couchbase destination connector allows Airbyte to load data into your Couchb
 - **Permissions**: Assign "Data Reader", "Data Writer", and "Query Manager" roles
 4. Save the credentials

-**Note**: Query Manager role is required for automatic collection and index creation.
+**Note**: Query Manager role is required for automatic collection and index creation. These Database Access credentials are used for cluster connections via the SDK, distinct from Capella API credentials which would be used for Capella management operations.

 #### Ensure Network Access

@@ -425,12 +429,7 @@ For each enabled stream, select the appropriate sync mode combination:
 | Full Refresh | Overwrite | Complete replacement each sync | Mirror source exactly |