Commit 3d12324

Add dynamoDB streams support for cdc
1 parent b18847b commit 3d12324

10 files changed

+2567
-1
lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -10,4 +10,5 @@ release_notes.md
 .task
 .vscode
 .op
+.gomodcache
 __pycache__
Lines changed: 329 additions & 0 deletions
@@ -0,0 +1,329 @@
1+
= aws_dynamodb_cdc
2+
:type: input
3+
:status: beta
4+
:categories: ["Services","AWS"]
5+
6+
7+
8+
////
9+
THIS FILE IS AUTOGENERATED!
10+
11+
To make changes, edit the corresponding source file under:
12+
13+
https://github.com/redpanda-data/connect/tree/main/internal/impl/<provider>.
14+
15+
And:
16+
17+
https://github.com/redpanda-data/connect/tree/main/cmd/tools/docs_gen/templates/plugin.adoc.tmpl
18+
////
19+
20+
// © 2024 Redpanda Data Inc.
21+
22+
23+
component_type_dropdown::[]
24+
25+
26+
Reads change data capture (CDC) events from DynamoDB Streams.
27+
28+
Introduced in version 1.0.0.
29+
30+
31+
[tabs]
32+
======
33+
Common::
34+
+
35+
--
36+
37+
```yml
# Common config fields, showing default values
input:
  label: ""
  aws_dynamodb_cdc:
    table: "" # No default (required)
    checkpoint_table: redpanda_dynamodb_checkpoints
    start_from: trim_horizon
```
46+
47+
--
48+
Advanced::
49+
+
50+
--
51+
52+
```yml
# All config fields, showing default values
input:
  label: ""
  aws_dynamodb_cdc:
    table: "" # No default (required)
    checkpoint_table: redpanda_dynamodb_checkpoints
    batch_size: 1000
    poll_interval: 1s
    start_from: trim_horizon
    checkpoint_limit: 1000
    max_tracked_shards: 10000
    region: "" # No default (optional)
    endpoint: "" # No default (optional)
    tcp:
      connect_timeout: 0s
      keep_alive:
        idle: 15s
        interval: 15s
        count: 9
      tcp_user_timeout: 0s
    credentials:
      profile: "" # No default (optional)
      id: "" # No default (optional)
      secret: "" # No default (optional)
      token: "" # No default (optional)
      from_ec2_role: false # No default (optional)
      role: "" # No default (optional)
      role_external_id: "" # No default (optional)
```
82+
83+
--
84+
======
85+
86+
Consumes records from DynamoDB Streams with automatic checkpointing and shard management.
87+
88+
DynamoDB Streams capture item-level changes in DynamoDB tables. This input supports:
89+
- Automatic shard discovery and management
90+
- Checkpoint-based resumption after crashes
91+
- Multiple shard processing
92+
93+
For better performance and retention beyond the 24 hours that DynamoDB Streams offers, consider using Kinesis Data Streams for DynamoDB
94+
with the `aws_kinesis` input instead.
95+
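As a starting point, the following is a minimal sketch of a complete pipeline that reads from this input and prints each change event. The table name `orders` is a hypothetical placeholder, and the sketch assumes that DynamoDB Streams is already enabled on that table.

```yml
# Minimal sketch: stream changes from a hypothetical table and print them.
input:
  aws_dynamodb_cdc:
    table: orders # hypothetical table name (assumption)
    checkpoint_table: redpanda_dynamodb_checkpoints # the default value
    start_from: trim_horizon # begin at the oldest available record

output:
  stdout: {}
```

Because progress is stored in `checkpoint_table`, restarting this pipeline should resume from the last checkpoint rather than from `start_from`.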
96+
== Metadata
97+
98+
This input adds the following metadata fields to each message:
99+
100+
- `dynamodb_shard_id` - The shard ID from which the record was read
101+
- `dynamodb_sequence_number` - The sequence number of the record in the stream
102+
- `dynamodb_event_name` - The type of change: INSERT, MODIFY, or REMOVE
103+
- `dynamodb_table` - The name of the DynamoDB table
104+
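The metadata fields above can be read in a Bloblang mapping with the `@` prefix. The sketch below, which is only one possible use, drops `REMOVE` events and passes `INSERT` and `MODIFY` events through unchanged.

```yml
pipeline:
  processors:
    - mapping: |
        # Delete the message when the change is a REMOVE.
        # For any other event name the assignment is skipped
        # and the message is left as it was.
        root = if @dynamodb_event_name == "REMOVE" { deleted() }
```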
105+
== Metrics
106+
107+
This input exposes the following metrics:
108+
109+
- `dynamodb_cdc_shards_tracked` - Total number of shards being tracked (gauge)
110+
- `dynamodb_cdc_shards_active` - Number of active shards currently being read from (gauge)
111+
112+
113+
== Fields
114+
115+
=== `table`
116+
117+
The name of the DynamoDB table to read streams from.
118+
119+
120+
*Type*: `string`
121+
122+
123+
=== `checkpoint_table`
124+
125+
The name of the DynamoDB table used to store checkpoints. The table is created if it doesn't exist.
126+
127+
128+
*Type*: `string`
129+
130+
*Default*: `"redpanda_dynamodb_checkpoints"`
131+
132+
=== `batch_size`
133+
134+
Maximum number of records to read in a single batch.
135+
136+
137+
*Type*: `int`
138+
139+
*Default*: `1000`
140+
141+
=== `poll_interval`
142+
143+
Time to wait between polling attempts when no records are available.
144+
145+
146+
*Type*: `string`
147+
148+
*Default*: `"1s"`
149+
150+
=== `start_from`
151+
152+
Where to start reading when no checkpoint exists. `trim_horizon` starts from the oldest available record; `latest` skips existing records and reads only new ones.
153+
154+
155+
*Type*: `string`
156+
157+
*Default*: `"trim_horizon"`
158+
159+
Options: `trim_horizon`, `latest`.
163+
164+
=== `checkpoint_limit`
165+
166+
Maximum number of messages to process before updating the checkpoint.
167+
168+
169+
*Type*: `int`
170+
171+
*Default*: `1000`
172+
173+
=== `max_tracked_shards`
174+
175+
Maximum number of shards to track simultaneously. Prevents memory issues with extremely large tables.
176+
177+
178+
*Type*: `int`
179+
180+
*Default*: `10000`
181+
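As a rough illustration of how the polling and checkpoint fields fit together, the sketch below favors lower latency and more frequent checkpoint updates over throughput. The table name and every value shown are illustrative assumptions, not recommendations.

```yml
input:
  aws_dynamodb_cdc:
    table: orders # hypothetical table name (assumption)
    batch_size: 100 # read at most 100 records per batch
    poll_interval: 250ms # wait 250ms between polls when no records are available
    checkpoint_limit: 100 # update the checkpoint after at most 100 messages
    max_tracked_shards: 10000 # the default value
```

A lower `checkpoint_limit` presumably means fewer records are replayed after a crash, at the cost of more writes to the checkpoint table.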
182+
=== `region`
183+
184+
The AWS region to target.
185+
186+
187+
*Type*: `string`
188+
189+
190+
=== `endpoint`
191+
192+
Allows you to specify a custom endpoint for the AWS API.
193+
194+
195+
*Type*: `string`
196+
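One possible use of `endpoint` is local testing. The sketch below assumes a DynamoDB-compatible emulator listening on `http://localhost:8000` that also serves the Streams API; the table name, region, and placeholder credentials are assumptions for illustration only.

```yml
input:
  aws_dynamodb_cdc:
    table: orders # hypothetical table name (assumption)
    region: us-east-1 # any region; local emulators typically ignore it
    endpoint: http://localhost:8000 # assumed local emulator address
    credentials:
      id: dummy # placeholder values for local use only
      secret: dummy
```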
197+
198+
=== `tcp`
199+
200+
TCP socket configuration.
201+
202+
203+
*Type*: `object`
204+
205+
206+
=== `tcp.connect_timeout`
207+
208+
Maximum amount of time a dial will wait for a connect to complete. Zero disables.
209+
210+
211+
*Type*: `string`
212+
213+
*Default*: `"0s"`
214+
215+
=== `tcp.keep_alive`
216+
217+
TCP keep-alive probe configuration.
218+
219+
220+
*Type*: `object`
221+
222+
223+
=== `tcp.keep_alive.idle`
224+
225+
Duration the connection must be idle before sending the first keep-alive probe. Zero defaults to 15s. Negative values disable keep-alive probes.
226+
227+
228+
*Type*: `string`
229+
230+
*Default*: `"15s"`
231+
232+
=== `tcp.keep_alive.interval`
233+
234+
Duration between keep-alive probes. Zero defaults to 15s.
235+
236+
237+
*Type*: `string`
238+
239+
*Default*: `"15s"`
240+
241+
=== `tcp.keep_alive.count`
242+
243+
Maximum unanswered keep-alive probes before dropping the connection. Zero defaults to 9.
244+
245+
246+
*Type*: `int`
247+
248+
*Default*: `9`
249+
250+
=== `tcp.tcp_user_timeout`
251+
252+
Maximum time to wait for acknowledgment of transmitted data before killing the connection. Linux-only (kernel 2.6.37+), ignored on other platforms. When enabled, keep_alive.idle must be greater than this value per RFC 5482. Zero disables.
253+
254+
255+
*Type*: `string`
256+
257+
*Default*: `"0s"`
258+
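The constraint between `tcp_user_timeout` and `keep_alive.idle` can be sketched as follows. The table name and durations are illustrative assumptions; the only point is that `keep_alive.idle` is greater than `tcp_user_timeout`.

```yml
input:
  aws_dynamodb_cdc:
    table: orders # hypothetical table name (assumption)
    tcp:
      connect_timeout: 5s # give up dialing after 5 seconds
      keep_alive:
        idle: 15s # must be greater than tcp_user_timeout
        interval: 15s
        count: 9
      tcp_user_timeout: 10s # Linux-only; ignored on other platforms
```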
259+
=== `credentials`
260+
261+
Optional manual configuration of AWS credentials to use. More information can be found in xref:guides:cloud/aws.adoc[].
262+
263+
264+
*Type*: `object`
265+
266+
267+
=== `credentials.profile`
268+
269+
A profile from `~/.aws/credentials` to use.
270+
271+
272+
*Type*: `string`
273+
274+
275+
=== `credentials.id`
276+
277+
The ID of credentials to use.
278+
279+
280+
*Type*: `string`
281+
282+
283+
=== `credentials.secret`
284+
285+
The secret for the credentials being used.
286+
[CAUTION]
287+
====
288+
This field contains sensitive information that usually shouldn't be added to a config directly. Read our xref:configuration:secrets.adoc[secrets page] for more info.
289+
====
290+
291+
292+
293+
*Type*: `string`
294+
295+
296+
=== `credentials.token`
297+
298+
The token for the credentials being used, required when using short-term credentials.
299+
300+
301+
*Type*: `string`
302+
303+
304+
=== `credentials.from_ec2_role`
305+
306+
Use the credentials of a host EC2 machine configured to assume https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html[an IAM role associated with the instance^].
307+
308+
309+
*Type*: `bool`
310+
311+
Requires version 4.2.0 or newer
312+
313+
=== `credentials.role`
314+
315+
A role ARN to assume.
316+
317+
318+
*Type*: `string`
319+
320+
321+
=== `credentials.role_external_id`
322+
323+
An external ID to provide when assuming a role.
324+
325+
326+
*Type*: `string`
327+
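As a sketch of how the role fields combine, the configuration below assumes an IAM role and supplies the external ID through an environment variable so that it stays out of the config file. The table name, region, role ARN, and the variable name `ROLE_EXTERNAL_ID` are all hypothetical.

```yml
input:
  aws_dynamodb_cdc:
    table: orders # hypothetical table name (assumption)
    region: us-east-1 # hypothetical region (assumption)
    credentials:
      # Hypothetical role ARN; replace with a role allowed to read the stream
      # and to read and write the checkpoint table.
      role: arn:aws:iam::123456789012:role/example-cdc-reader
      # Hypothetical environment variable, resolved when the config is loaded.
      role_external_id: ${ROLE_EXTERNAL_ID}
```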
328+
329+

go.mod

Lines changed: 1 addition & 1 deletion
@@ -338,7 +338,7 @@ require (
 	github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.9 // indirect
 	github.com/aws/aws-sdk-go-v2/internal/ini v1.8.3 // indirect
 	github.com/aws/aws-sdk-go-v2/internal/v4a v1.4.9 // indirect
-	github.com/aws/aws-sdk-go-v2/service/dynamodbstreams v1.31.0 // indirect
+	github.com/aws/aws-sdk-go-v2/service/dynamodbstreams v1.31.0
 	github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.13.1 // indirect
 	github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.9.0 // indirect
 	github.com/aws/aws-sdk-go-v2/service/internal/endpoint-discovery v1.11.9 // indirect
