-
Notifications
You must be signed in to change notification settings - Fork 913
Add dynamodb streams input #3908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
791cfee to
e4cae4b
Compare
|
Out of curiosity is there a reason why this can't be named amazon_dynamodb_cdc instead of amazon_dyanmodb_streams? |
We can rename it, but it's referring to how aws calls this feature, so It might be more consistent to call it like that. In the end I don't mind renaming it if you think it's better, just let me know. |
|
My personal opinion is that we are leveraging dynamo db streams as documented but |
| *Type*: `string` | ||
|
|
||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we also add an example of using this connector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I see in the other inputs, the closest from an example we have are the common and advanced config at the top of the files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to include an example with working IAM auth? Postgres has an example pipeline like follows and I think any examples are always helpful: https://docs.redpanda.com/redpanda-connect/components/inputs/postgres_cdc/#example-pipeline
c507c72 to
4aa10a6
Compare
Yeah, I agree. We have a convention of having
|
4aa10a6 to
af77b90
Compare
|
As per our call it'd be good to:
|
| } | ||
|
|
||
| // Test shard reader structure | ||
| func TestShardReaderStructure(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought (non-blocking): Given this test is verifying we can add data to a map in a struct then read from it, it feels redundant (and no doubt covered by the integration tests) and can be removed. WDYT?
|
|
||
| port := resource.GetPort("8000/tcp") | ||
|
|
||
| t.Run("ReadInsertEvents", func(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be good to verify the number of records we receive as part of the test (ie, if there's 100 records in the table then we see 100 objects in the output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
internal/plugins/info.csv
Outdated
| aws_cloudwatch ,metric ,aws_cloudwatch ,3.36.0 ,community ,n ,n ,n | ||
| aws_dynamodb ,cache ,AWS DynamoDB ,3.36.0 ,community ,n ,y ,y | ||
| aws_dynamodb ,output ,AWS DynamoDB ,3.36.0 ,community ,n ,y ,y | ||
| aws_dynamodb_cdc ,input ,aws_dynamodb_cdc ,0.0.0 ,enterprise ,n ,y ,n |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the version
| Reads change data capture (CDC) events from DynamoDB Streams | ||
| Introduced in version 1.0.0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to set version
go.mod
Outdated
| github.com/tigerbeetle/tigerbeetle-go v0.16.61 | ||
| github.com/timeplus-io/proton-go-driver/v2 v2.1.2 | ||
| github.com/tmc/langchaingo v0.1.13 | ||
| github.com/tmc/langchaingo v0.1.14 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we update langchain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had issues with 0.1.13 missing a dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the problem is real please send a separate PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad it was on my side, I reverted it to 0.1.13
| table string | ||
| checkpointTable string | ||
| batchSize int32 | ||
| pollInterval time.Duration | ||
| startFrom string | ||
| checkpointLimit int | ||
| maxTrackedShards int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I like to embed config objects for clear separation
Code reviewFound 5 issues:
The connect/internal/impl/aws/input_dynamodb_cdc_batcher.go Lines 105 to 118 in 94512e4
The ack closure captures connect/internal/impl/aws/input_dynamodb_cdc.go Lines 409 to 418 in 94512e4
The implementation lacks metrics for monitoring component health and performance. The comparable Kinesis input has metrics like connect/internal/impl/aws/input_dynamodb_cdc.go Lines 103 to 125 in 94512e4
When connect/internal/impl/aws/input_dynamodb_cdc.go Lines 323 to 332 in 94512e4
Reference pattern: connect/internal/impl/aws/input_kinesis.go Lines 433 to 453 in 94512e4
The documentation lacks a practical usage example and metadata documentation. CONTRIBUTING.md requires "examples, troubleshooting guidance, and known pitfalls" for certified connectors.
🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
| recordBatcher *dynamoDBCDCRecordBatcher | ||
|
|
||
| shardReaders map[string]*dynamoDBShardReader | ||
| mu sync.Mutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth mentioning what it protects, the convention is to put mu on top of var group or use XMu for protection of X
| checkpointTable, err := conf.FieldString("checkpoint_table") | ||
| if err != nil { | ||
| return nil, err | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we try to put that into if like if X.checkpointTable, err = conf.FieldString("checkpoint_table"); err != nil
| awsConf, err := GetSession(context.Background(), conf) | ||
| if err != nil { | ||
| return nil, err | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe move that to Connect and remove the awsConf field
| var iteratorType types.ShardIteratorType | ||
| var sequenceNumber *string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Someone should tell Claude to group vars
| d.mu.Lock() | ||
| defer d.mu.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just lock d.shardReaders access not i/o with Amazon
| d.mu.Lock() | ||
| reader.exhausted = true | ||
| d.mu.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really need better / more targeted sync primitives unless we very explicitly explain that mu protects map and its contents. Then why iterator is not protected?
| } | ||
|
|
||
| // Initialize checkpointer | ||
| d.checkpointer, err = newDynamoDBCDCCheckpointer(ctx, d.dynamoClient, d.checkpointTable, *d.streamArn, d.checkpointLimit, d.log) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not use ctx passed to Connect, search for shutdown.Signaller
| "github.com/redpanda-data/benthos/v4/public/service" | ||
| ) | ||
|
|
||
| // dynamoDBCDCRecordBatcher tracks messages and their checkpoints for DynamoDB CDC. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd appreciate if we could add more docs here about what it does and why.
| type dynamoDBCDCRecordBatcher struct { | ||
| mu sync.Mutex | ||
| messageTracker map[*service.Message]*messageCheckpoint | ||
| log *service.Logger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Log in the middle, use some grouping or move it down.
| // License (the "License"); you may not use this file except in compliance with | ||
| // the License. You may obtain a copy of the License at | ||
| // | ||
| // https://github.com/redpanda-data/connect/blob/main/licenses/rcl.md |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be treated as pkg doc - no separation
| // Multiple goroutines should be able to read concurrently | ||
| done := make(chan bool, 3) | ||
|
|
||
| for i := 0; i < 3; i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it's generally better to use range
|
@squiidz I think we should work on locking granularity, usually we use a channel internally for sending batches, it also gives you data preloading. |
94512e4 to
f47e5a5
Compare
bbecf86 to
3d12324
Compare

No description provided.