Drop AWS SDK v1: replace KCL with direct v2 stream poller #296
Draft
dkropachev wants to merge 1 commit into master from
Remove the spark-kinesis-dynamodb module (5 files, ~800 lines) and its dynamodb-streams-kinesis-adapter dependency, which was the sole reason for depending on AWS SDK v1. Replace with DynamoStreamPoller (~130 lines) that uses the v2 DynamoDbStreamsClient directly to poll shards. This eliminates KCL's unnecessary checkpointing, lease management, and CloudWatch metrics.

- Delete spark-kinesis-dynamodb/ module entirely
- Delete AttributeValueUtils.scala (v1→v2 conversion no longer needed)
- Rewrite DynamoStreamReplication to use v2 AttributeValue natively
- Update DynamoStreamReplicationIntegrationTest to use v2 types
- Remove unused import in DynamoDBS3Export.scala
- Clean up build.sbt (remove module, adapter dep, version val)

The only remaining v1 dependency is hadoop-aws (transitive, no code imports).
Summary
Replace the KCL-based DynamoDB Streams replication with a direct AWS SDK v2 poller, eliminating the last reason for depending on AWS SDK v1.
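For reviewers who want the shape of a direct poller at a glance, here is a minimal sketch of the idea. It is not the code in this PR. It shows one task per shard, run on a `ScheduledExecutorService`, that calls a fetch function with the current shard iterator and stores the next one. `PollerSketch`, `Page`, `ShardPoller`, and `schedule` are hypothetical names, and `fetch` stands in for a `GetRecords` call on the v2 `DynamoDbStreamsClient`, so the sketch needs no AWS dependency.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}
import java.util.concurrent.atomic.AtomicReference

object PollerSketch {
  // Hypothetical stand-in for one GetRecords response: the records read and
  // the next shard iterator (None once the shard is closed).
  final case class Page[R](records: Seq[R], nextIterator: Option[String])

  final class ShardPoller[R](start: String, fetch: String => Page[R], handle: Seq[R] => Unit) {
    private val iterator = new AtomicReference[Option[String]](Some(start))

    /** Polls once. Returns false when the shard was already fully read. */
    def pollOnce(): Boolean = iterator.get() match {
      case None => false
      case Some(current) =>
        val page = fetch(current)
        if (page.records.nonEmpty) handle(page.records)
        iterator.set(page.nextIterator)
        true
    }
  }

  /** Schedules the poller with no initial delay; the caller shuts the pool down. */
  def schedule(poller: ShardPoller[_], intervalSeconds: Long): ScheduledExecutorService = {
    val pool = Executors.newSingleThreadScheduledExecutor()
    val task = new Runnable {
      // An uncaught exception would suppress all later runs of a scheduled
      // task, so failures are caught and logged here.
      def run(): Unit =
        try { poller.pollOnce(); () }
        catch { case e: Exception => System.err.println(s"poll failed: ${e.getMessage}") }
    }
    pool.scheduleWithFixedDelay(task, 0L, intervalSeconds, TimeUnit.SECONDS)
    pool
  }
}
```

Passing an initial delay of 0 makes the first poll run as soon as the task is scheduled, and the fixed delay is measured from the end of one poll to the start of the next.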
What changed
- Remove the `spark-kinesis-dynamodb` module (5 files, ~800 lines) and its `dynamodb-streams-kinesis-adapter` dependency, the sole reason for depending on AWS SDK v1
- Delete `AttributeValueUtils.scala` (v1-to-v2 conversion no longer needed)
- Add `DynamoStreamPoller` (~130 lines) using the v2 `DynamoDbStreamsClient` directly to poll shards
- … `ScheduledExecutorService` that polls shards on the driver, since stream items are processed directly without Spark RDDs (v2 `AttributeValue` is not `Serializable`)
- Remove the `spark-streaming` dependency entirely

Stream replication hardening
- … `putItem`/`deleteItem` with `BatchWriteItemRequest` (up to 25 items/call) with retry for unprocessed items
- … `UpdateItem` expressions, enabling multi-runner setups with automatic failover on lease expiry
- … (`migrator_{table}`) for cross-run resume and multi-runner coordination
- … `StreamHandle`, close them in `stop()` after `awaitTermination(30s)`
- … (`LimitExceededException`, `ProvisionedThroughputExceededException`, `InternalServerError`)
- … (`initialDelay=0`)

New configuration options (all optional, in DynamoDB source settings)
- `streamingPollIntervalSeconds` (default: 5)
- `streamingMaxConsecutiveErrors` (default: 50)
- `streamingPollingPoolSize` (default: max(4, availableProcessors))
- `streamingLeaseDurationMs` (default: 60000)

Test coverage (40 tests)
Unit tests (16, no infrastructure needed):
- `recordToItem`: INSERT, MODIFY, REMOVE, unknown event
- `retryRandom`: success, retry+succeed, exhaust, non-retryable, non-DDB
- `pollShard`: success, LimitExceeded retry, ProvisionedThroughput retry, exhaust retries, non-retryable, non-DDB, shard closed

Integration tests (24, need DynamoDB Local + Alternator):
- `tryClaimShard`: unclaimed, expired lease with checkpoint, active lease rejection, re-claim own shard
- `renewLeaseAndCheckpoint`: with checkpoint, without, stolen lease, expiry
- … `run()` with renamesMap

Test infrastructure:
- `StreamPollerOps` trait extracted from `DynamoStreamPoller` for testability
- `TestStreamPoller`: manual test double with configurable function vars

Notes
- The only remaining v1 dependency is `hadoop-aws` (transitive only; no migrator code imports v1)
- … `build.sh` → `make build` and `docker-build-jar.sh` → `make docker-build-jar`

Test plan
- `sbt migrator/compile` passes
- `sbt tests/compile` passes
- `sbt migrator/assembly` builds successfully
- No `com.amazonaws` imports remain in Scala sources

🤖 Generated with Claude Code
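The batched write path listed under "Stream replication hardening" can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `BatchWriteSketch` and `writeAll` are hypothetical names, `send` stands in for one `BatchWriteItem` call and returns whatever the service reported as unprocessed, and backoff between attempts is left out for brevity.

```scala
object BatchWriteSketch {
  // BatchWriteItem accepts at most 25 put/delete requests per call.
  val MaxBatchSize = 25

  /**
   * Sends `items` in chunks of at most MaxBatchSize and re-sends whatever
   * `send` reports as unprocessed, up to `maxAttempts` calls per chunk.
   * Returns the items that were still unprocessed after the last attempt.
   *
   * `send` is a hypothetical stand-in for one BatchWriteItem call.
   */
  def writeAll[A](items: Seq[A], maxAttempts: Int)(send: Seq[A] => Seq[A]): Seq[A] =
    // toVector forces strict evaluation so every chunk is sent eagerly.
    items.grouped(MaxBatchSize).toVector.flatMap { batch =>
      var pending = batch
      var attempt = 0
      while (pending.nonEmpty && attempt < maxAttempts) {
        pending = send(pending)
        attempt += 1
      }
      pending
    }
}
```

Returning the leftover items rather than throwing lets the caller decide whether an exhausted retry budget should count toward an error limit or abort the run.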