Description
It'd be helpful to have greater control over how records are batched. One scenario my team has discussed is using the Shard ID to group data for batching, irrespective of partition.
The general thinking is that we can get the relevant Token for a PreparedStatement by computing the partition key and passing it to the Partitioner. To make this possible, get_partitioner_name() on PreparedStatement would ideally be pub instead of pub(crate).
With the Token in hand, the Shard ID would need to come from somewhere that makes sense. I have no good ideas on this yet; perhaps a method on ClusterData.
App specific logic can then be used to group records by the shard ID and batch them for writing to Scylla.
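To make the grouping step concrete, here is a minimal sketch in plain Rust with no driver dependency. It assumes the token for each record has already been computed by some means. The names `shard_of` and `group_by_shard` are hypothetical and not part of the driver's API. The token-to-shard formula is my understanding of Scylla's sharding scheme and should be checked against the driver before relying on it. `nr_shards` and `msb_ignore` stand in for per-node sharding parameters that would have to come from the cluster metadata.

```rust
use std::collections::HashMap;

/// Maps a token to a shard index (hypothetical helper, not a driver API).
/// ASSUMPTION: this mirrors Scylla's biased-token sharding scheme as I
/// understand it; verify against the driver before relying on it.
pub fn shard_of(token: i64, nr_shards: u32, msb_ignore: u8) -> u32 {
    // Shift the signed token into the unsigned range [0, 2^64).
    let biased = (token as u64).wrapping_add(1u64 << 63);
    // Discard the most significant bits that the node is configured to ignore.
    // A shift of 64 or more would leave nothing, so treat it as zero.
    let shifted = biased.checked_shl(u32::from(msb_ignore)).unwrap_or(0);
    // Scale into [0, nr_shards) by taking the high 64 bits of the product.
    ((u128::from(shifted) * u128::from(nr_shards)) >> 64) as u32
}

/// Groups (token, record) pairs by shard index so that each group can be
/// sent as its own batch (hypothetical helper, app-specific logic).
pub fn group_by_shard<T>(
    records: impl IntoIterator<Item = (i64, T)>,
    nr_shards: u32,
    msb_ignore: u8,
) -> HashMap<u32, Vec<T>> {
    let mut groups: HashMap<u32, Vec<T>> = HashMap::new();
    for (token, record) in records {
        groups
            .entry(shard_of(token, nr_shards, msb_ignore))
            .or_default()
            .push(record);
    }
    groups
}
```

Each `Vec<T>` in the resulting map could then be turned into one batch. Note that this sketch assumes a single set of sharding parameters, whereas in a real cluster the owning node (and possibly its shard count) depends on the token as well.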
- Is there any reason why this is a bad idea?
- What's the appetite for API changes that would enable this?
Rather than exposing the shard information, a shard-aware batching API would be more useful. I think this is probably related to #448.