Shard-aware batching #468

@junglie85

It'd be helpful to have greater control over how records are batched. One scenario my team has discussed is using the shard ID as a means to group data for batching, irrespective of partition.

The general thinking is that we can get the relevant Token for a PreparedStatement by computing the partition key and passing it to the Partitioner. To achieve this, get_partitioner_name() on PreparedStatement would ideally be pub instead of pub(crate).

With the Token, get the shard ID from somewhere that makes sense. I have no good ideas on where yet; perhaps a method on ClusterData.

App specific logic can then be used to group records by the shard ID and batch them for writing to Scylla.
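To make the idea concrete, here is a rough sketch of the grouping step, assuming the token for each record has already been computed. Nothing here is the driver's API: `ShardParams`, `shard_of` and `group_by_shard` are hypothetical names, and the token-to-shard arithmetic follows my understanding of Scylla's documented mapping (bias the token, drop the ignored most-significant bits, scale by the shard count). Note also that the shard count and ignore bits are per node, so in practice the grouping key would probably need to be (node, shard) rather than shard alone.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-node sharding parameters (not a driver type).
/// In a real application these would come from the node's sharding info.
#[derive(Clone, Copy, Debug)]
pub struct ShardParams {
    /// Number of shards on the target node.
    pub nr_shards: u16,
    /// Most-significant token bits ignored by the mapping; must be < 64.
    pub msb_ignore: u8,
}

/// Maps a token to a shard index, assuming Scylla's documented scheme:
/// shift the signed token into unsigned space, discard the ignored high
/// bits, then scale the result into the range [0, nr_shards).
pub fn shard_of(token: i64, params: ShardParams) -> u32 {
    let mut biased = (token as u64).wrapping_add(1u64 << 63);
    biased <<= params.msb_ignore;
    ((u128::from(biased) * u128::from(params.nr_shards)) >> 64) as u32
}

/// Groups (token, record) pairs by shard, preserving input order within
/// each group. Each resulting Vec could then be written as one batch.
pub fn group_by_shard<T>(
    records: impl IntoIterator<Item = (i64, T)>,
    params: ShardParams,
) -> BTreeMap<u32, Vec<T>> {
    let mut groups: BTreeMap<u32, Vec<T>> = BTreeMap::new();
    for (token, record) in records {
        groups
            .entry(shard_of(token, params))
            .or_default()
            .push(record);
    }
    groups
}
```

The application would call `group_by_shard` with its pending records and then build one batch per map entry. A `BTreeMap` is used only to get a deterministic iteration order; a `HashMap` would work just as well.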

  1. Is there any reason why this is a bad idea?
  2. What's the appetite for API changes that would enable this?

Rather than exposing the shard information, a shard-aware batching API would probably be more useful. I think this is related to #448.
