Component(s)
exporter/exporterhelper
Current status
The current exporter batcher does not support batching based on metadata and maintains only a single active batch at any given time.
Proposal
We propose introducing partitioning as a mechanism for batch organization. This builds upon the terminology of partition and shard in issue #12473, with a focus on partitioning in this proposal.
- A partition represents the logical separation of the data based on the input (e.g. a key in the context or an entry in the resource attributes).
- A partition may be split into multiple shards and multiple batches may be produced for the same partition.
Getting the Partition Key
In alignment with #10825, we should support fetching the partition key from both metadata and the event message itself. The proposal is to wrap a getKeyFunc in a type called Partitioner and provide it via the queue_batch settings.
e := NewBaseExporter().WithQueueBatchSettings(
	QueueBatchSettings[request.Request]{
		Encoding: logsEncoding{},
		Sizers: map[request.SizerType]request.Sizer[request.Request]{
			request.SizerTypeRequests: request.RequestsSizer[request.Request]{},
		},
		Partitioner: newQueuePartitioner(getKeyFunc),
	},
)
-----
type Partitioner[T any] interface {
	GetKey(context.Context, T) string
}

type GetKeyFunc[T any] func(context.Context, T) string

func (f GetKeyFunc[T]) GetKey(ctx context.Context, t T) string {
	return f(ctx, t)
}
Single Queue vs Multi Queue
We have identified two use cases for multi-tenant batching:
- Case 1, single failure domain: When all partitions share the same downstream service, they share a failure domain, so a shared queue is sufficient. Memory and storage are also easier to manage with a single queue.
- Case 2, multiple failure domains: When different partitions correspond to separate downstream services, they have separate failure domains. In this case, per-partition queues provide better isolation and prevent head-of-line blocking.
| Case 1: shared queue | Case 2: per-partition queue |
|---|---|
| (diagram omitted) | (diagram omitted) |
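The Case 2 routing described above can be sketched as a map of queues keyed by partition, created lazily so that each partition gets its own isolated queue. This is a minimal illustration of the isolation property, not collector code; all names here are hypothetical:

```go
package main

import "fmt"

// queue is a toy stand-in for a sending_queue.
type queue struct{ items []string }

// partitionedQueues routes each item to the queue for its partition key,
// so a slow or failing partition cannot block the others (Case 2).
type partitionedQueues struct {
	queues map[string]*queue
}

func newPartitionedQueues() *partitionedQueues {
	return &partitionedQueues{queues: make(map[string]*queue)}
}

// offer appends the item to the queue for key, creating the queue on
// first use for that partition.
func (pq *partitionedQueues) offer(key, item string) {
	q, ok := pq.queues[key]
	if !ok {
		q = &queue{}
		pq.queues[key] = q
	}
	q.items = append(q.items, item)
}

func main() {
	pq := newPartitionedQueues()
	pq.offer("tenant-a", "log-1")
	pq.offer("tenant-b", "log-2")
	pq.offer("tenant-a", "log-3")
	fmt.Println(len(pq.queues))                   // two partitions, two queues
	fmt.Println(len(pq.queues["tenant-a"].items)) // tenant-a has two items
}
```

In Case 1 the same `offer` would ignore the key and append to a single shared queue; the trade-off is simpler capacity management versus the isolation shown here.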
Users should be able to choose between these two options. We propose a boolean config parameter `per_partition`:
- `sending_queue`
  - `enabled` (default = true)
  - `per_partition` (default = false): whether to allocate a new `sending_queue` for each partition. If set to `false`, all partitions share the same `sending_queue`. Ignored if no partitioner is configured.
  - `num_consumers` (default = 10): number of consumers that dequeue batches; ignored if `enabled` is `false`.
  - `wait_for_result` (default = false): determines whether incoming requests block until the request is processed.
  - `block_on_overflow` (default = false): if `true`, blocks the request until the queue has space; otherwise rejects the data immediately. Ignored if `enabled` is `false`.
  - `sizer` (default = requests): how the queue and batching are measured. Available options:
    - `requests`: number of incoming batches of metrics, logs, traces (the most performant option);
    - `items`: number of the smallest parts of each signal (spans, metric data points, log records);
    - `bytes`: the size of serialized data in bytes (the least performant option).
  - `queue_size` (default = 1000): maximum size the queue can accept, measured in units defined by `sizer`.
  - `batch`: disabled by default if not defined.
    - `flush_timeout`: time after which a batch is sent regardless of its size.
    - `min_size`: the minimum size of a batch.
    - `max_size`: the maximum size of a batch; enables batch splitting.
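Assuming the proposed fields land under the existing `sending_queue` section, a configuration enabling per-partition queues (Case 2) might look like the following sketch. The exporter name and values are illustrative only; `per_partition` is the new field from this proposal:

```yaml
exporters:
  otlp:
    endpoint: example.com:4317
    sending_queue:
      enabled: true
      per_partition: true   # allocate one queue per partition key
      num_consumers: 10
      sizer: items
      queue_size: 1000
      batch:
        flush_timeout: 200ms
        min_size: 1000
        max_size: 10000
```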