
Exporter batcher multi-tenant support - partitioning #12795

@sfc-gh-sili

Description


Component(s)

exporter/exporterhelper

Current status

The current exporter batcher does not support batching based on metadata and maintains only a single active batch at any given time.

Proposal

We propose introducing partitioning as a mechanism for organizing batches. This builds on the partition and shard terminology introduced in issue #12473; this proposal focuses on partitioning.

  1. A partition represents a logical separation of the data based on the input (e.g. a key in the context or an entry in the resource attributes).
  2. A partition may be split into multiple shards, and multiple batches may be produced for the same partition.

Getting the Partition Key

In alignment with #10825, we should support fetching the partition key from both the metadata and the event message itself. The proposal is to wrap a getKeyFunc in a Partitioner type and provide it as part of the QueueBatchSettings.

e := NewBaseExporter().WithQueueBatchSettings(
    QueueBatchSettings[request.Request]{
        Encoding: logsEncoding{},
        Sizers: map[request.SizerType]request.Sizer[request.Request]{
            request.SizerTypeRequests: request.RequestsSizer[request.Request]{},
        },
        Partitioner: newQueuePartitioner(getKeyFunc),
    },
)

-----

// Partitioner computes the partition key for a given request.
type Partitioner[T any] interface {
	GetKey(context.Context, T) string
}

// GetKeyFunc adapts a plain function into a Partitioner.
type GetKeyFunc[T any] func(context.Context, T) string

func (f GetKeyFunc[T]) GetKey(ctx context.Context, t T) string {
	return f(ctx, t)
}
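For illustration, a getKeyFunc could derive the partition key from request metadata carried in the context. The sketch below is only a rough example, assuming the collector's client package is used to read the incoming metadata; the "X-Tenant" key and the "default" fallback partition are hypothetical choices, not part of this proposal.

package partitionexample

import (
	"context"

	"go.opentelemetry.io/collector/client"
)

// tenantKeyFunc partitions requests by the hypothetical "X-Tenant"
// metadata entry; requests without it land in a shared "default" partition.
func tenantKeyFunc[T any](ctx context.Context, _ T) string {
	if values := client.FromContext(ctx).Metadata.Get("X-Tenant"); len(values) > 0 {
		return values[0]
	}
	return "default"
}

It could then be supplied as GetKeyFunc[request.Request](tenantKeyFunc[request.Request]) when constructing the partitioner.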

Single Queue vs Multi Queue

We have identified two use cases for multi-tenant batching:

  • Case 1, single failure domain: When all partitions share the same downstream service, they share the same failure domain, so a shared queue is sufficient. Memory and storage are also easier to manage with a single queue.
  • Case 2, multiple failure domains: When different partitions correspond to separate downstream services, they have separate failure domains. In this case, per-partition queues provide better isolation and prevent head-of-line blocking.
[Figure: Case 1 (shared queue) vs. Case 2 (per-partition queue)]
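To make Case 2 more concrete, here is a minimal sketch of how per-partition queues could be allocated lazily, keyed by the partition key. The queue interface and newQueue factory are placeholders standing in for whatever sending_queue implementation exporterhelper actually uses, not its real types.

package partitionexample

import (
	"context"
	"sync"
)

// queue is a stand-in for the real sending_queue implementation; only the
// operation needed for this sketch is shown.
type queue[T any] interface {
	Offer(ctx context.Context, req T) error
}

// multiQueue routes each request to the queue owned by its partition key,
// creating that queue on first use (Case 2). With per_partition=false a
// single shared queue would be used instead (Case 1).
type multiQueue[T any] struct {
	mu       sync.Mutex
	getKey   func(context.Context, T) string // the partitioner's GetKey
	newQueue func() queue[T]                 // hypothetical factory for one sending_queue
	queues   map[string]queue[T]
}

func (m *multiQueue[T]) Offer(ctx context.Context, req T) error {
	key := m.getKey(ctx, req)
	m.mu.Lock()
	if m.queues == nil {
		m.queues = map[string]queue[T]{}
	}
	q, ok := m.queues[key]
	if !ok {
		q = m.newQueue()
		m.queues[key] = q
	}
	m.mu.Unlock()
	return q.Offer(ctx, req)
}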

Users should be able to choose between these two options. We propose a boolean config parameter per_partition (see the configuration sketch after the list below):

  • sending_queue
    • enabled (default = true)
    • per_partition (default = false): whether to allocate a separate sending_queue for each partition. If set to false, all partitions share the same sending_queue. Ignored if no partitioner is configured.
    • num_consumers (default = 10): Number of consumers that dequeue batches; ignored if enabled is false
    • wait_for_result (default = false): determines whether incoming requests block until the request is processed.
    • block_on_overflow (default = false): if true, blocks the request until the queue has space; otherwise rejects the data immediately; ignored if enabled is false
    • sizer (default = requests): how the queue and batch sizes are measured. Available options:
      • requests: number of incoming batches of metrics, logs, traces (the most performant option);
      • items: number of the smallest parts of each signal (spans, metric data points, log records);
      • bytes: the size of serialized data in bytes (the least performant option).
    • queue_size (default = 1000): Maximum size the queue can accept. Measured in units defined by sizer
    • batch: disabled by default; batching is enabled only if this section is defined
      • flush_timeout: time after which a batch will be sent regardless of its size.
      • min_size: the minimum size of a batch.
      • max_size: the maximum size of a batch; enables batch splitting.
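Putting the options together, a hypothetical exporter configuration could look like the following. The otlp exporter and all values are illustrative only; per_partition is the new option proposed here, and the remaining keys follow the list above.

exporters:
  otlp:
    endpoint: backend:4317
    sending_queue:
      enabled: true
      per_partition: true   # proposed: one sending_queue per partition key
      num_consumers: 10
      block_on_overflow: true
      sizer: items
      queue_size: 5000
      batch:
        flush_timeout: 5s
        min_size: 1000
        max_size: 10000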

Related

#8122
#10825
#12473
