feat: Support Variable Batch Sizes Across Pipeline Stages for Better Throughput #644
Describe the feature
Currently, Mosec requires consistent batch sizes across pipeline stages, which can be limiting for certain NLP tasks like reranking where input sizes vary significantly between requests.
For example, in a reranking service:
- One request might have 10 texts to rerank
- Another might have 100 texts

The optimal batch size differs per stage:
- Preprocessing (flattening, tokenization) can handle large batches
- Model inference needs smaller batches
- Postprocessing needs to regroup results by original request
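The flatten-then-regroup flow above can be sketched in plain Python. The helper names (`flatten_requests`, `regroup_by_task`) are hypothetical, not part of Mosec's API; the sketch only shows how task IDs let a later stage restore per-request grouping:

```python
def flatten_requests(requests):
    """Flatten variable-length requests into (task_id, text) pairs."""
    return [(task_id, text)
            for task_id, texts in enumerate(requests)
            for text in texts]

def regroup_by_task(flat_results, num_tasks):
    """Regroup per-item results into per-request lists using task IDs."""
    grouped = [[] for _ in range(num_tasks)]
    for task_id, result in flat_results:
        grouped[task_id].append(result)
    return grouped

# One request with 10 texts, another with 100.
reqs = [["t%d" % i for i in range(10)], ["u%d" % i for i in range(100)]]
flat = flatten_requests(reqs)            # 110 tagged items, batchable freely
back = regroup_by_task(flat, len(reqs))  # per-request lengths restored
```

Because every item carries its task ID, intermediate stages can batch the 110 items however they like without losing track of which request each result belongs to.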
Why do you need this feature?
1. Real-World NLP Tasks Often Have Variable-Length Inputs
In tasks like reranking, retrieval-augmented generation (RAG), or batch inference:
- Each query may match a different number of candidate texts (e.g., 10 vs. 100).
- Forcing fixed batch sizes either:
  - wastes compute (padding small batches → inefficient GPU use), or
  - increases latency (processing tiny batches sequentially → underutilization).
2. Different Pipeline Stages Have Different Optimal Batch Sizes
- Tokenization (CPU-bound) → benefits from large batches (e.g., 128+ items)
- Model inference (GPU-bound) → needs smaller batches (e.g., 8–32) to avoid OOM
- Postprocessing → needs to regroup results by original request
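The split between a large tokenization batch and a small inference batch can be expressed as a simple re-chunking step. This is an illustrative sketch (`split_into_batches` is a hypothetical helper, not a Mosec API), using the example sizes above:

```python
def split_into_batches(items, max_batch_size):
    """Split one large flattened batch into smaller inference-sized chunks."""
    if max_batch_size <= 0:
        raise ValueError("max_batch_size must be positive")
    return [items[i:i + max_batch_size]
            for i in range(0, len(items), max_batch_size)]

# A 110-item tokenized batch becomes 32/32/32/14-item chunks for the GPU stage.
chunks = split_into_batches(list(range(110)), 32)
```

In a variable-batch pipeline, each stage would apply this kind of re-chunking with its own limit, so the tokenizer's 128-item batch never forces the model stage past its OOM-safe size.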
Additional context
Proposed Solution:
Enable each pipeline stage to:
- Process different batch sizes independently
- Maintain request context (e.g., task IDs) to properly regroup results
- Dynamically split/merge batches based on its optimal size
```
Request A (n=10 texts) ──┐
Request B (n=100 texts)──┼──> Flatten Stage (batch=128) ──> Tokenize (batch=128) ──┐
                         │                                                         │
                         └──────────────────── Task ID Tracking ───────────────────┘
```
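The whole flow in the diagram can be simulated end to end in plain Python. Everything here is illustrative: the stage batch sizes, the request shapes, and the placeholder "model" (which just scores a text by its length) are assumptions, not Mosec behavior:

```python
# Two requests of very different sizes flow through
# flatten -> batched "inference" -> regroup, tracked by task IDs.
INFER_BATCH = 32  # model stage kept small to avoid OOM

requests = {
    "A": [f"a{i}" for i in range(10)],    # request A: 10 texts
    "B": [f"b{i}" for i in range(100)],   # request B: 100 texts
}

# Flatten stage: tag every item with its originating task ID.
flat = [(tid, text) for tid, texts in requests.items() for text in texts]

# Inference stage: process the flattened stream in INFER_BATCH-sized chunks.
scored = []
for start in range(0, len(flat), INFER_BATCH):
    chunk = flat[start:start + INFER_BATCH]
    # placeholder "model": score each text by its character length
    scored.extend((tid, len(text)) for tid, text in chunk)

# Postprocess stage: regroup scores by original request.
results = {}
for tid, score in scored:
    results.setdefault(tid, []).append(score)
```

Request A gets its 10 scores back and request B its 100, even though the middle stage batched them together and split them at its own size limit.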