feat: Support Variable Batch Sizes Across Pipeline Stages for Better Throughput #644

@franklucky001

Describe the feature

Currently, Mosec requires consistent batch sizes across pipeline stages. This is limiting for NLP tasks such as reranking, where the number of inputs varies significantly between requests.

For example, in a reranking service:

  • One request might have 10 texts to rerank

  • Another might have 100 texts

The optimal batch size differs per stage:

  • Preprocessing (flattening, tokenization) can handle large batches

  • Model inference needs smaller batches

  • Postprocessing needs to regroup by original request

Why do you need this feature?

1. Real-World NLP Tasks Often Have Variable-Length Inputs

In tasks like reranking, retrieval-augmented generation (RAG), or batch inference:

  • Each query may match a different number of candidate texts (e.g., 10 vs. 100).

  • Forcing a fixed batch size either:

    • Wastes compute (padding small batches → inefficient GPU use)
    • Increases latency (processing tiny batches sequentially → GPU underutilization)
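
To make the padding cost concrete, here is a rough back-of-the-envelope sketch. It assumes an inference batch size of 32 and that every batch is padded up to the full size; both are illustrative assumptions, not Mosec behavior. It compares batching each request on its own against batching the flattened texts of both requests together.

```python
import math


def padded_slots(n_items: int, batch_size: int) -> int:
    """Number of batch slots consumed when every batch is padded to full size."""
    return math.ceil(n_items / batch_size) * batch_size


# Illustrative numbers: the 10-text and 100-text requests from the example.
request_sizes = [10, 100]
batch_size = 32  # assumed inference batch size
total_items = sum(request_sizes)

# Each request batched separately: small requests leave most slots empty.
per_request_slots = sum(padded_slots(n, batch_size) for n in request_sizes)

# All texts flattened into one stream before batching.
flattened_slots = padded_slots(total_items, batch_size)

per_request_util = total_items / per_request_slots
flattened_util = total_items / flattened_slots

print(f"per-request: {per_request_slots} slots, {per_request_util:.1%} used")
print(f"flattened:   {flattened_slots} slots, {flattened_util:.1%} used")
```

Under these assumptions, batching per request uses 160 slots for 110 texts, while batching the flattened stream uses 128.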

2. Different Pipeline Stages Have Different Optimal Batch Sizes

  • Tokenization (CPU-bound) → Benefits from large batches (e.g., 128+ items)

  • Model Inference (GPU-bound) → Needs smaller batches (e.g., 8–32) to avoid OOM

  • Postprocessing → Needs to regroup by original request
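
As a minimal illustration of stage-specific batch sizes, the sketch below re-chunks the same flattened list of texts differently for each stage. The `chunks` helper and the sizes 128 and 32 are hypothetical and chosen to match the numbers above; this is plain Python, not part of the Mosec API.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunks(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 110 texts: the flattened contents of a 10-text and a 100-text request.
texts = [f"text-{i}" for i in range(110)]

# CPU-bound tokenization can take everything in one large batch.
tokenize_batches = list(chunks(texts, 128))

# GPU-bound inference re-chunks the same items into smaller batches.
inference_batches = list(chunks(texts, 32))

print([len(b) for b in tokenize_batches])
print([len(b) for b in inference_batches])
```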

Additional context

Proposed Solution:

Enable each pipeline stage to:

  • Process different batch sizes independently

  • Maintain request context (e.g., task IDs) to properly regroup results

  • Allow stages to dynamically split/merge batches based on their optimal sizes

Request A (n=10 texts) ──┐
Request B (n=100 texts)──┼──> Flatten Stage (batch=128) ──> Tokenize (batch=128) ──┐
                         │                                                         │
                         └─────────────────── Task ID Tracking ────────────────────┘
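
The task ID tracking idea could look roughly like the sketch below. It is written in plain Python under stated assumptions: `flatten`, `run_in_chunks`, `regroup`, and `fake_score` are hypothetical names, and `fake_score` stands in for a real reranking model. None of these are existing Mosec functions. Each text is tagged with its request ID and position, scored in small batches that may span several requests, and then regrouped per request.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Tagged = Tuple[str, int, str]    # (request_id, position, text)
Scored = Tuple[str, int, float]  # (request_id, position, score)


def flatten(requests: Dict[str, List[str]]) -> List[Tagged]:
    """Tag every text with its request ID and position within that request."""
    return [
        (rid, pos, text)
        for rid, texts in requests.items()
        for pos, text in enumerate(texts)
    ]


def fake_score(batch: List[str]) -> List[float]:
    """Placeholder for model inference (assumption: one score per text)."""
    return [float(len(text)) for text in batch]


def run_in_chunks(flat: List[Tagged], batch_size: int) -> List[Scored]:
    """Score texts in fixed-size batches, carrying the tags alongside."""
    scored: List[Scored] = []
    for start in range(0, len(flat), batch_size):
        chunk = flat[start:start + batch_size]
        scores = fake_score([text for _, _, text in chunk])
        scored.extend(
            (rid, pos, score) for (rid, pos, _), score in zip(chunk, scores)
        )
    return scored


def regroup(scored: List[Scored]) -> Dict[str, List[float]]:
    """Rebuild one ordered score list per original request."""
    by_request: Dict[str, Dict[int, float]] = defaultdict(dict)
    for rid, pos, score in scored:
        by_request[rid][pos] = score
    return {
        rid: [by_pos[i] for i in range(len(by_pos))]
        for rid, by_pos in by_request.items()
    }


requests = {
    "A": [f"a{i}" for i in range(10)],
    "B": [f"b{i}" for i in range(100)],
}
results = regroup(run_in_chunks(flatten(requests), batch_size=32))
print({rid: len(scores) for rid, scores in results.items()})
```

Because the tags travel with each item, the inference batches do not need to align with request boundaries; the first batch of 32 here contains all of request A plus the first 22 texts of request B.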

Labels: enhancement (New feature or request)