Question: Why Does getOrCreateTSDB() Block WAL Replay for New Users? #14267

@Amr-Shams

Observation

Mimir handles WAL replay asynchronously and concurrently during ingester
startup via openExistingTSDB(), which uses configurable WALReplayConcurrency
to parallelize WAL loading across multiple TSDBs.

However, when a new user sends their first push request at runtime, the ingester
calls getOrCreateTSDB(userID) which then calls createTSDB(userID, 0) with
walReplayConcurrency = 0. This appears to replay the WAL sequentially, blocking
the push request until the replay completes.
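
To make the two call paths concrete, here is a minimal, self-contained sketch of the pattern described above. It is not Mimir's actual code: the ingester and userTSDB types, the locking, and the value 4 are assumptions made for illustration. Only the function names and the literal 0 passed at runtime come from this issue, and the sketch does not model how a value of 0 is interpreted by the TSDB.

```go
package main

import (
	"fmt"
	"sync"
)

// userTSDB is a hypothetical stand-in for a per-tenant TSDB. It only
// records the concurrency value it was opened with.
type userTSDB struct {
	userID               string
	walReplayConcurrency int
}

// ingester is a simplified model, not Mimir's real struct.
type ingester struct {
	mu    sync.RWMutex
	tsdbs map[string]*userTSDB
}

// createTSDB models opening a tenant TSDB. In a real ingester this is
// where WAL replay would run before the function returns.
func (i *ingester) createTSDB(userID string, walReplayConcurrency int) *userTSDB {
	return &userTSDB{userID: userID, walReplayConcurrency: walReplayConcurrency}
}

// openExistingTSDB models the startup path: tenants are opened in
// parallel, each with the configured replay concurrency.
func (i *ingester) openExistingTSDB(userIDs []string, walReplayConcurrency int) {
	var wg sync.WaitGroup
	for _, id := range userIDs {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			db := i.createTSDB(id, walReplayConcurrency)
			i.mu.Lock()
			i.tsdbs[id] = db
			i.mu.Unlock()
		}(id)
	}
	wg.Wait()
}

// getOrCreateTSDB models the runtime path: the first push for an
// unknown tenant creates the TSDB inline with concurrency 0.
func (i *ingester) getOrCreateTSDB(userID string) *userTSDB {
	i.mu.RLock()
	db, ok := i.tsdbs[userID]
	i.mu.RUnlock()
	if ok {
		return db
	}

	i.mu.Lock()
	defer i.mu.Unlock()
	// Re-check under the write lock in case another request won the race.
	if db, ok := i.tsdbs[userID]; ok {
		return db
	}
	db = i.createTSDB(userID, 0)
	i.tsdbs[userID] = db
	return db
}

func main() {
	ing := &ingester{tsdbs: map[string]*userTSDB{}}
	ing.openExistingTSDB([]string{"tenant-a", "tenant-b"}, 4)

	fmt.Println(ing.getOrCreateTSDB("tenant-a").walReplayConcurrency) // prints 4 (opened at startup)
	fmt.Println(ing.getOrCreateTSDB("tenant-c").walReplayConcurrency) // prints 0 (created at runtime)
}
```

The point of the sketch is only that the runtime path never sees the configured value; whether that matters depends on what 0 means to the underlying TSDB and on whether a tenant first seen at runtime can have a WAL on disk at all.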

Code Reference

Questions

  1. Is the synchronous WAL replay on new user TSDB creation intentional?
  2. Are new users expected to have minimal/no WAL, so blocking is acceptable?
  3. Would applying the same WALReplayConcurrency to runtime-created TSDBs
    improve performance for tenants with accumulated WALs?

Potential Impact

  • A user with months of WAL history sending their first push would experience
    a blocking request while the entire WAL replays sequentially
  • Unlike startup, there's no "gate" preventing user requests during WAL load
