Observation
Mimir handles WAL replay asynchronously and concurrently during ingester
startup via openExistingTSDB(), which uses configurable WALReplayConcurrency
to parallelize WAL loading across multiple TSDBs.
However, when a new user sends their first push request at runtime, the ingester
calls getOrCreateTSDB(userID), which in turn calls createTSDB(userID, 0), i.e.
with walReplayConcurrency = 0. This appears to replay the WAL sequentially and
to block the entire push request until the replay finishes.
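The runtime path has a lazy get-or-create shape. The sketch below is hypothetical: registry, getOrCreate and replay are illustrative names, not Mimir's getOrCreateTSDB()/createTSDB(). It only shows why, when creation and replay happen inline, the first caller's latency includes the whole replay.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tenantDB is a hypothetical placeholder for a per-tenant TSDB.
type tenantDB struct{ tenant string }

// registry sketches a get-or-create map of tenant databases.
// Names and structure are illustrative assumptions, not Mimir's code.
type registry struct {
	mu     sync.RWMutex
	dbs    map[string]*tenantDB
	replay func(tenant string) // simulated, synchronous WAL replay
}

func (r *registry) getOrCreate(tenant string) *tenantDB {
	// Fast path: the tenant's DB already exists.
	r.mu.RLock()
	db, ok := r.dbs[tenant]
	r.mu.RUnlock()
	if ok {
		return db
	}
	// Slow path: take the write lock and re-check, since another
	// goroutine may have created the DB in the meantime.
	r.mu.Lock()
	defer r.mu.Unlock()
	if existing, ok := r.dbs[tenant]; ok {
		return existing
	}
	// The replay runs inline, so the caller (the push request) waits for it.
	r.replay(tenant)
	db = &tenantDB{tenant: tenant}
	r.dbs[tenant] = db
	return db
}

func main() {
	r := &registry{
		dbs:    map[string]*tenantDB{},
		replay: func(string) { time.Sleep(50 * time.Millisecond) },
	}
	start := time.Now()
	r.getOrCreate("new-tenant") // first push: pays for the replay
	first := time.Since(start)

	start = time.Now()
	r.getOrCreate("new-tenant") // later push: fast path
	second := time.Since(start)
	fmt.Printf("first push waited %v, second push waited %v\n", first, second)
}
```

This sketch holds the write lock during replay, so a second new tenant would also wait. That locking choice is an assumption of the sketch, not a claim about how Mimir's code behaves.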
Code Reference
- Startup path (concurrent):
  https://github.com/grafana/mimir/blob/main/pkg/ingester/ingester.go#L2931
- Runtime path (blocking):
  https://github.com/grafana/mimir/blob/main/pkg/ingester/ingester.go#L2665
Questions
- Is the synchronous WAL replay on new user TSDB creation intentional?
- Are new users expected to have minimal/no WAL, so blocking is acceptable?
- Would applying the same WALReplayConcurrency to runtime-created TSDBs
improve performance for tenants with accumulated WALs?
Potential Impact
- A user with months of WAL history sending their first push would experience
a blocking request while the entire WAL replays sequentially.
- Unlike startup, there is no "gate" preventing user requests during WAL load.