enhance: add per-type loading overhead upper bound for CachingLayer #71

Open
sparknack wants to merge 2 commits into zilliztech:master from sparknack:loading-opt

@sparknack
Contributor

Split cell resource estimation into loaded_resource (reserved unconditionally) and loading_overhead (capped by a per-CellDataType upper bound) to prevent over-reservation when many loads run concurrently. The actual concurrent temporary resource usage is bounded by loading_pool_size * cell_size, so capping the reservation at that level avoids blocking other loads unnecessarily.
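The effect of the cap can be illustrated with a little arithmetic. The sketch below is not code from this PR: `CellEstimate`, `NaiveReservation`, and `CappedReservation` are hypothetical names, and resources are modeled as a single byte count for simplicity.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for the per-cell estimate described above.
struct CellEstimate {
    int64_t loaded_resource;   // bytes the cell occupies once cached
    int64_t loading_overhead;  // temporary bytes needed only while loading
};

// Without a cap, every pending load reserves its full overhead.
int64_t NaiveReservation(const CellEstimate& cell, int64_t pending_loads) {
    return pending_loads * (cell.loaded_resource + cell.loading_overhead);
}

// With a cap, the summed overhead never exceeds the upper bound,
// while loaded_resource is still reserved for every cell.
int64_t CappedReservation(const CellEstimate& cell,
                          int64_t pending_loads,
                          int64_t overhead_upper_bound) {
    const int64_t overhead = std::min(pending_loads * cell.loading_overhead,
                                      overhead_upper_bound);
    return pending_loads * cell.loaded_resource + overhead;
}
```

For example, assuming 100 pending loads, 10 bytes of loaded size and 30 bytes of overhead per cell, and a loading pool of 8 threads (so an upper bound of 8 * 30 = 240), the naive reservation is 4000 bytes while the capped one is 1240 bytes.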

@sparknack force-pushed the loading-opt branch 4 times, most recently from e57a20f to 9b52531 (March 13, 2026 06:57)

thread_pool.cc uses openblas_set_num_threads under #ifdef OPENBLAS_OS_LINUX.
Only search for openblas on non-Apple platforms; warn if not found instead
of failing the build, since the ifdef guard ensures safe compilation without it.

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>

Add LoadingOverheadTracker to cap the total DList loading resource
reservation for temporary loading overhead (e.g. preprocessing buffers)
per CellDataType. The tracker uses a delta-based model:
- Reserve() returns the incremental amount to reserve from DList
- Release() returns the incremental amount to release back to DList
- Total overhead reserved = min(sum_of_overhead, upper_bound)

This prevents over-reservation when many cells load concurrently,
since actual peak resource usage is bounded by pool_size * cell_size.

Changes:
- New LoadingOverheadTracker class with per-type state tracking
- Translator::estimated_byte_size_of_cell now returns a {loaded, loading}
  pair to separate final cache usage from temporary loading overhead
- CacheSlot::RunLoad integrates with the tracker for reserve/release
- Manager passes the tracker to CacheSlots and supports upper-bound registration
- ~300 lines of unit tests for the tracker

Signed-off-by: Shawn Wang <shawn.wang@zilliz.com>
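
The delta-based model described in the commit message above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's LoadingOverheadTracker: the class name `OverheadTrackerSketch`, the `SetUpperBound` method, the enum values, and the use of a single byte count per type are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <mutex>

// Hypothetical stand-in for CellDataType; the real enum values are not shown in this PR text.
enum class CellDataType { kScalarField, kVectorIndex };

// Sketch of a delta-based tracker: the amount held for a type is always
// min(sum_of_overhead, upper_bound), and Reserve/Release return only the change.
class OverheadTrackerSketch {
 public:
    void SetUpperBound(CellDataType type, int64_t upper_bound) {
        std::lock_guard<std::mutex> lock(mu_);
        states_[type].upper_bound = upper_bound;
    }

    // Returns the incremental bytes the caller should reserve from the shared budget.
    int64_t Reserve(CellDataType type, int64_t overhead) {
        std::lock_guard<std::mutex> lock(mu_);
        State& s = states_[type];
        const int64_t before = std::min(s.sum, s.upper_bound);
        s.sum += overhead;
        return std::min(s.sum, s.upper_bound) - before;
    }

    // Returns the incremental bytes the caller should give back to the shared budget.
    int64_t Release(CellDataType type, int64_t overhead) {
        std::lock_guard<std::mutex> lock(mu_);
        State& s = states_[type];
        assert(overhead <= s.sum);  // cannot release more than was requested
        const int64_t before = std::min(s.sum, s.upper_bound);
        s.sum -= overhead;
        return before - std::min(s.sum, s.upper_bound);
    }

 private:
    struct State {
        int64_t sum = 0;  // uncapped sum of overhead requested by in-flight loads
        int64_t upper_bound = std::numeric_limits<int64_t>::max();  // uncapped until set
    };

    std::mutex mu_;
    std::map<CellDataType, State> states_;
};
```

With an upper bound of 100, three successive Reserve calls of 60 yield deltas of 60, 40, and 0, and the three matching Release calls yield 0, 40, and 60, so the total reserved always equals the total released.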