### Is your feature request related to a problem? Please describe
## Summary
This RFC proposes adding Virtual Shards to OpenSearch — a lightweight logical partitioning layer between documents and physical shards. Instead of routing directly to a fixed physical shard, documents are first assigned to one of many virtual shards, which are then mapped to physical shards at runtime.
This enables live rebalancing, elastic scaling, and temporal locality handling, without reindexing or changes to clients.
## Motivation and Value
Virtual shards improve OpenSearch’s flexibility and long-term scalability by decoupling routing from physical shard allocation. This allows more granular routing and makes it easier to isolate hotspots, scale incrementally, and optimize storage tiering without operational disruption.
They offer a clean way to:
| Capability | How Virtual Shards Help |
|---|---|
| Elastic Growth | Add physical shards and remap vShards without reindexing |
| Hotspot Isolation | Move only hot vShards (e.g. recent time range or tenant) |
| Temporal Locality | Co-locate recent vShards on fast nodes and archive older ones |
| Operational Simplicity | Start with many vShards, scale physical shards only when needed |
| Autoscaling Ready | Enable future rebalancing based on traffic/load |
| Index Flexibility | Avoid over/undersharding, future-proof long-lived indices |
This design complements existing shard functionality and prepares OpenSearch for large-scale, dynamic, multi-tenant workloads.
### Describe the solution you'd like

## Design Overview
### Core Idea

```
document → virtual_shard → physical_shard
```

- Each index declares `number_of_virtual_shards` (e.g. 1024)
- Routing uses `hash(routing_key) % num_vshards` to pick a vShard
- A vShard-to-physical-shard map is stored in cluster state
- Documents are routed to the mapped physical shard internally
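The routing path above can be sketched in a few lines. This is an illustrative Python sketch, not OpenSearch code — the function names, the round-robin initial mapping, and the use of CRC32 as a stand-in for OpenSearch's routing hash are all assumptions:

```python
# Illustrative sketch of vShard routing (not actual OpenSearch code).
# zlib.crc32 stands in for OpenSearch's routing hash; any stable hash works.
import zlib

NUM_VSHARDS = 1024  # index setting: number_of_virtual_shards

def vshard_for(routing_key: str) -> int:
    """Map a document's routing key to one of NUM_VSHARDS virtual shards."""
    return zlib.crc32(routing_key.encode("utf-8")) % NUM_VSHARDS

def physical_shard_for(routing_key: str, vshard_map: dict) -> int:
    """Resolve the vShard to its current physical shard via the cluster-state map."""
    return vshard_map[vshard_for(routing_key)]

# A possible initial layout: 1024 vShards spread round-robin over 4 shards.
vshard_map = {v: v % 4 for v in range(NUM_VSHARDS)}
shard = physical_shard_for("tenant-42", vshard_map)
assert 0 <= shard < 4
```

Because only the map changes when a vShard moves, clients computing `hash(routing_key)` observe no difference before and after a relocation.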
### Index Configuration Example

```json
{
  "settings": {
    "number_of_shards": 4,
    "number_of_virtual_shards": 1024
  }
}
```

- The index stores data in 4 physical shards
- Routing logic uses 1024 logical vShards
- The internal mapping allows reassignment without rebuilding the index
### What This Enables
| Use Case | Benefit |
|---|---|
| Live Scale-Out | Add a physical shard, remap only impacted vShards, no reindexing |
| Tenant Isolation | Route specific tenants to their own vShards and scale independently |
| Temporal Querying | Route by timestamp, isolate hot vShards, optimize performance |
| Cold Tiering | Move older vShards to cold storage without splitting the index |
| Long-Running Index | Evolve layout over time without changing index name or structure |
### Moving a vShard
To relocate a vShard:
- Identify and copy documents that hash to that vShard
- Replay translog updates for the vShard
- Update the vShard-to-shard mapping in cluster state (atomic)
- Remove old segments via merge or compaction
The process uses existing replication and fencing mechanisms to ensure safety.
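The four steps above can be mirrored with an in-memory sketch. Everything here — plain dicts standing in for shards, a list standing in for the translog, and the `move_vshard` helper — is hypothetical and only illustrates the sequencing, not OpenSearch's actual replication machinery:

```python
# Hypothetical, in-memory sketch of the vShard relocation steps.
# Real OpenSearch would reuse replication, translog replay, and fencing.

def move_vshard(vshard, target, shards, vshard_map, translog):
    source = vshard_map[vshard]
    # 1. Copy documents belonging to this vShard to the target shard.
    docs = {doc_id: d for doc_id, d in shards[source].items()
            if d["vshard"] == vshard}
    shards[target].update(docs)
    # 2. Replay operations buffered while the copy was in flight.
    for doc_id, d in translog:
        if d["vshard"] == vshard:
            shards[target][doc_id] = d
    # 3. Atomic map update: new writes now route to the target shard.
    vshard_map[vshard] = target
    # 4. Remove the moved documents from the source (merge/compaction).
    for doc_id in docs:
        del shards[source][doc_id]

shards = {0: {"a": {"vshard": 7}}, 1: {}}
vshard_map = {7: 0}
move_vshard(7, 1, shards, vshard_map, translog=[("b", {"vshard": 7})])
assert vshard_map[7] == 1 and "a" in shards[1] and "b" in shards[1]
```

The key ordering property is that the map flip in step 3 happens only after the copy and replay, so no reader ever resolves the vShard to a shard that lacks its documents.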
### Admin APIs

| Endpoint | Description |
|---|---|
| `GET /_vshard/state` | View the current vShard → shard mapping |
| `POST /_vshard/move` | Manually trigger a vShard relocation |
| `GET /_vshard/stats` | View metrics like document count, size, and QPS |
These APIs are internal/admin only — not exposed to client applications.
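For illustration only, a `GET /_vshard/state` response might look like the following. This RFC does not fix a response schema; every field shown here is a hypothetical example:

```json
{
  "index": "logs",
  "number_of_virtual_shards": 1024,
  "mapping_version": 17,
  "vshards": {
    "0": { "shard": 0 },
    "1": { "shard": 1 }
  }
}
```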
### Compatibility
| Feature | Status |
|---|---|
| ILM | Compatible; vShards can follow tier policies |
| Replication | Works with existing shard replication |
| Routing Key | Fully supported (impacts vShard assignment) |
| Search APIs | No client-side changes |
| Indexing | Adds routing layer before shard resolution |
### Use Case: Temporal Data (e.g. Box-like Ingestion)
A system indexing user folders and files over time benefits from this model:
- Routing key includes folder ID and timestamp (e.g. `folder:2024-07`)
- Recent folders map to hot vShards
- Hot vShards are routed to fast SSD-backed physical shards
- Older vShards are moved to cold nodes
- No rollovers or reindexing required
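The temporal-locality property falls out of the routing key construction. The sketch below is an illustrative assumption following the `folder:2024-07` example above — the key format and hash are not OpenSearch internals:

```python
# Illustrative sketch: time-bucketed routing keys for temporal locality.
# The "folder:YYYY-MM" key format follows the example above; CRC32 is a
# stand-in for the routing hash.
import zlib

NUM_VSHARDS = 1024

def routing_key(folder_id: str, month: str) -> str:
    return f"{folder_id}:{month}"  # e.g. "folder123:2024-07"

def vshard_for(key: str) -> int:
    return zlib.crc32(key.encode("utf-8")) % NUM_VSHARDS

# All documents for the same folder and month land on the same vShard,
# so that vShard can sit on a hot SSD-backed shard while recent and be
# remapped to a cold node later — with no rollover or reindex.
assert vshard_for(routing_key("folder123", "2024-07")) == \
       vshard_for(routing_key("folder123", "2024-07"))
```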
## Implementation Plan

### Phase 1 – Core Support
- Add support for `number_of_virtual_shards` as an index setting
- Compute a vShard ID per document using the routing hash
- Maintain vShard → physical shard mapping in cluster state
- Update indexing and search paths to route through vShards
### Phase 2 – Manual Movement
- Add admin API to move vShards
- Support atomic map update + translog replay per vShard
- Enable parallel movement for disjoint vShards
### Phase 3 – Optional Rebalancer
- Track per-vShard stats: size, document count, query rate
- Add load-aware background balancer (pluggable)
- Future: autoscaler hooks for vShard-based elasticity
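A minimal load-aware balancer of the kind Phase 3 describes could look like the sketch below. The `plan_moves` helper and the single numeric load score (standing in for a blend of size, document count, and query rate) are illustrative assumptions:

```python
# Hypothetical sketch of a load-aware vShard balancer (Phase 3).
# vshard_load maps vShard ID -> a load score; vshard_map maps
# vShard ID -> physical shard.

def plan_moves(vshard_load, vshard_map, num_shards, max_moves=1):
    """Propose vShard moves from the most- to the least-loaded shard."""
    shard_load = [0.0] * num_shards
    for v, load in vshard_load.items():
        shard_load[vshard_map[v]] += load
    hot = max(range(num_shards), key=lambda s: shard_load[s])
    cold = min(range(num_shards), key=lambda s: shard_load[s])
    # Move the hottest vShards off the hottest shard first.
    candidates = sorted(
        (v for v in vshard_load if vshard_map[v] == hot),
        key=lambda v: vshard_load[v], reverse=True)
    return [(v, hot, cold) for v in candidates[:max_moves]]

# Shard 0 carries vShards 0 and 1 (load 10.0); shard 1 carries vShard 2.
moves = plan_moves({0: 9.0, 1: 1.0, 2: 1.0}, {0: 0, 1: 0, 2: 1}, num_shards=2)
assert moves == [(0, 0, 1)]  # move the hottest vShard to the coldest shard
```

Because each proposed move is just a vShard remap, the balancer can run continuously in the background without the cost of a physical shard split.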
## Fault Tolerance
- vShard mapping is stored and versioned in cluster metadata
- Routing is atomic and consistent across replicas
- Movement reuses fencing, recovery, and replication infrastructure
- System prevents routing inconsistency during transitions
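The versioned, atomic map update can be approximated with a compare-and-set sketch. The `VShardMap` class and its method names are hypothetical, standing in for how versioned cluster metadata would fence off a stale mover:

```python
# Illustrative sketch of a versioned vShard map with compare-and-set
# updates, approximating how versioned cluster-state metadata prevents
# routing inconsistency during a move.

class VShardMap:
    def __init__(self, mapping):
        self.version = 0
        self.mapping = dict(mapping)

    def cas_update(self, expected_version, vshard, target_shard):
        """Apply the remap only if no concurrent update won first."""
        if expected_version != self.version:
            return False  # stale mover must re-read the map and retry
        self.mapping[vshard] = target_shard
        self.version += 1
        return True

m = VShardMap({7: 0})
v = m.version
assert m.cas_update(v, 7, 1) is True   # first mover wins
assert m.cas_update(v, 7, 2) is False  # stale version is fenced off
assert m.mapping[7] == 1
```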
## Defaults and Compatibility
| Item | Behavior |
|---|---|
| Default value | `number_of_virtual_shards = null` (vShard routing not enabled) |
| Existing indices | No change |
| Backward compatibility | Full |
| Client-facing API changes | None |
## Observability

- Optional: add the vShard ID to slow logs and request tracing
- Expose `/_vshard/stats` for introspection and tuning
- Enable future integrations with load balancers and allocators
## Clarifying Relationship to Dynamic Shard Splitting and Shrinking
This proposal for Virtual Shards is not competing with dynamic shard splitting; it operates at a different layer and addresses a different set of challenges. The two proposals are complementary and can co-exist to enhance OpenSearch’s flexibility.
| Feature | Dynamic Shard Split/Shrink | Virtual Shards (This RFC) |
|---|---|---|
| Unit of Change | Physical Shard | Logical Virtual Shard (vShard) |
| Purpose | Change physical shard count post-creation | Introduce logical indirection layer for routing flexibility |
| Operation Cost | Involves data copying / merges / reindexing | Lightweight remapping + selective relocation |
| Target Use Cases | Undersharded or oversharded static indices | Hotspot isolation, temporal skew, multi-tenant scaling |
| Reindexing Required? | Yes (implicitly or explicitly) | No |
| Long-Running Index Support | Limited (splits are disruptive) | Yes (vShards remap live) |
| Routing Granularity | Physical shard | Logical vShard (e.g., 1024 vShards → 4 shards) |
### Summary
- Shard split/shrink is ideal for permanent, structural changes to physical shard layout post-index-creation.
- Virtual Shards provide fine-grained routing flexibility and live rebalancing for dynamic or long-lived workloads — without requiring index restarts or client changes.
### Integration Potential
Virtual Shards can complement dynamic shard splitting:
- You could split an overloaded shard and then rebalance only its affected virtual shards across new shards.
- Or use virtual shards to delay or avoid physical splits by remapping hotspots to different nodes.
We’re happy to align these mechanisms so they serve orthogonal purposes:
- Shard splitting: structural, coarse-grained change
- Virtual shards: logical, fine-grained elastic routing
Both together enable OpenSearch to scale cleanly across a broader set of real-world indexing patterns.
## Final Notes
Virtual shards bring elastic scaling, dynamic isolation, and long-term adaptability to OpenSearch. They are designed to solve real-world challenges like tenant hotspots, temporal skew, and uneven document growth — while keeping the system stable, simple, and compatible.
They integrate seamlessly with existing indexing and search infrastructure, and open up new capabilities without introducing client complexity.
This RFC proposes making virtual sharding a core abstraction within OpenSearch’s routing and indexing path.
### Related component
ShardManagement:Placement
### Describe alternatives you've considered
No response
### Additional context
No response