
[BUG/FEATURE] No configurable setting to increase REMOTE_CAPABLE shard limit on warm nodes (Searchable Snapshots) #20839

@dfradehubs

Describe the bug

Description

When using Searchable Snapshots on warm nodes in OpenSearch 3.x, the cluster enforces
a hard limit of 1000 REMOTE_CAPABLE shards per warm node, regardless of the value
configured in cluster.max_shards_per_node.

This results in the following error when attempting to mount additional snapshots:

Validation Failed: 1: this action would add [5] total REMOTE_CAPABLE shards, 
but this cluster currently has [8000]/[8000] maximum REMOTE_CAPABLE shards open

Environment

  • OpenSearch version: 3.x
  • Cluster setup: 8 dedicated warm nodes with node.roles=warm
  • cluster.max_shards_per_node: 7000 (applies to hot/data nodes only)
  • REMOTE_CAPABLE limit: 8000 (8 nodes × 1000 default; not configurable)

Root Cause

cluster.max_shards_per_node only applies to non-frozen/non-warm data nodes.
The REMOTE_CAPABLE shard pool used by warm nodes for Searchable Snapshots has
a separate hardcoded default of 1000 per node with no public setting to override it.
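The absence of a public override can be checked by listing the cluster settings along with their defaults. In a sketch like the following (the endpoint and the include_defaults, flat_settings, and filter_path parameters are standard OpenSearch APIs; the filter pattern is illustrative), only cluster.max_shards_per_node is returned, with no warm or frozen variant:

GET _cluster/settings?include_defaults=true&flat_settings=true&filter_path=defaults.cluster.max_shards*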

The Elasticsearch equivalent (cluster.max_shards_per_node.frozen) does not exist
in OpenSearch and returns a settings_exception if attempted:

{
  "error": {
    "type": "settings_exception",
    "reason": "persistent setting [cluster.max_shards_per_node.frozen], not recognized"
  }
}

Expected Behavior

A dedicated, configurable cluster setting should exist to control the maximum number
of REMOTE_CAPABLE shards per warm node, for example:

PUT _cluster/settings
{
  "persistent": {
    "cluster.max_shards_per_node.warm": 3000
  }
}

Workarounds (insufficient for production use)

  • Add more warm nodes (each adds 1000 to the limit)
  • Close unused searchable snapshot indices
  • Reduce shard count per index when mounting snapshots
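As an illustration of the second workaround, a mounted searchable snapshot index can be closed to release its shards from the "open" REMOTE_CAPABLE count (the index name below is hypothetical):

POST /my-remote-index/_close

This frees capacity only at the cost of making that data unsearchable until the index is reopened, which is why it is listed as insufficient for production use.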

Impact

Clusters with a large number of searchable snapshot indices on warm nodes hit this
limit without any way to increase it via configuration, forcing costly infrastructure
scaling as the only option.

Related component

Storage:Snapshots

To Reproduce

  1. Set up an OpenSearch 3.x cluster with dedicated warm nodes (node.roles=warm)
  2. Configure cluster.max_shards_per_node: 7000 in cluster settings
  3. Mount searchable snapshots on warm nodes until reaching 1000 shards × number of warm nodes:
   POST /_snapshot/my-repository/my-snapshot/_restore
   {
     "storage_type": "remote_snapshot",
     "indices": "my-index"
   }
  4. Attempt to mount one additional searchable snapshot
  5. Observe the following error:
   Validation Failed: 1: this action would add [5] total REMOTE_CAPABLE shards,
   but this cluster currently has [8000]/[8000] maximum REMOTE_CAPABLE shards open
  6. Attempt to raise the limit using cluster.max_shards_per_node.frozen (the Elasticsearch equivalent):
   PUT _cluster/settings
   {
     "persistent": {
       "cluster.max_shards_per_node.frozen": 3000
     }
   }
  7. Observe that the setting is not recognized:
   {
     "error": {
       "type": "settings_exception",
       "reason": "persistent setting [cluster.max_shards_per_node.frozen], not recognized"
     },
     "status": 400
   }

Expected behavior

A dedicated cluster setting should exist to control the maximum number of REMOTE_CAPABLE
shards per warm node, similar to how cluster.max_shards_per_node works for hot/data nodes
and how cluster.max_shards_per_node.frozen works in Elasticsearch for frozen nodes.

For example:

PUT _cluster/settings
{
  "persistent": {
    "cluster.max_shards_per_node.warm": 3000
  }
}

This would result in:

  • 8 warm nodes × 3000 = 24,000 maximum REMOTE_CAPABLE shards
  • Operators can tune the limit based on their hardware capacity (heap, disk cache)
  • No forced infrastructure scaling just to increase a hardcoded limit

Current Behavior

The REMOTE_CAPABLE shard limit for warm nodes is hardcoded at 1000 per node and
cannot be changed via any cluster setting. The only workarounds are:

  • Adding more warm nodes (each adds 1000 to the limit)
  • Closing unused searchable snapshot indices
  • Reducing shard count when mounting snapshots

None of these are acceptable as a long-term solution for production clusters with
large amounts of searchable snapshot data.

Additional Details

No response
