Skip to content
Open
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions docs/changelog/137210.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
pr: 137210
summary: "[WIP] Introduce INDEX_SHARD_COUNT_FORMAT"
area: Snapshot/Restore
type: bug
issues: []
Original file line number Diff line number Diff line change
Expand Up @@ -2881,13 +2881,7 @@ public static IndexMetadata legacyFromXContent(XContentParser parser) throws IOE
} else if (token == XContentParser.Token.START_OBJECT) {
if ("settings".equals(currentFieldName)) {
Settings settings = Settings.fromXContent(parser);
if (SETTING_INDEX_VERSION_COMPATIBILITY.get(settings).isLegacyIndexVersion() == false) {
throw new IllegalStateException(
"this method should only be used to parse older incompatible index metadata versions "
+ "but got "
+ SETTING_INDEX_VERSION_COMPATIBILITY.get(settings).toReleaseVersion()
);
}
checkSettingIndexVersionCompatibility(settings);
builder.settings(settings);
} else if ("mappings".equals(currentFieldName)) {
Map<String, Object> mappingSourceBuilder = new HashMap<>();
Expand Down Expand Up @@ -2980,6 +2974,16 @@ private static void handleLegacyMapping(Builder builder, Map<String, Object> map
}
}

private static void checkSettingIndexVersionCompatibility(Settings settings) {
if (SETTING_INDEX_VERSION_COMPATIBILITY.get(settings).isLegacyIndexVersion() == false) {
throw new IllegalStateException(
"this method should only be used to parse older incompatible index metadata versions "
+ "but got "
+ SETTING_INDEX_VERSION_COMPATIBILITY.get(settings).toReleaseVersion()
);
}
}

/**
* Return the {@link IndexVersion} of Elasticsearch that has been used to create an index given its settings.
*
Expand Down Expand Up @@ -3228,4 +3232,109 @@ public static int parseIndexNameCounter(String indexName) {
throw new IllegalArgumentException("unable to parse the index name [" + indexName + "] to extract the counter", e);
}
}

/**
* A subset of {@link IndexMetadata} storing only the shard count of an index
* Prior to v9.3, the entire {@link IndexMetadata} object was stored in heap and then loaded during snapshotting to determine
* the shard count. As per ES-12539, this is replaced with the {@link IndexShardCount} class that writes and loads only the index's
* shard count to and from heap memory, reducing the possibility of smaller nodes going OOMe during snapshotting
*/
public static class IndexShardCount implements ToXContentFragment {
private static final String KEY_SHARD_COUNT = "shard_count";
private final int shardCount;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer all these changes live in blobstore package instead here. It is kinda confusing for people who does not care about how snapshot works.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree! Will move them


public IndexShardCount(int count) {
this.shardCount = count;
}

public int getCount() {
return shardCount;
}

@Override
public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
builder.field(KEY_SHARD_COUNT, shardCount);
return builder;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should have a toXContent here, we're only using this to read the shard count, not to write it out again

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally defined the INDEX_SHARD_COUNT_FORMAT variable as:

public static final ChecksumBlobStoreFormat<Integer> INDEX_SHARD_COUNT_FORMAT =

However, attempting to create a ChecksumBlobStoreFormat<T> where T does not implement the ToXContent interface results in the below error:

no instance(s) of type variable(s) exist so that T conforms to ToXContent

To fix this, I needed to define a separate class IndexShardCount, and ensure it implemented the ToXContent interface (hence this method's inclusion).

If you know a work around for this then I am happy to change!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm this seems to work for me:

    public static final ChecksumBlobStoreFormat<Integer> SHARD_COUNT_FORMAT = new ChecksumBlobStoreFormat<>(
        "shard-count",
        METADATA_NAME_FORMAT,
        (repoName, parser) -> Integer.valueOf(0) /* TODO */,
        (ignored) -> {
            assert false;
            throw new UnsupportedOperationException();
        }
    );

I mean I'd prefer to have a separate class (record?) for IndexShardCount rather than just having a bare Integer but I don't think this class needs to support ToXContent unless I'm missing something.


public static IndexShardCount.Builder builder() {
return new IndexShardCount.Builder();
}

public static class Builder {
private int count;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also seems kinda excessive - we don't need a builder for a class with just one int field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to remove!


public IndexShardCount.Builder setCount(int count) {
this.count = count;
return this;
}

public IndexShardCount build() {
return new IndexShardCount(count);
}
}

/**
* Parses an {@link IndexMetadata} object, reading only the shard count and skipping the rest
* @param parser The parser of the {@link IndexMetadata} object
* @return Returns an {@link IndexShardCount} containing the shard count for the index
* @throws IOException Thrown if the {@link IndexMetadata} object cannot be parsed correctly
*/
public static IndexShardCount fromIndexMetaData(XContentParser parser) throws IOException {
return fromIndexMetaData(parser, false);
}

public static IndexShardCount fromIndexMetaData(XContentParser parser, boolean legacy) throws IOException {
if (parser.currentToken() == null) { // fresh parser? move to the first token
parser.nextToken();
}
if (parser.currentToken() == XContentParser.Token.START_OBJECT) { // on a start object move to next token
parser.nextToken();
}
XContentParserUtils.ensureExpectedToken(XContentParser.Token.FIELD_NAME, parser.currentToken(), parser);
String currentFieldName;
XContentParser.Token token = parser.nextToken();
XContentParserUtils.ensureExpectedToken(XContentParser.Token.START_OBJECT, token, parser);

Builder indexShardCountBuilder = new Builder();
// Skip over everything except the settings object we care about, or any unexpected tokens
while ((currentFieldName = parser.nextFieldName()) != null) {
token = parser.nextToken();
if (token == XContentParser.Token.START_OBJECT) {
if (currentFieldName.equals(KEY_SETTINGS)) {
Settings settings = Settings.fromXContent(parser);
Copy link
Member

@ywangd ywangd Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we still need to read the settings, I wonder whether it is necessary to have this as a separate format. The following "inclusive filtering parser" seems to work for me

try (
    XContentParser parser = createParser(
        XContentParserConfiguration.EMPTY.withDeprecationHandler(LoggingDeprecationHandler.INSTANCE)
            .withFiltering(Set.of("*.settings", "*.mapping_version", "*.settings_version"), null, false),
        JsonXContent.jsonXContent,
        BytesReference.bytes(builder)
    )
) {
    final var indexMetadata = IndexMetadata.fromXContent(parser);
}

It builds an IndexMetadata with the minimal number of required information which is not much besides settings.

Copy link
Contributor Author

@joshua-adams-1 joshua-adams-1 Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks promising - let me have a go locally and see what I can do! As a note, if it is still loading some IndexMetadata object with N size, since at this time I don't know the value of N, we may still need to implement concurrency if having 10 threads loading this in parallel would cause a small node to OOMe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry that trying to construct an almost-empty IndexMetadata instance will break some invariant somewhere (or make it hard to add invariants in future). I think I prefer a separate reader, plus testing to verify that the separate reader can read all possible formats of IndexMetadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a WIP PR #137335 if this matches your expectations. If not, I am also working on a new revision of this PR with the above requested changes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it is still loading some IndexMetadata object with N size, since at this time I don't know the value of N, we may still need to implement concurrency if having 10 threads loading this in parallel would cause a small node to OOMe.

It should just load what is specified in the config, which is settings and a few longs. Since the new Format reads settings as well, I think the memory footprint should be quite close.

make it hard to add invariants in future

This is true. I was actually a little surprised on how little invariants we have today for the almost empty IndexMetadata.

I have a WIP PR #137335 if this matches your expectations

Thanks for testing it out. I personally quite like its simplicity. But I am ok with a new reader as well. Also the latest trimmed down version looks simpler.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer a new ChecksumBlobStoreFormat rather than passing the filter directly to INDEX_METADATA_FORMAT. However, it would be nice for the new format to use a filtering XContentParserConfiguration somehow because I believe this can skip over all the filtered-out junk without allocating anything. Possibly we can achieve something similar with skipChildren() but the filter seems clearer IMO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ChecksumBlobStoreFormat deserialization is here. The parser is defined here as:

XContentParser parser = XContentHelper.createParserNotCompressed(
    XContentParserConfiguration.EMPTY.withRegistry(namedXContentRegistry)
        .withDeprecationHandler(LoggingDeprecationHandler.INSTANCE),
    bytesReference,
    XContentType.SMILE
)

There is currently no way to configure this. I extended this method in #137335 to have a XContentParserConfiguration passed in as a parameter, but I think it would be redundant to integrate this change into this PR (as opposed to just having that by itself anyways). If I was defining the parser using a XContentParserConfiguration to skip over all the fields other than shard count, would the reader not just become a wrapper to invoking the parser and returning the result?

if (legacy) {
checkSettingIndexVersionCompatibility(settings);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I don't think it makes sense to reject older index metadata like this - we're only trying to get the shard count, we don't need to be strict.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per my response to this comment, I can remove this

indexShardCountBuilder.setCount(settings.getAsInt(SETTING_NUMBER_OF_SHARDS, -1));
} else {
// Iterate through the object, but we don't care for it's contents
while ((token = parser.nextToken()) != XContentParser.Token.END_OBJECT) {
}
}
} else if (token == XContentParser.Token.START_ARRAY) {
// Iterate through the array, but we don't care for it's contents
while ((token = parser.nextToken()) != XContentParser.Token.END_ARRAY) {
}
} else if (token.isValue() == false) {
throw new IllegalArgumentException("Unexpected token " + token);
}
}
XContentParserUtils.ensureExpectedToken(XContentParser.Token.END_OBJECT, parser.nextToken(), parser);
return indexShardCountBuilder.build();
}

/**
* Parses legacy metadata from ES versions that are no longer index-compatible, returning information on best-effort basis.
* <p>
* Like {@link #fromIndexMetaData}, we are parsing an {@link IndexMetadata} object,
* reading only the shard count and skipping the rest.
* <p>
* Throws an exception if the metadata is index-compatible with the current version (in that case,
* {@link #fromXContent} should be used to load the content.
*/
public static IndexShardCount fromLegacyIndexMetaData(XContentParser parser) throws IOException {
return fromIndexMetaData(parser, true);
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -398,6 +398,14 @@ public static String getRepositoryDataBlobName(long repositoryGeneration) {
Function.identity()
);

public static final ChecksumBlobStoreFormat<IndexMetadata.IndexShardCount> INDEX_SHARD_COUNT_FORMAT = new ChecksumBlobStoreFormat<>(
"index-metadata",
METADATA_NAME_FORMAT,
(repoName, parser) -> IndexMetadata.IndexShardCount.fromLegacyIndexMetaData(parser),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, TIL, I didn't know we had this notion of a "fallback" parser. Apparently we added it in #78244 to add back support for repositories dating back to 5.0. Any idea what's changed? I'm not sure we need this just to get hold of the settings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was originally to preserve functionality with the existing INDEX_METADATA_FORMAT. Since I was still technically parsing the IndexMetadata (even if just for one field) I assumed I would also need a legacy and non-legacy parsing method. However, since I am only parsing the shard count and ignoring everything else, there is no difference between versions and this can be removed. Good catch!

(repoName, parser) -> IndexMetadata.IndexShardCount.fromIndexMetaData(parser),
Function.identity()
);

private static final String SNAPSHOT_CODEC = "snapshot";

public static final ChecksumBlobStoreFormat<SnapshotInfo> SNAPSHOT_FORMAT = new ChecksumBlobStoreFormat<>(
Expand Down Expand Up @@ -1327,13 +1335,14 @@ private void determineShardCount(ActionListener<Void> listener) {
private void getOneShardCount(String indexMetaGeneration) {
try {
updateShardCount(
INDEX_METADATA_FORMAT.read(getProjectRepo(), indexContainer, indexMetaGeneration, namedXContentRegistry)
.getNumberOfShards()
INDEX_SHARD_COUNT_FORMAT.read(getProjectRepo(), indexContainer, indexMetaGeneration, namedXContentRegistry)
.getCount()
);
} catch (Exception ex) {
logger.warn(() -> format("[%s] [%s] failed to read metadata for index", indexMetaGeneration, indexId.getName()), ex);
logger.warn(() -> format("[%s] [%s] failed to read shard count for index", indexMetaGeneration, indexId.getName()), ex);
// Definitely indicates something fairly badly wrong with the repo, but not immediately fatal here: we might get the
// shard count from another metadata blob, or we might just not process these shards. If we skip these shards then the
// shard count from a subsequent indexMetaGeneration, or we might just not process these shards. If we skip these shards
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I preferred "another metadata blob" here. "Subsequent" implies some kind of ordering but we're loading all this stuff in parallel, and might instead have got the shard count from an earlier blob. Also indexMetaGeneration is really the name of the blob rather than the blob itself, so metadata blob is more accurate.

// then the
// repository will technically enter an invalid state (these shards' index-XXX blobs will refer to snapshots that no
// longer exist) and may contain dangling blobs too. A subsequent delete that hits this index may repair the state if
// the metadata read error is transient, but if not then the stale indices cleanup will eventually remove this index
Expand Down Expand Up @@ -1904,7 +1913,8 @@ record RootBlobUpdateResult(RepositoryData oldRepositoryData, RepositoryData new
}
}));

// Write the index metadata for each index in the snapshot
// Write the index metadata and the shard count to memory for each index in the snapshot, so that it persists
// even if the repository is deleted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"write to memory" is kinda odd phrasing - writes are supposed to be persistent here. Maybe this is leftover from an earlier change?

for (IndexId index : indices) {
executor.execute(ActionRunnable.run(allMetaListeners.acquire(), () -> {
final IndexMetadata indexMetaData = projectMetadata.index(index.getName());
Expand Down
Loading