Introduce INDEX_SHARD_COUNT_FORMAT #137210
New changelog file `docs/changelog/137210.yaml`:

```yaml
pr: 137210
summary: "[WIP] Introduce INDEX_SHARD_COUNT_FORMAT"
area: Snapshot/Restore
type: bug
issues: []
```
In `BlobStoreRepository.java`, a new read-only format is registered alongside the existing ones:

```diff
@@ -398,6 +398,16 @@ public static String getRepositoryDataBlobName(long repositoryGeneration) {
         Function.identity()
     );

+    public static final ChecksumBlobStoreFormat<IndexShardCount> INDEX_SHARD_COUNT_FORMAT = new ChecksumBlobStoreFormat<>(
+        "index-metadata",
+        METADATA_NAME_FORMAT,
+        (repoName, parser) -> IndexShardCount.fromIndexMetaData(parser),
+        (ignored) -> {
+            assert false;
+            throw new UnsupportedOperationException();
+        }
+    );
+
     private static final String SNAPSHOT_CODEC = "snapshot";

     public static final ChecksumBlobStoreFormat<SnapshotInfo> SNAPSHOT_FORMAT = new ChecksumBlobStoreFormat<>(
```
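The new format reuses the existing `index-metadata` blob name but is read-only: its writer callback asserts and throws, because these blobs are only ever written in their full `IndexMetadata` form elsewhere. As a standalone illustration of that pattern (a hypothetical `ReadOnlyFormat` class, not the real `ChecksumBlobStoreFormat` API):

```java
import java.util.function.Function;

// Hypothetical stand-in for a read-only blob format: the reader is an ordinary
// parsing function, while the writer always fails fast, mirroring the PR's
// `assert false; throw new UnsupportedOperationException();` writer callback.
final class ReadOnlyFormat<T> {
    private final Function<String, T> reader;

    ReadOnlyFormat(Function<String, T> reader) {
        this.reader = reader;
    }

    // Deserialize a value from blob contents.
    T read(String blobContents) {
        return reader.apply(blobContents);
    }

    // Serialization is deliberately unsupported for this format.
    String write(T value) {
        throw new UnsupportedOperationException("this format is read-only");
    }
}
```

Keeping the blob name identical to the full-metadata format is what lets the same stored blob be re-read through a cheaper parser.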
The read side switches to the new format:

```diff
@@ -1327,13 +1337,13 @@ private void determineShardCount(ActionListener<Void> listener) {
     private void getOneShardCount(String indexMetaGeneration) {
         try {
             updateShardCount(
-                INDEX_METADATA_FORMAT.read(getProjectRepo(), indexContainer, indexMetaGeneration, namedXContentRegistry)
-                    .getNumberOfShards()
+                INDEX_SHARD_COUNT_FORMAT.read(getProjectRepo(), indexContainer, indexMetaGeneration, namedXContentRegistry).count()
             );
         } catch (Exception ex) {
-            logger.warn(() -> format("[%s] [%s] failed to read metadata for index", indexMetaGeneration, indexId.getName()), ex);
+            logger.warn(() -> format("[%s] [%s] failed to read shard count for index", indexMetaGeneration, indexId.getName()), ex);
             // Definitely indicates something fairly badly wrong with the repo, but not immediately fatal here: we might get the
-            // shard count from another metadata blob, or we might just not process these shards. If we skip these shards then the
+            // shard count from a subsequent indexMetaGeneration, or we might just not process these shards. If we skip these shards
+            // then the
             // repository will technically enter an invalid state (these shards' index-XXX blobs will refer to snapshots that no
             // longer exist) and may contain dangling blobs too. A subsequent delete that hits this index may repair the state if
             // the metadata read error is transient, but if not then the stale indices cleanup will eventually remove this index
```
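The catch-and-continue shape of this error handling, in isolation: a corrupt metadata blob is logged and skipped rather than failing the whole operation, since another generation of the same index's metadata may still yield the count. A simplified sketch (hypothetical helper names, not the actual `BlobStoreRepository` methods):

```java
import java.util.List;
import java.util.OptionalInt;

// Sketch: try each metadata generation in turn; a failed read is logged and
// skipped so that one bad blob does not abort the shard-count lookup.
final class ShardCountLookup {
    static OptionalInt firstReadableShardCount(List<String> generations) {
        for (String gen : generations) {
            try {
                return OptionalInt.of(readShardCount(gen));
            } catch (Exception e) {
                // Mirrors the logger.warn(...) in the PR: record and move on.
                System.err.println("failed to read shard count for " + gen + ": " + e);
            }
        }
        return OptionalInt.empty();
    }

    // Hypothetical blob read; a generation named "bad-*" simulates corruption.
    static int readShardCount(String gen) {
        if (gen.startsWith("bad-")) throw new IllegalStateException("corrupt blob");
        return 5;
    }
}
```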
New file `server/src/main/java/org/elasticsearch/repositories/blobstore/IndexShardCount.java` (71 lines added; the diff is truncated here):

```java
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the "Elastic License
 * 2.0", the "GNU Affero General Public License v3.0 only", and the "Server Side
 * Public License v 1"; you may not use this file except in compliance with, at
 * your election, the "Elastic License 2.0", the "GNU Affero General Public
 * License v3.0 only", or the "Server Side Public License, v 1".
 */

package org.elasticsearch.repositories.blobstore;

import org.elasticsearch.cluster.metadata.IndexMetadata;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.xcontent.XContentParserUtils;
import org.elasticsearch.xcontent.XContentParser;

import java.io.IOException;

import static org.elasticsearch.cluster.metadata.IndexMetadata.KEY_SETTINGS;
import static org.elasticsearch.cluster.metadata.IndexMetadata.SETTING_NUMBER_OF_SHARDS;

/**
 * A subset of {@link IndexMetadata} storing only the shard count of an index.
 * Prior to v9.3, the entire {@link IndexMetadata} object was stored in heap and then loaded during snapshotting to determine
 * the shard count. As per ES-12539, this is replaced with the {@link IndexShardCount} class that writes and loads only the index's
 * shard count to and from heap memory, reducing the possibility of smaller nodes going OOM during snapshotting.
 */
public record IndexShardCount(int count) {
    /**
     * Parses an {@link IndexMetadata} object, reading only the shard count and skipping the rest.
     * @param parser The parser of the {@link IndexMetadata} object
     * @return An {@link IndexShardCount} containing the shard count for the index
     * @throws IOException Thrown if the {@link IndexMetadata} object cannot be parsed correctly
     */
    public static IndexShardCount fromIndexMetaData(XContentParser parser) throws IOException {
        // … (remainder of the new file is not shown in the diff)
```

A review suggestion renames the method for consistent spelling:

```diff
-    public static IndexShardCount fromIndexMetaData(XContentParser parser) throws IOException {
+    public static IndexShardCount fromIndexMetadata(XContentParser parser) throws IOException {
```
        
          
              
Reviewer: I think this leniency is needed in IndexMetadata because it gets parsed from several different sources (at least, it probably used to, maybe not any more). Here we know we're reading from the blob in the snapshot repository, so I think we can tighten this up.

Author: I ran BlobStoreRepositoryDeleteThrottlingTests in debug mode and confirmed that both of these code paths are executed during the runtime of that test, and removing them fails the test.

Reviewer: Sure, but do we ever not run these branches?
        
          
              
Reviewer: Do we need a new dummy object here? We could use null to avoid the allocation. Or maybe we can return from this whole method early as soon as we've read the shard count? No need to consume the parser all the way to the end, IIRC.

Author: Unfortunately we do need to consume the parser to the end in order to satisfy this constraint:

```java
result = reader.apply(projectRepo, parser);
XContentParserUtils.ensureExpectedToken(null, parser.nextToken(), parser);
```

My original solution returned early as an optimisation and threw an error for this reason. However, I can initialise the variable to null if you believe that to be better. By setting it to -1 I was protecting us from returning null in the case of a malformed IndexMetadata object. If that now occurs, this code:

```java
try {
    updateShardCount(
        INDEX_SHARD_COUNT_FORMAT.read(getProjectRepo(), indexContainer, indexMetaGeneration, namedXContentRegistry).count()
    );
} catch (Exception ex) {
    logger.warn(() -> format("[%s] [%s] failed to read shard count for index", indexMetaGeneration, indexId.getName()), ex);
    // Definitely indicates something fairly badly wrong with the repo, but not immediately fatal here: we might get the
    // shard count from another metadata blob, or we might just not process these shards. If we skip these shards
    // then the repository will technically enter an invalid state (these shards' index-XXX blobs will refer to snapshots
    // that no longer exist) and may contain dangling blobs too. A subsequent delete that hits this index may repair
    // the state if the metadata read error is transient, but if not then the stale indices cleanup will eventually
    // remove this index and all its extra data anyway.
    // TODO: Should we fail the delete here? See https://github.com/elastic/elasticsearch/issues/100569.
}
```

will throw an NPE, since calling .count() on null isn't possible. The error will be caught and ignored, and we will move on to the next blob. I am happy with this behaviour, just wanted it documented.

Reviewer: I see, OK, thanks for checking. I'd rather we convert a null into a non-null placeholder value at the end of the method (since it should basically never occur absent corruption).
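One way to implement the reviewer's suggestion, as a minimal sketch (the `UNKNOWN` sentinel and `orUnknown` helper are hypothetical names, not from the PR): convert a null parse result into a non-null placeholder at the end of the method, so callers never dereference null and a corrupt blob surfaces as an obviously invalid count:

```java
// Sketch: a sentinel instance instead of null for "shard count not found".
record IndexShardCount(int count) {
    // Hypothetical placeholder; -1 can never be a real shard count.
    static final IndexShardCount UNKNOWN = new IndexShardCount(-1);

    // Called once at the end of parsing: a null result (malformed metadata
    // blob) becomes the sentinel rather than propagating as a later NPE.
    static IndexShardCount orUnknown(IndexShardCount parsed) {
        return parsed != null ? parsed : UNKNOWN;
    }
}
```

With this shape the existing catch block would still reject the bogus value path, but the failure mode is an explicit invalid count rather than a NullPointerException.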
        
          
              
Reviewer: Could we push the skipping behaviour down even further here? There's no need to load a whole Settings object. It's not as big as the mappings, but it can still be sizeable (e.g. index.query.default_field could list thousands of field names).
        
          
              
Reviewer: It should be performant to call parser.skipChildren() and avoid the while loops.
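For illustration only (a toy token stream, not the XContent API): skipChildren-style skipping can be thought of as advancing past the current container by balancing start/end tokens, never materializing any of the skipped values, which is why it is cheaper than parsing each child in a loop:

```java
import java.util.Iterator;

// Toy model of a parser's skipChildren(): given a stream of structural tokens,
// skip the container that was just opened by tracking nesting depth.
final class TokenSkipper {
    enum Token { START_OBJECT, END_OBJECT, VALUE }

    // Assumes the iterator is positioned just after a START_OBJECT token;
    // consumes tokens until the matching END_OBJECT has been consumed.
    static void skipChildren(Iterator<Token> tokens) {
        int depth = 1;
        while (depth > 0) {
            Token t = tokens.next();
            if (t == Token.START_OBJECT) depth++;
            else if (t == Token.END_OBJECT) depth--;
        }
    }
}
```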