Use InMemoryNoOpCommitDirectory for archives indices only #121210

tlrx · 2025-01-29T16:40:06Z

Since #118606, searchable snapshots shards are not expected to write files on disk, with the exception of archives indices mounted as searchable snapshots, which need to rewrite the segment infos file in a newer version.

Ideally we should be able to remove the usage of the InMemoryNoOpCommitDirectory for non-archives searchable snapshots indices and rely only on SearchableSnapshotDirectory, which throws on write operations. Similarly, starting with 9.0, searchable snapshots shards do not write files on disk and should therefore be able to use a Directory implementation that forbids writes. Searchable snapshots shards for indices created before 9.0 still require a mutable directory for peer recoveries.

In this change, we only allow writes for archives indices and searchable snapshots created before 9.0.
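
As a rough illustration of the rule above, the decision could be sketched as follows. This is not the code from the change: the class name, method names, and the encoding of the index creation version as a major number are all hypothetical.

```java
// Hypothetical sketch: names and the major-version encoding are assumptions,
// not the actual Elasticsearch code.
final class DirectoryChoiceSketch {

    /** First major version whose searchable snapshots shards never write files. */
    static final int FIRST_READ_ONLY_MAJOR = 9;

    /** Returns true when the shard still needs a directory that tolerates writes. */
    static boolean needsWritableDirectory(boolean isArchiveIndex, int indexCreatedMajor) {
        // Archives indices rewrite the segment infos file; pre-9.0 indices
        // need a mutable directory for peer recoveries.
        return isArchiveIndex || indexCreatedMajor < FIRST_READ_ONLY_MAJOR;
    }

    /** Names the directory type that the rule above would select. */
    static String describeDirectory(boolean isArchiveIndex, int indexCreatedMajor) {
        return needsWritableDirectory(isArchiveIndex, indexCreatedMajor)
            ? "InMemoryNoOpCommitDirectory"
            : "SearchableSnapshotDirectory";
    }
}
```

With this sketch, a non-archives index created on 9.0 or later maps to the read-only SearchableSnapshotDirectory, and every other case keeps the InMemoryNoOpCommitDirectory.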

Relates ES-10438

elasticsearchmachine · 2025-02-03T09:44:55Z

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

fcofdez

LGTM

tlrx · 2025-02-04T08:35:18Z

Thanks Francisco!

henningandersen

LateLGTM

(though I have a question).

henningandersen · 2025-02-04T08:42:09Z

...main/java/org/elasticsearch/xpack/searchablesnapshots/store/SearchableSnapshotDirectory.java


    @Override
-    public void sync(Collection<String> names) {
-        throw unsupportedException();


Just so I am sure I understand this correctly, we still call sync and syncMetadata but do not modify anything? I wonder why these are still called?

tlrx · 2025-02-04

syncMetadata is called during peer recoveries by Store#cleanupAndVerify and, if I remember correctly, sync is called during snapshots.

Those methods are still called because peer recovery of searchable snapshots is the same as for regular indices, with the difference that the files on the replica "magically" appear before the recovery starts. We could change that, but it was a bit too much work when I tried: the code is intermingled and required adding many "if not searchable snapshot, do this" conditions. We would also need to accommodate searchable snapshots indices from before #118606, which create an additional commit during recovery. It's feasible, but the code ended up being much less readable.

So I think that relying on SearchableSnapshotDirectory throwing on file-changing operations would help catch any places where we're still changing files even though, after #118606, we should not.

Since I'm not fully confident that we're testing all places, I only merged this in 9.1.
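
To make the idea concrete, here is a minimal sketch of a directory that tolerates sync calls but throws on operations that change files. It assumes that sync and syncMetaData become no-ops, which is how the diff above reads; the `SketchDirectory` interface and class names are hypothetical stand-ins and not Lucene's `Directory` API.

```java
import java.util.Collection;

// Hypothetical minimal interface; Lucene's real Directory has more methods
// and different signatures.
interface SketchDirectory {
    void createOutput(String name);
    void deleteFile(String name);
    void sync(Collection<String> names);
    void syncMetaData();
}

final class ReadOnlySketchDirectory implements SketchDirectory {

    private static UnsupportedOperationException unsupportedException() {
        return new UnsupportedOperationException("directory is read-only");
    }

    // Operations that would change files fail loudly, so unexpected writers
    // are caught instead of silently succeeding.
    @Override
    public void createOutput(String name) {
        throw unsupportedException();
    }

    @Override
    public void deleteFile(String name) {
        throw unsupportedException();
    }

    // Assumption: still invoked by the shared recovery and snapshot code
    // paths, but there is nothing to flush, so these do nothing.
    @Override
    public void sync(Collection<String> names) {}

    @Override
    public void syncMetaData() {}
}
```

The design choice here is that calls which cannot modify anything stay harmless, while any attempt to create or delete a file surfaces immediately as an exception.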

… stopped (#122006) This is caught thanks to #121210: if shard files are verified/checksummed while the node is stopping, an IllegalStateException is thrown by CacheService.get() when it attempts to read data from the cache. This exception later caused the verification to fail and then the Lucene index to be marked as corrupted (which now fails for searchable snapshots shards, which are read-only and should not be corrupted at all). This pull request changes ensureLifecycleStarted(), which is called during CacheService.get(), to throw an AlreadyClosedException when the service is stopped (note that ACE extends IllegalStateException, which is convenient here). This ACE will later be specially handled in the checksumIndex method to not mark the shard as corrupted (see #121210). Closes #121927
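
The exception handling described in that commit message could look roughly like the sketch below. All class names are hypothetical stand-ins: `SketchAlreadyClosedException` mimics Lucene's AlreadyClosedException (which extends IllegalStateException), and the outcome enum is an assumption made for illustration, since the real checksumIndex method may propagate the exception instead of returning a value.

```java
// Stand-in for Lucene's AlreadyClosedException, which extends IllegalStateException.
class SketchAlreadyClosedException extends IllegalStateException {
    SketchAlreadyClosedException(String message) {
        super(message);
    }
}

// Hypothetical, heavily simplified cache service.
final class SketchCacheService {
    private volatile boolean started = true;

    void stop() {
        started = false;
    }

    private void ensureLifecycleStarted() {
        if (!started) {
            // A dedicated subtype lets callers tell "stopped" apart from other failures.
            throw new SketchAlreadyClosedException("cache service is stopped");
        }
    }

    byte[] get(String key) {
        ensureLifecycleStarted();
        return new byte[0];
    }
}

final class SketchChecksumVerifier {
    enum Outcome { VERIFIED, SKIPPED_SERVICE_STOPPED, MARKED_CORRUPTED }

    static Outcome checksumIndex(SketchCacheService cache, String file) {
        try {
            cache.get(file);
            return Outcome.VERIFIED;
        } catch (SketchAlreadyClosedException e) {
            // The service is stopping: do not treat the shard as corrupted.
            return Outcome.SKIPPED_SERVICE_STOPPED;
        } catch (IllegalStateException e) {
            // Any other illegal state still fails verification.
            return Outcome.MARKED_CORRUPTED;
        }
    }
}
```

Because the stand-in exception is a subtype of IllegalStateException, existing handlers that catch IllegalStateException keep working, while the more specific catch clause handles the stopped-service case first.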

tlrx added >non-issue, :Distributed Indexing/Recovery, v9.0.0 labels Jan 29, 2025

elasticsearchmachine added v9.1.0 and removed v9.0.0 labels Jan 30, 2025

debug

e929328

tlrx force-pushed the 2025/01/29/inmemorynoop branch from 0e6166d to e929328 Compare January 31, 2025 09:55

tlrx added 3 commits January 31, 2025 11:55

unwrap

e12de1c

Merge branch 'main' into 2025/01/29/inmemorynoop

3225cb5

Merge branch 'main' into 2025/01/29/inmemorynoop

4b26376

tlrx marked this pull request as ready for review February 3, 2025 09:44

tlrx requested review from fcofdez and henningandersen and removed request for henningandersen February 3, 2025 09:44

elasticsearchmachine added the Team:Distributed Indexing label Feb 3, 2025

fcofdez approved these changes Feb 3, 2025

tlrx merged commit 8502e5b into elastic:main Feb 4, 2025
17 checks passed

tlrx deleted the 2025/01/29/inmemorynoop branch February 4, 2025 08:35

henningandersen reviewed Feb 4, 2025

tlrx mentioned this pull request Feb 7, 2025

Make CacheService.get() throws AlreadyClosedException when service is stopped #122006

Merged
