Use read/write engine lock to guard operations against resets #124635
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)
Looks good, I assume tests pass?
    return store.getMetadata(null, true);
    engineLock.readLock().lock();
    try {
        synchronized (closeMutex) {
I think we should take the closeMutex first, to have the same lock acquisition ordering as in close?
Right, I pushed a1b5ff3
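A minimal sketch of the ordering discussed in this thread, assuming close takes closeMutex first and then the engine lock in write mode. ShardSketch, snapshotMetadata() and the closed flag are hypothetical names, not the actual IndexShard code. Because every path acquires closeMutex before engineLock, no thread can hold the engine lock while waiting for closeMutex, which rules out a lock-order inversion between the two paths.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: field and method names are illustrative, not IndexShard's.
class ShardSketch {
    private final Object closeMutex = new Object();
    private final ReentrantReadWriteLock engineLock = new ReentrantReadWriteLock();
    private boolean closed = false; // guarded by engineLock

    // Reader path: closeMutex first, then the engine read lock.
    String snapshotMetadata() {
        synchronized (closeMutex) {
            engineLock.readLock().lock();
            try {
                if (closed) {
                    throw new IllegalStateException("shard is closed");
                }
                return "metadata";
            } finally {
                engineLock.readLock().unlock();
            }
        }
    }

    // Close path: same order, closeMutex first, then the engine write lock.
    void close() {
        synchronized (closeMutex) {
            engineLock.writeLock().lock();
            try {
                closed = true;
            } finally {
                engineLock.writeLock().unlock();
            }
        }
    }
}
```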
server/src/main/java/org/elasticsearch/index/shard/IndexShard.java
    // How do we ensure that no indexing operations have been processed since prepareForEngineReset() here? We're not
    // blocking all operations when resetting the engine, nor are we blocking flushes or force-merges.
I think the assumption is that we are indeed blocking operations? IIUC, today we only hollow while holding the permits, and unhollowing happens before indexing. I suppose we will get to this issue down the road once we start online hollowing, but would it not be OK to assume that the client has acquired permits/blocked operations?
I think the relocation also does something to merge and flush.
But you are probably right about the need to protect against changes. Could the IndexEngine do so, given that it knows it is now hollow, through a method called from all such mutating methods?
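One way to read this suggestion, sketched under assumptions: the engine tracks its own hollow state and every mutating method calls a single guard before doing any work. HollowAwareEngine, markHollow() and ensureMutable() are hypothetical names; the real IndexEngine may expose this differently or not at all. In this toy, all methods are synchronized so the hollow check is atomic with the mutation it protects.

```java
// Hypothetical sketch of a hollow-aware engine; not an Elasticsearch API.
class HollowAwareEngine {
    private boolean hollow = false; // guarded by this
    private int docCount = 0;       // guarded by this

    synchronized void markHollow() {
        hollow = true;
    }

    // Single check shared by every mutating method.
    private void ensureMutable(String operation) {
        if (hollow) {
            throw new IllegalStateException("cannot " + operation + " on a hollow engine");
        }
    }

    synchronized void index(String doc) {
        ensureMutable("index");
        docCount++;
    }

    synchronized void flush() {
        ensureMutable("flush");
        // a real engine would commit here
    }

    synchronized void forceMerge(int maxSegments) {
        ensureMutable("force-merge");
        // a real engine would merge segments here
    }

    synchronized int docCount() {
        return docCount;
    }
}
```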
We are indeed blocking ingestion when hollowing during the primary relocation.
We are also in general blocking ingestion (with our own ingestion blocker in stateless) when unhollowing.
I think maybe we could add an assertion that either the permits are held, or the ingestion blocker in stateless is installed here. But it might be a bit cumbersome to put the assertion on the ingestion blocker here (since it's in stateless code). I'd leave it to @fcofdez to figure this out (and can help if needed).
There is a small chance that a force merge comes through and fails; we created ES-11277 to investigate later whether it is serious enough to handle (for the moment we believe it is not that serious).
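The assertion proposed in this comment could look roughly like the following sketch. It assumes operation permits are modeled as a Semaphore and that the stateless ingestion blocker can be queried through a BooleanSupplier; ResetGuardSketch, TOTAL_PERMITS and resetEngine() are hypothetical and do not reflect the real IndexShard or stateless APIs. It uses an explicit check rather than a Java assert only so the sketch fails regardless of the -ea flag.

```java
import java.util.concurrent.Semaphore;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: reset is allowed only if operations are blocked
// either by holding all permits or by an external ingestion blocker.
class ResetGuardSketch {
    static final int TOTAL_PERMITS = 4;
    final Semaphore operationPermits = new Semaphore(TOTAL_PERMITS);
    private final BooleanSupplier ingestionBlocked;
    int resets = 0;

    ResetGuardSketch(BooleanSupplier ingestionBlocked) {
        this.ingestionBlocked = ingestionBlocked;
    }

    // True when every permit has been acquired, i.e. no operation can start.
    boolean allPermitsHeld() {
        return operationPermits.availablePermits() == 0;
    }

    void resetEngine() {
        if (!(allPermitsHeld() || ingestionBlocked.getAsBoolean())) {
            throw new AssertionError("engine reset requires blocked operations or an ingestion blocker");
        }
        resets++;
    }
}
```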
Unrelated failure; triggering CI again.
…csearch into ES-10826-no-refresh-on-close
LGTM
Relates elastic#124635 Closes ES-11324
…c#124635) Today a shard's engine mutations are guarded by an engineMutex object monitor. But we would like to be able to execute one or more operations on an engine instance without this instance being reset during the execution of the operation. In order to do that, this change replaces the engineMutex with a reentrant read/write lock and introduces two new methods, IndexShard#withEngine() and IndexShard#withEngineOrNull(), that can be used to execute an operation while preventing the current engine instance from being reset. It does not prevent the engine from being closed during execution, though. Relates ES-10826 Co-authored-by: Francisco Fernández Castaño <[email protected]>
…t resets (elastic#124635)" (elastic#125915)" This reverts commit 7fadeeb.
…6311) This change re-introduces the engine read/write lock to guard against engine resets. It differs from #124635 in the following ways:
- uses the engineMutex for creating/closing engines
- uses the reentrant r/w lock for retaining engine instances and for resetting the engine
- acquires the reentrant read lock during refreshes to prevent deadlocks during resets
- adds tests to ensure there is no deadlock when re-acquiring the read lock in refresh listeners

Relates ES-11447
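A toy model of the re-landed design, assuming a refresh runs its listeners while holding the read lock and that a listener may call back into withEngine(). RelandedShardSketch and its members are hypothetical. ReentrantReadWriteLock lets a thread that already holds the read lock acquire it again even if a writer is queued, so the listener's nested acquisition cannot get stuck behind a pending reset; and because a reset needs the write lock, it waits until the refresh has released its read lock rather than interleaving with it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch: engineMutex guards engine creation/closing, while the
// r/w lock guards retaining the engine (read) and resetting it (write).
class RelandedShardSketch {
    private final Object engineMutex = new Object();
    private final ReentrantReadWriteLock engineLock = new ReentrantReadWriteLock();
    private String engine = "engine-0"; // replaced only under the write lock
    private int generation = 0;
    private final List<Consumer<RelandedShardSketch>> refreshListeners = new ArrayList<>();

    <T> T withEngine(Function<String, T> op) {
        engineLock.readLock().lock(); // reentrant if the caller already holds it
        try {
            return op.apply(engine);
        } finally {
            engineLock.readLock().unlock();
        }
    }

    void addRefreshListener(Consumer<RelandedShardSketch> listener) {
        refreshListeners.add(listener);
    }

    // Refresh holds the read lock for its whole duration, so listeners that
    // call withEngine() re-acquire it reentrantly instead of queueing.
    void refresh() {
        engineLock.readLock().lock();
        try {
            for (Consumer<RelandedShardSketch> listener : refreshListeners) {
                listener.accept(this);
            }
        } finally {
            engineLock.readLock().unlock();
        }
    }

    // Reset takes the write lock, then the mutex to close/create the engine.
    void resetEngine() {
        engineLock.writeLock().lock();
        try {
            synchronized (engineMutex) {
                generation++;
                engine = "engine-" + generation;
            }
        } finally {
            engineLock.writeLock().unlock();
        }
    }
}
```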
Today a shard's engine mutations are guarded by an `engineMutex` object monitor. But we would like to be able to execute one or more operations on an engine instance without this instance being reset during the execution of the operation.

In order to do that, this change replaces the `engineMutex` with a reentrant read/write lock and introduces two new methods, `IndexShard#withEngine()` and `IndexShard#withEngineOrNull()`, that can be used to execute an operation while preventing the current engine instance from being reset. It does not prevent the engine from being closed during execution, though.

Relates ES-10826
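A minimal sketch of the idea, assuming withEngine() and withEngineOrNull() take a Function that receives the current engine; EngineSketch, EngineHolder and these signatures are hypothetical and may differ from the real IndexShard methods. Operations run under the read lock while a reset takes the write lock, so a reset waits for in-flight operations and an operation never sees the engine swapped mid-way. As in the description, this does not guard against the engine being closed during execution; the sketch simply omits close handling.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

// Hypothetical engine stand-in; only carries a generation number.
class EngineSketch {
    final int generation;

    EngineSketch(int generation) {
        this.generation = generation;
    }
}

class EngineHolder {
    private final ReentrantReadWriteLock engineLock = new ReentrantReadWriteLock();
    private EngineSketch current; // null until the first engine is created

    // Runs op under the read lock; op receives null if there is no engine.
    <R> R withEngineOrNull(Function<EngineSketch, R> op) {
        engineLock.readLock().lock();
        try {
            return op.apply(current);
        } finally {
            engineLock.readLock().unlock();
        }
    }

    // Same as withEngineOrNull, but fails if there is no engine.
    <R> R withEngine(Function<EngineSketch, R> op) {
        return withEngineOrNull(engine -> {
            if (engine == null) {
                throw new IllegalStateException("no engine available");
            }
            return op.apply(engine);
        });
    }

    // Reset takes the write lock, so it waits for in-flight operations.
    void resetEngine() {
        engineLock.writeLock().lock();
        try {
            int next = current == null ? 0 : current.generation + 1;
            current = new EngineSketch(next);
        } finally {
            engineLock.writeLock().unlock();
        }
    }
}
```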
Note: I'm opening this change for further discussion and hand-off.