(Doc+) Split API | add where need to have sufficient disk and how much #119784
👋 howdy, ES Dev! Will you kindly confirm the following for the [Split API](https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-split-index.html) doc:

1. Current doc states ... . My belief is that we're telling customers they need sufficient disk on each node hosting primary shards of the index that is going to be split. AFAICT the doc does not currently otherwise help users judge which node will "handle the split process", which could randomly be master-only, in which case this callout would feel wrong.

   > The node handling the split process must have sufficient free disk space to accommodate a second copy of the existing index.

2. Is the Elastic Cloud [hard-link not supported](https://support.elastic.dev/knowledge/view/d2cd7697) (internal link) restriction still ongoing?

🙏 TIA!
@stefnestor please enable the option "Allow edits and access to secrets by maintainers" on your PR. For more information, see the documentation.
Pinging @elastic/es-docs (Team:Docs)
Pinging @elastic/es-data-management (Team:Data Management)
Hi! This is a drive-by comment that the page affected by this PR will be removed from the
index, but with a larger number of primary shards.
index, but with a larger number of primary shards. Created shards allocate
to the same nodes as their correlating source primary shard, so it must
already have sufficient disk to host the copy of the data.
After consulting @tlrx (thanks!): indeed, if the shards are on different nodes, the new index shards will be allocated and split on the nodes of their corresponding source shards.
We should probably also clarify this in the following sentence later in the document, changing

> The node handling the split process must have sufficient free disk space to accommodate a second copy of the existing index.

to something like

> The nodes handling the split process must have sufficient free disk space to accommodate a second copy of the original shards.
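
To make the per-node requirement concrete, here is a minimal Python sketch of the check a user could run before splitting. It assumes the rows come from `GET _cat/shards/<index>?format=json&bytes=b&h=index,shard,prirep,store,node` and `GET _cat/allocation?format=json&bytes=b&h=node,disk.avail`; the function name `nodes_short_on_disk` and the sample node names are hypothetical. It also assumes the worst case where segments are copied rather than hard-linked, and it ignores disk watermarks, so treat it as a rough estimate only.

```python
from collections import defaultdict


def nodes_short_on_disk(shard_rows, allocation_rows):
    """Return {node: shortfall_bytes} for nodes that cannot hold a
    second copy of the primary shards they currently host."""
    needed = defaultdict(int)
    for row in shard_rows:
        # Only assigned primaries count: per the discussion above, the new
        # shards are allocated next to their source primary shard.
        if row["prirep"] == "p" and row.get("node"):
            needed[row["node"]] += int(row["store"])

    # The UNASSIGNED row in _cat/allocation has no disk figures; skip it.
    avail = {
        row["node"]: int(row["disk.avail"])
        for row in allocation_rows
        if row.get("disk.avail") is not None
    }

    return {
        node: need - avail.get(node, 0)
        for node, need in needed.items()
        if need > avail.get(node, 0)
    }


# Hypothetical sample data shaped like the _cat JSON output (values in bytes).
sample_shards = [
    {"index": "my-index", "shard": "0", "prirep": "p", "store": "500", "node": "node-a"},
    {"index": "my-index", "shard": "0", "prirep": "r", "store": "500", "node": "node-b"},
    {"index": "my-index", "shard": "1", "prirep": "p", "store": "300", "node": "node-b"},
]
sample_allocation = [
    {"node": "node-a", "disk.avail": "1000"},
    {"node": "node-b", "disk.avail": "200"},
    {"node": "UNASSIGNED", "disk.avail": None},
]

shortfalls = nodes_short_on_disk(sample_shards, sample_allocation)
print(shortfalls)
```

Replica shards are deliberately excluded from the sum, since the claim under review is only about the nodes hosting the source primaries.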
the file system doesn't support hard-linking, then all segments are copied
into the new index, which is a much more time consuming process.)
+
TIP: Elastic Cloud's backing file systems do not support hard linking.
This is probably best answered by the @elastic/es-core-infra team, who did #61145. Adding the author of that PR to this (but anyone from the team, feel free to step in to review).
Regarding #61145, we did merge a change for it but ended up removing it because, it turned out, Cloud was in fact using a filesystem that correctly reported quotas (XFS IIRC).
Regarding this tip in the docs, I can only imagine it has something to do with attempting to hard-link across Docker volumes, which doesn't work, but you'd need to check with someone from the Hosted ESS team to get an up-to-date answer.
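
For anyone wanting to check the hard-link guess on a given deployment, a minimal Python sketch of a probe follows. The helper name `can_hard_link` is hypothetical and this is not how Elasticsearch itself decides; it only tries `os.link` between two directories, which fails with an `OSError` when they sit on different file systems or volumes, or when the file system does not support hard links.

```python
import os
import tempfile


def can_hard_link(src_dir, dst_dir):
    """Return True if a file in src_dir can be hard-linked into dst_dir."""
    # Create a throwaway probe file in the source directory.
    fd, src = tempfile.mkstemp(dir=src_dir)
    os.close(fd)
    dst = os.path.join(dst_dir, os.path.basename(src) + ".link")
    try:
        os.link(src, dst)
    except OSError:
        # Typical causes: cross-device link, or no hard-link support.
        return False
    else:
        os.remove(dst)
        return True
    finally:
        # Always clean up the probe file.
        os.remove(src)
```

Pointing `src_dir` and `dst_dir` at two data paths mounted from different volumes would show whether the split could use hard links there or would fall back to copying segments.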
@stefnestor might you know who to include from Hosted ESS team to pitch in / review?
Pinging @elastic/es-core-infra (Team:Core/Infra)
Important: Elastic documentation is migrating to Markdown for version 9.0+. See the migration guide for details.
Closing as new doc system.
TLDR: Can we add expectations about where split api's created primary shards will allocate so users know where to ensure sufficient disk?