Conversation

bcully
Contributor

@bcully bcully commented Nov 13, 2024

We have gotten more than one SDH due to customers not understanding why restarts involving fully-mounted indices can pull a lot of data from the snapshot tier, so it may help to be more explicit about why this happens and how it can be avoided.

@bcully bcully added the >docs, :Distributed Coordination/Snapshot/Restore, auto-backport, v9.0.0, v8.17.0, and Team:Distributed Coordination labels Nov 13, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Docs Meta label for docs team label Nov 13, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-docs (Team:Docs)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Contributor

@henningandersen henningandersen left a comment

LGTM (with two minor edits).

Comment on lines 296 to 297
It's worth noting that if a searchable snapshot index has no replicas (as is the default
in the cold tier), then when the node hosting it is shut down, allocation will immediately
Contributor

I'd prefer to avoid mentioning the default in this section; we already discuss replicas just above it.

Suggested change
- It's worth noting that if a searchable snapshot index has no replicas (as is the default
- in the cold tier), then when the node hosting it is shut down, allocation will immediately
+ It's worth noting that if a searchable snapshot index has no replicas, then when the node hosting it is shut down, allocation will immediately

Contributor Author

👍🏻

in the cold tier), then when the node hosting it is shut down, allocation will immediately
try to relocate the index to a new node in order to maximize availability. For fully mounted
indices this will result in the new node downloading the entire index snapshot from
the cloud repository, which might be expensive especially during rolling restarts. Temporarily
Contributor

I think we generally expect those costs to be very low, with real cost implications only in exceptional cases, so I'd like to remove the cost mention:

Suggested change
- the cloud repository, which might be expensive especially during rolling restarts. Temporarily
+ the cloud repository. Temporarily

Contributor Author

👍🏻

@bcully bcully enabled auto-merge (squash) November 14, 2024 16:59
Contributor

@DiannaHohensee DiannaHohensee left a comment

LGTM with a comment suggestion.

The other place I think could use additional documentation is the rolling restart docs, or maybe the allocation settings under allocation.enable, primaries. We could mention that, as a special case, searchable snapshot indices without replicas are not reallocated when primaries is set. This doesn't need to hold up this PR, though; it's just an idea for an additional improvement if you feel like it.
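
To make the suggestion above concrete, here is a minimal Python sketch of the settings change it refers to. It assumes that the setting mentioned as allocation.enable is the cluster-level `cluster.routing.allocation.enable` setting, applied through the cluster update settings API (`PUT /_cluster/settings`). The helper names `allocation_settings_body` and `restart_plan` are hypothetical; they only build request bodies and do not contact a cluster.

```python
import json

# Assumption: "allocation.enable" in the comment above refers to this setting.
ALLOCATION_SETTING = "cluster.routing.allocation.enable"


def allocation_settings_body(value):
    """Build a JSON-serializable body for PUT /_cluster/settings.

    Passing None serializes to JSON null, which resets the setting
    to its default value.
    """
    return {"persistent": {ALLOCATION_SETTING: value}}


def restart_plan():
    """Return the two settings requests that bracket a node restart.

    Hypothetical helper: each entry is (HTTP method, path, body).
    """
    return [
        # Before stopping the node: allow allocation of primaries only.
        ("PUT", "/_cluster/settings", allocation_settings_body("primaries")),
        # After the node has rejoined: restore the default behaviour.
        ("PUT", "/_cluster/settings", allocation_settings_body(None)),
    ]


if __name__ == "__main__":
    for method, path, body in restart_plan():
        print(method, path, json.dumps(body))
```

Sending the first body before stopping a node and the second after it rejoins follows the usual rolling-restart sequence. Whether replica-less searchable snapshot shards stay in place under `primaries` is as described in the comment above; this sketch does not verify it.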

multiple clusters and use <<modules-cross-cluster-search,{ccs}>> or
<<xpack-ccr,{ccr}>> instead of {search-snaps}.

It's worth noting that if a searchable snapshot index has no replicas, then when the node
Contributor

I'd delete "It's worth noting that" and start the sentence as is without it -- you're writing it, so it's obviously worth noting :)

Should this be a WARNING block as well, like directly above? Seems a little strange to have text -> warning -> text again; and this is sort of a warning.

Contributor Author

I merged just before your comment, apologies!

Contributor

No worries, looks like a near perfect race condition 😄

@bcully bcully merged commit b77df85 into elastic:main Nov 14, 2024
4 of 5 checks passed
@elasticsearchmachine
Collaborator

💚 Backport successful

Branch: 8.x

bcully added a commit to bcully/elasticsearch that referenced this pull request Nov 14, 2024
elasticsearchmachine pushed a commit that referenced this pull request Nov 14, 2024
@bcully bcully deleted the doc-snapshot-restart-cost branch November 14, 2024 23:10
salvatore-campagna pushed a commit to salvatore-campagna/elasticsearch that referenced this pull request Nov 18, 2024
alexey-ivanov-es pushed a commit to alexey-ivanov-es/elasticsearch that referenced this pull request Nov 28, 2024
Labels

auto-backport (Automatically create backport pull requests when merged)
:Distributed Coordination/Snapshot/Restore (Anything directly related to the `_snapshot/*` APIs)
>docs (General docs changes)
Team:Distributed Coordination (Meta label for Distributed Coordination team)
Team:Docs (Meta label for docs team)
v8.17.0
v9.0.0

4 participants