-
Notifications
You must be signed in to change notification settings - Fork 25.5k
Introduce repository integrity verification API #112348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
elasticsearchmachine
merged 25 commits into
elastic:main
from
DaveCTurner:2024/08/29/verify-repo-integrity
Sep 11, 2024
Merged
Changes from 21 commits
Commits
Show all changes
25 commits
Select commit
Hold shift + click to select a range
58a75c2
Introduce repository integrity verification API
DaveCTurner 13ca43a
Update docs/changelog/112348.yaml
DaveCTurner a471a6a
Update docs/changelog/112348.yaml
DaveCTurner 336b24d
Fix comment
DaveCTurner f914e6b
Include docs
DaveCTurner 8620cc3
Merge branch 'main' into 2024/08/29/verify-repo-integrity
DaveCTurner 7915b0c
Duplicated docs
DaveCTurner ec689e4
Include timestamps in log
DaveCTurner 46ab633
TODO done
DaveCTurner 1a97205
Better docs
DaveCTurner 2435c41
Fix sequence diag
DaveCTurner 50fb745
Nullable indexMetadataBlob
DaveCTurner abccf6d
responseBuilder -> responseStream
DaveCTurner 7e25b66
More rename
DaveCTurner bb64a4d
Rename action & comments
DaveCTurner b276f3e
Comment on connection reuse
DaveCTurner 6f905f8
Visibility
DaveCTurner bf131c3
Reorder key
DaveCTurner 621d187
Comments
DaveCTurner 9352efe
comment
DaveCTurner 287a369
Comment about undefined shard gen
DaveCTurner 2c75062
Merge branch 'main' into 2024/08/29/verify-repo-integrity
DaveCTurner bc9a7ad
Fix test
DaveCTurner 1bc2972
Add YAML test
DaveCTurner fe44a14
ML test fix
DaveCTurner File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
pr: 112348 | ||
summary: Introduce repository integrity verification API | ||
area: Snapshot/Restore | ||
type: enhancement | ||
issues: | ||
- 52622 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
232 changes: 232 additions & 0 deletions
232
docs/reference/snapshot-restore/apis/verify-repo-integrity-api.asciidoc
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,232 @@ | ||
[role="xpack"] | ||
[[verify-repo-integrity-api]] | ||
=== Verify repository integrity API | ||
++++ | ||
<titleabbrev>Verify repository integrity</titleabbrev> | ||
++++ | ||
|
||
Verifies the integrity of the contents of a snapshot repository. | ||
|
||
//// | ||
[source,console] | ||
---- | ||
PUT /_snapshot/my_repository | ||
{ | ||
"type": "fs", | ||
"settings": { | ||
"location": "my_backup_location" | ||
} | ||
} | ||
---- | ||
// TESTSETUP | ||
//// | ||
|
||
[source,console] | ||
---- | ||
POST /_snapshot/my_repository/_verify_integrity | ||
---- | ||
|
||
[[verify-repo-integrity-api-request]] | ||
==== {api-request-title} | ||
|
||
`POST /_snapshot/<repository>/_verify_integrity` | ||
|
||
[[verify-repo-integrity-api-prereqs]] | ||
==== {api-prereq-title} | ||
|
||
* If the {es} {security-features} are enabled, you must have the `manage` | ||
<<privileges-list-cluster,cluster privilege>> to use this API. For more | ||
information, see <<security-privileges>>. | ||
|
||
[[verify-repo-integrity-api-desc]] | ||
==== {api-description-title} | ||
|
||
This API allows you to perform a comprehensive check of the contents of a | ||
repository, looking for any anomalies in its data or metadata which might | ||
prevent you from restoring snapshots from the repository or which might cause | ||
future snapshot create or delete operations to fail. | ||
|
||
If you suspect the integrity of the contents of one of your snapshot | ||
repositories, cease all write activity to this repository immediately, set its | ||
`read_only` option to `true`, and use this API to verify its integrity. Until | ||
you do so: | ||
|
||
* It may not be possible to <<snapshots-restore-snapshot,restore some | ||
snapshots>> from this repository. | ||
|
||
* <<searchable-snapshots>> may report errors when searched, or may have | ||
unassigned shards. | ||
|
||
* <<snapshots-take-snapshot,Taking snapshots>> into this repository may fail, | ||
or may appear to succeed having created a snapshot which cannot be restored. | ||
|
||
* <<delete-snapshot-api,Deleting snapshots>> from this repository may fail, or | ||
may appear to succeed leaving the underlying data on disk. | ||
|
||
* Continuing to write to the repository while it is in an invalid state may | ||
causing additional damage to its contents. | ||
|
||
If the <<verify-repo-integrity-api>> API finds any problems with the integrity | ||
of the contents of your repository, {es} will not be able to repair the damage. | ||
The only way to bring the repository back into a fully working state after its | ||
contents have been damaged is by restoring its contents from a | ||
<<snapshots-repository-backup,repository backup>> which was taken before the | ||
damage occurred. You must also identify what caused the damage and take action | ||
to prevent it from happening again. | ||
|
||
If you cannot restore a repository backup, | ||
<<snapshots-register-repository,register a new repository>> and use this for | ||
all future snapshot operations. In some cases it may be possible to recover | ||
some of the contents of a damaged repository, either by | ||
<<snapshots-restore-snapshot,restoring>> as many of its snapshots as needed and | ||
<<snapshots-take-snapshot,taking new snapshots>> of the restored data, or by | ||
using the <<docs-reindex>> API to copy data from any <<searchable-snapshots>> | ||
mounted from the damaged repository. | ||
|
||
Avoid all operations which write to the repository while the | ||
<<verify-repo-integrity-api>> API is running. If something changes the | ||
repository contents while an integrity verification is running then {es} may | ||
incorrectly report having detected some anomalies in its contents due to the | ||
concurrent writes. It may also incorrectly fail to report some anomalies that | ||
the concurrent writes prevented it from detecting. | ||
|
||
NOTE: This API is intended for exploratory use by humans. You should expect the | ||
request parameters and the response format to vary in future versions. | ||
|
||
NOTE: This API may not work correctly in a mixed-version cluster. | ||
|
||
[[verify-repo-integrity-api-path-params]] | ||
==== {api-path-parms-title} | ||
|
||
`<repository>`:: | ||
(Required, string) | ||
Name of the snapshot repository whose integrity to verify. | ||
|
||
[[verify-repo-integrity-api-query-params]] | ||
==== {api-query-parms-title} | ||
|
||
The default values for the parameters of this API are designed to limit the | ||
impact of the integrity verification on other activities in your cluster. For | ||
instance, by default it will only use at most half of the `snapshot_meta` | ||
threads to verify the integrity of each snapshot, allowing other snapshot | ||
operations to use the other half of this thread pool. | ||
|
||
If you modify these parameters to speed up the verification process, you risk | ||
disrupting other snapshot-related operations in your cluster. For large | ||
repositories, consider setting up a separate single-node {es} cluster just for | ||
running the integrity verification API. | ||
|
||
`snapshot_verification_concurrency`:: | ||
(Optional, integer) Specifies the number of snapshots to verify concurrently. | ||
Defaults to `0` which means to use at most half of the `snapshot_meta` thread | ||
pool at once. | ||
|
||
`index_verification_concurrency`:: | ||
(Optional, integer) Specifies the number of indices to verify concurrently. | ||
Defaults to `0` which means to use the entire `snapshot_meta` thread pool. | ||
|
||
`meta_thread_pool_concurrency`:: | ||
(Optional, integer) Specifies the maximum number of snapshot metadata | ||
operations to execute concurrently. Defaults to `0` which means to use at most | ||
half of the `snapshot_meta` thread pool at once. | ||
|
||
`index_snapshot_verification_concurrency`:: | ||
(Optional, integer) Specifies the maximum number of index snapshots to verify | ||
concurrently within each index verification. Defaults to `1`. | ||
|
||
`max_failed_shard_snapshots`:: | ||
(Optional, integer) Limits the number of shard snapshot failures to track | ||
during integrity verification, in order to avoid excessive resource usage. If | ||
your repository contains more than this number of shard snapshot failures then | ||
the verification will fail. Defaults to `10000`. | ||
|
||
`verify_blob_contents`:: | ||
(Optional, boolean) Specifies whether to verify the checksum of every data blob | ||
in the repository. Defaults to `false`. If this feature is enabled, {es} will | ||
read the entire repository contents, which may be extremely slow and expensive. | ||
|
||
`blob_thread_pool_concurrency`:: | ||
(Optional, integer) If `?verify_blob_contents` is `true`, this parameter | ||
specifies how many blobs to verify at once. Defaults to `1`. | ||
|
||
`max_bytes_per_sec`:: | ||
(Optional, <<size-units, size units>>) | ||
If `?verify_blob_contents` is `true`, this parameter specifies the maximum | ||
amount of data that {es} will read from the repository every second. Defaults | ||
to `10mb`. | ||
|
||
[role="child_attributes"] | ||
[[verify-repo-integrity-api-response-body]] | ||
==== {api-response-body-title} | ||
|
||
The response exposes implementation details of the analysis which may change | ||
from version to version. The response body format is therefore not considered | ||
stable and may be different in newer versions. | ||
|
||
`log`:: | ||
(array) A sequence of objects that report the progress of the analysis. | ||
+ | ||
.Properties of `log` | ||
[%collapsible%open] | ||
==== | ||
`timestamp_in_millis`:: | ||
(integer) The timestamp of this log entry, represented as the number of | ||
milliseconds since the {wikipedia}/Unix_time[Unix epoch]. | ||
|
||
`timestamp`:: | ||
(string) The timestamp of this log entry, represented as a string formatted | ||
according to {wikipedia}/ISO_8601[ISO 8601]. Only included if the | ||
<<common-options,`?human`>> flag is set. | ||
|
||
`snapshot`:: | ||
(object) If the log entry pertains to a particular snapshot then the snapshot | ||
will be described in this object. | ||
|
||
`index`:: | ||
(object) If the log entry pertains to a particular index then the index will be | ||
described in this object. | ||
|
||
`snapshot_restorability`:: | ||
(object) If the log entry pertains to the restorability of an index then the | ||
details will be described in this object. | ||
|
||
`anomaly`:: | ||
(string) If the log entry pertains to an anomaly in the repository contents then | ||
this string will describe the anomaly. | ||
|
||
`exception`:: | ||
(object) If the log entry pertains to an exception that {es} encountered during | ||
the verification then the details will be included in this object. | ||
|
||
==== | ||
|
||
`results`:: | ||
(object) An object which describes the final results of the analysis. | ||
+ | ||
.Properties of `results` | ||
[%collapsible%open] | ||
==== | ||
`status`:: | ||
(object) The final status of the analysis task. | ||
|
||
`final_repository_generation`:: | ||
(integer) The repository generation at the end of the analysis. If there were | ||
any writes to the repository during the analysis then this value will be | ||
different from the `generation` reported in the task status, and the analysis | ||
may have detected spurious anomalies due to the concurrent writes, or may even | ||
have failed to detect some anomalies in the repository contents. | ||
|
||
`total_anomalies`:: | ||
(integer) The total number of anomalies detected during the analysis. | ||
|
||
`result`:: | ||
(string) The final result of the analysis. If the repository contents appear to | ||
be intact then this will be the string `pass`. If this field is missing, or | ||
contains some other value, then the repository contents were not fully | ||
verified. | ||
|
||
==== | ||
|
||
`exception`:: | ||
(object) If the analysis encountered an exception which prevented it from | ||
completing successfully then this exception will be reported here. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to enforce
ready_only
is set in the API? I am not sure whether it is worthwhile to report some anomalies after a length check just because there were concurrent writes?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to enforce that, no. We might want to run this on e.g. a Cloud repo, and getting the right permissions to modify its metadata is hard to do.