diff --git a/docs/reference/snapshot-restore/apis/repo-analysis-api.asciidoc b/docs/reference/snapshot-restore/apis/repo-analysis-api.asciidoc
index ca46ba1fb2b57..e2c6f0ba70f5e 100644
--- a/docs/reference/snapshot-restore/apis/repo-analysis-api.asciidoc
+++ b/docs/reference/snapshot-restore/apis/repo-analysis-api.asciidoc
@@ -60,23 +60,41 @@ measure the performance characteristics of your storage system. The default
 values for the parameters to this API are deliberately low to reduce the impact
 of running an analysis inadvertently and to provide a sensible starting point
 for your investigations. Run your first analysis with the default
-parameter values to check for simple problems. If successful, run a sequence of
-increasingly large analyses until you encounter a failure or you reach a
-`blob_count` of at least `2000`, a `max_blob_size` of at least `2gb`, a
-`max_total_data_size` of at least `1tb`, and a `register_operation_count` of at
-least `100`. Always specify a generous timeout, possibly `1h` or longer, to
-allow time for each analysis to run to completion. Perform the analyses using a
-multi-node cluster of a similar size to your production cluster so that it can
-detect any problems that only arise when the repository is accessed by many
-nodes at once.
+parameter values to check for simple problems. Some repositories may behave
+correctly when lightly loaded but incorrectly under production-like workloads.
+If the first analysis is successful, run a sequence of increasingly large
+analyses until you encounter a failure or you reach a `blob_count` of at least
+`2000`, a `max_blob_size` of at least `2gb`, a `max_total_data_size` of at least
+`1tb`, and a `register_operation_count` of at least `100`. Always specify a
+generous timeout, possibly `1h` or longer, to allow time for each analysis to
+run to completion. Some repositories may behave correctly when accessed by a
+small number of {es} nodes but incorrectly when accessed concurrently by a
+production-scale cluster. Perform the analyses using a multi-node cluster of a
+similar size to your production cluster so that it can detect any problems that
+only arise when the repository is accessed by many nodes at once.
 
 If the analysis fails then {es} detected that your repository behaved
-unexpectedly. This usually means you are using a third-party storage system
-with an incorrect or incompatible implementation of the API it claims to
-support. If so, this storage system is not suitable for use as a snapshot
-repository. You will need to work with the supplier of your storage system to
-address the incompatibilities that {es} detects. See
-<> for more information.
+unexpectedly. This usually means you are using a third-party storage system with
+an incorrect or incompatible implementation of the API it claims to support. If
+so, this storage system is not suitable for use as a snapshot repository.
+Repository analysis triggers conditions that occur only rarely when taking
+snapshots in a production system. Snapshotting to unsuitable storage may appear
+to work correctly most of the time despite repository analysis failures. However
+your snapshot data is at risk if you store it in a snapshot repository that does
+not reliably pass repository analysis. You can demonstrate that the analysis
+failure is due to an incompatible storage implementation by verifying that
+Elasticsearch does not detect the same problem when analysing the reference
+implementation of the storage protocol you are using. For instance, if you are
+using storage that offers an API which the supplier claims to be compatible with
+AWS S3, verify that repositories in AWS S3 do not fail repository analysis. This
+allows you to demonstrate to your storage supplier that a repository analysis
+failure must only be caused by an incompatibility with AWS S3 and cannot be
+attributed to a problem in Elasticsearch. Please do not report Elasticsearch
+issues involving third-party storage systems unless you can demonstrate that the
+same issue exists when analysing a repository that uses the reference
+implementation of the same storage protocol. You will need to work with the
+supplier of your storage system to address the incompatibilities that {es}
+detects. See <> for more information.
 
 If the analysis is successful this API returns details of the testing process,
 optionally including how long each operation took. You can use this information
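
Reviewer note: the ramp-up procedure described in the added text (defaults first, then increasingly large analyses up to the stated minimums, always with a generous timeout) could be sketched as below. This is only an illustration, not part of the patch. It assumes the analysis endpoint is `POST /_snapshot/<repository>/_analyze` with the parameters passed as query-string values. The repository name `my_repository`, the helper names, and the intermediate step sizes are hypothetical; only the final targets and the `1h` timeout come from the text.

```python
from urllib.parse import urlencode

# Final targets taken from the documentation text in the diff above.
TARGETS = {
    "blob_count": 2000,
    "max_blob_size": "2gb",
    "max_total_data_size": "1tb",
    "register_operation_count": 100,
}

# Hypothetical intermediate step; the text only says "increasingly large".
INTERMEDIATE = {
    "blob_count": 500,
    "max_blob_size": "500mb",
    "max_total_data_size": "100gb",
    "register_operation_count": 25,
}


def analysis_path(repository, params=None):
    """Build the request path for one analysis (assumed endpoint shape)."""
    base = f"/_snapshot/{repository}/_analyze"
    if not params:
        # First run: rely entirely on the API's default parameter values.
        return base
    return f"{base}?{urlencode(params)}"


def ramp_paths(repository, timeout="1h"):
    """Return the request paths for a defaults-first, increasing sequence."""
    paths = [analysis_path(repository)]
    for step in (INTERMEDIATE, TARGETS):
        # Every non-default run carries the generous timeout the text recommends.
        paths.append(analysis_path(repository, {"timeout": timeout, **step}))
    return paths


if __name__ == "__main__":
    for path in ramp_paths("my_repository"):
        print("POST", path)
```

Each path would be sent as a `POST` from a cluster of production-like size, stopping at the first failure. The sketch only builds the request paths and does not contact a cluster.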