Commit 8aa461b

Add docs on troubleshooting NFS repos (#97601) (#97812)
Spell out a bit more clearly that ES works through the OS's filesystem abstraction, giving advice about how to reproduce problems outside of ES.
1 parent c4d7657 commit 8aa461b

2 files changed, with 48 additions and 11 deletions.

docs/reference/snapshot-restore/apis/verify-repo-api.asciidoc

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 <titleabbrev>Verify snapshot repository</titleabbrev>
 ++++

-Verifies that a snapshot repository is functional. See
+Checks for common misconfigurations in a snapshot repository. See
 <<snapshots-repository-verification>>.

 ////
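
Reviewer note: the reworded description above refers to the verify snapshot repository API. As a rough illustration of how that check might be triggered from a script, here is a minimal Python sketch. The base URL and the repository name `my_repo` are placeholders, and the sketch assumes an unsecured local cluster; a real cluster will usually need TLS and credentials, which are not shown.

```python
import json
import urllib.parse
import urllib.request


def build_verify_request(base_url: str, repo: str) -> urllib.request.Request:
    """Build a POST request for the verify snapshot repository API.

    base_url and repo are supplied by the caller; nothing is sent here.
    """
    # Percent-encode the repository name so it is safe inside the URL path.
    quoted = urllib.parse.quote(repo, safe="")
    url = f"{base_url.rstrip('/')}/_snapshot/{quoted}/_verify"
    return urllib.request.Request(url, method="POST")


def verify_repository(base_url: str, repo: str) -> dict:
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(build_verify_request(base_url, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder address and repository name; adjust for your cluster.
    print(verify_repository("http://localhost:9200", "my_repo"))
```

If the call fails, the HTTP error raised by `urlopen` carries the response body, which is where the cluster reports what went wrong.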

docs/reference/snapshot-restore/repository-shared-file-system.asciidoc

Lines changed: 47 additions & 10 deletions
@@ -3,20 +3,14 @@

 include::{es-repo-dir}/snapshot-restore/on-prem-repo-type.asciidoc[]

-Use a shared file system repository to store snapshots on a
-shared file system.
+Use a shared file system repository to store snapshots on a shared file system.

 To register a shared file system repository, first mount the file system to the
-same location on all master and data nodes. Then add the file system's
-path or parent directory to the `path.repo` setting in `elasticsearch.yml` for
-each master and data node. For running clusters, this requires a
+same location on all master and data nodes. Then add the file system's path or
+parent directory to the `path.repo` setting in `elasticsearch.yml` for each
+master and data node. For running clusters, this requires a
 <<restart-cluster-rolling,rolling restart>> of each node.

-IMPORTANT: By default, a network file system (NFS) uses user IDs (UIDs) and
-group IDs (GIDs) to match accounts across nodes. If your shared file system is
-an NFS and your nodes don't use the same UIDs and GIDs, update your NFS
-configuration to account for this.
-
 Supported `path.repo` values vary by platform:

 include::{es-repo-dir}/tab-widgets/register-fs-repo-widget.asciidoc[]
@@ -47,3 +41,46 @@ Maximum number of snapshots the repository can contain.
 Defaults to `Integer.MAX_VALUE`, which is `2^31-1` or `2147483647`.

 include::repository-shared-settings.asciidoc[]
+
+==== Troubleshooting a shared file system repository
+
+{es} interacts with a shared file system repository using the file system
+abstraction in your operating system. This means that every {es} node must be
+able to perform operations within the repository path such as creating,
+opening, and renaming files, and creating and listing directories, and
+operations performed by one node must be visible to other nodes as soon as they
+complete.
+
+Check for common misconfigurations using the <<verify-snapshot-repo-api>> API
+and the <<repo-analysis-api>> API. When the repository is properly configured,
+these APIs will complete successfully. If the verify repository or repository
+analysis APIs report a problem then you will be able to reproduce this problem
+outside {es} by performing similar operations on the file system directly.
+
+If the verify repository or repository analysis APIs fail with an error
+indicating insufficient permissions then adjust the configuration of the
+repository within your operating system to give {es} an appropriate level of
+access. To reproduce such problems directly, perform the same operations as
+{es} in the same security context as the one in which {es} is running. For
+example, on Linux, use a command such as `su` to switch to the user as which
+{es} runs.
+
+If the verify repository or repository analysis APIs fail with an error
+indicating that operations on one node are not immediately visible on another
+node then adjust the configuration of the repository within your operating
+system to address this problem. If your repository cannot be configured with
+strong enough visibility guarantees then it is not suitable for use as an {es}
+snapshot repository.
+
+The verify repository and repository analysis APIs will also fail if the
+operating system returns any other kind of I/O error when accessing the
+repository. If this happens, address the cause of the I/O error reported by the
+operating system.
+
+TIP: Many NFS implementations match accounts across nodes using their _numeric_
+user IDs (UIDs) and group IDs (GIDs) rather than their names. It is possible
+for {es} to run under an account with the same name (often `elasticsearch`) on
+each node, but for these accounts to have different numeric user or group IDs.
+If your shared file system uses NFS then ensure that every node is running with
+the same numeric UID and GID, or else update your NFS configuration to account
+for the variance in numeric IDs across nodes.
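
Reviewer note: the new troubleshooting section suggests reproducing problems outside {es} by performing similar operations on the file system directly. The Python sketch below is one possible way to do that and is not what {es} itself runs. The function name `probe_repository_path` is hypothetical, the path passed in is whatever you configured under `path.repo`, and `os.getuid`/`os.getgid` assume a Unix-like system.

```python
import os
import uuid


def probe_repository_path(repo_path: str) -> dict:
    """Create, write, rename, read, list and delete a scratch file under repo_path.

    Any permission or I/O problem surfaces as an OSError from the failing call.
    """
    # Use a unique scratch directory so the probe never touches snapshot data.
    probe_dir = os.path.join(repo_path, f"probe-{uuid.uuid4().hex}")
    os.mkdir(probe_dir)
    try:
        tmp_file = os.path.join(probe_dir, "data.tmp")
        final_file = os.path.join(probe_dir, "data.bin")
        with open(tmp_file, "wb") as f:
            f.write(b"probe")
            f.flush()
            os.fsync(f.fileno())  # push the bytes to the underlying file system
        os.rename(tmp_file, final_file)
        with open(final_file, "rb") as f:
            content = f.read()
        listing = sorted(os.listdir(probe_dir))
    finally:
        # Clean up whatever the probe managed to create.
        for name in os.listdir(probe_dir):
            os.remove(os.path.join(probe_dir, name))
        os.rmdir(probe_dir)
    # Report the numeric IDs of the calling process for comparison across nodes.
    return {
        "uid": os.getuid(),
        "gid": os.getgid(),
        "listing": listing,
        "content": content,
    }
```

Run it as the same user that {es} runs as, on each node, and compare the reported `uid` and `gid` values between nodes. This only exercises a single node at a time; checking that a file written on one node is immediately visible on another still requires running a write on one node and a read on the other.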
