
Commit f719f8b

(Doc+) Capture Elasticsearch diagnostic (#108259) (#110067)
* (Doc+) Capture Elasticsearch diagnostic
* add diagnostic topic to nav, chunk content, style edits
* fix test

---------

Co-authored-by: shainaraskas <[email protected]>
(cherry picked from commit 1a55e2f)
Co-authored-by: Stef Nestor <[email protected]>
1 parent 28722d5 commit f719f8b


2 files changed, +154 -0 lines changed


docs/reference/troubleshooting.asciidoc

Lines changed: 2 additions & 0 deletions
@@ -138,3 +138,5 @@ include::troubleshooting/troubleshooting-searches.asciidoc[]
 include::troubleshooting/troubleshooting-shards-capacity.asciidoc[]
 
 include::troubleshooting/troubleshooting-unbalanced-cluster.asciidoc[]
+
+include::troubleshooting/diagnostic.asciidoc[]
Lines changed: 152 additions & 0 deletions
@@ -0,0 +1,152 @@
[[diagnostic]]
== Capturing diagnostics
++++
<titleabbrev>Capture diagnostics</titleabbrev>
++++
:keywords: Elasticsearch diagnostic, diagnostics

The {es} https://github.com/elastic/support-diagnostics[Support Diagnostic] tool captures a point-in-time snapshot of cluster statistics and most settings.
It works against all {es} versions.

This information can be used to troubleshoot problems with your cluster. For examples of issues that you can troubleshoot using Support Diagnostic tool output, refer to https://www.elastic.co/blog/why-does-elastic-support-keep-asking-for-diagnostic-files[the Elastic blog].

You can generate diagnostic information using this tool before you contact https://support.elastic.co[Elastic Support] or
https://discuss.elastic.co[Elastic Discuss] to minimize turnaround time.

[discrete]
[[diagnostic-tool-requirements]]
=== Requirements

- Java Runtime Environment or Java Development Kit v1.8 or higher
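To confirm the Java requirement before you run the tool, you can check the output of `java -version`. The following is a minimal Bash sketch that is not part of the Support Diagnostic tool. The function names `java_major_version` and `java_meets_requirement` are hypothetical, and the sketch assumes that `java -version` prints a quoted version string such as `"1.8.0_292"` (Java 8 and earlier) or `"17.0.2"` (Java 9 and later).

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): check the Support Diagnostic requirement of
# Java v1.8 or higher. Java 8 and earlier report "1.x" versions such as
# "1.8.0_292"; Java 9 and later report the major version first, such as "17.0.2".

# Print the major Java version parsed from `java -version` output passed as $1.
java_major_version() {
  local output="$1" version major
  # Take the first double-quoted version token, for example 1.8.0_292 or 17.0.2.
  version=$(printf '%s\n' "$output" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  if [[ -z "$version" ]]; then
    return 1
  fi
  major=${version%%.*}
  if [[ "$major" == "1" ]]; then
    # Legacy numbering scheme: 1.8 means Java 8.
    major=$(printf '%s' "$version" | cut -d. -f2)
  fi
  # Drop any non-numeric suffix, such as "-ea" in "9-ea".
  major=${major%%[!0-9]*}
  printf '%s\n' "$major"
}

# Return 0 if the given `java -version` output is for Java 8 (1.8) or newer.
java_meets_requirement() {
  local major
  major=$(java_major_version "$1") || return 1
  [[ -n "$major" && "$major" -ge 8 ]]
}
```

For example, `java_meets_requirement "$(java -version 2>&1)" && echo "Java version OK"`. The `2>&1` redirect is needed because `java -version` writes to standard error.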

[discrete]
[[diagnostic-tool-access]]
=== Access the tool

The Support Diagnostic tool is included as a sub-library in some Elastic deployments:

* {ece}: Located under **{ece}** > **Deployment** > **Operations** >
**Prepare Bundle** > **{es}**.
* {eck}: Run as https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-take-eck-dump.html[`eck-diagnostics`].

You can also directly download the `diagnostics-X.X.X-dist.zip` file for the latest Support Diagnostic release
from https://github.com/elastic/support-diagnostics/releases/latest[the `support-diagnostics` repo].
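If you script the download, make sure that you select the `-dist.zip` release asset rather than the auto-generated source code archive. The following Bash sketch assumes the JSON returned by the GitHub REST API for the latest release, which lists each asset's `browser_download_url`, and assumes that the asset name ends in `-dist.zip` as described above. The `pick_dist_zip_url` function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): read a GitHub "latest release" JSON document
# from stdin and print the download URL of the first asset whose name ends in
# -dist.zip. Source code archives (zipball/tarball URLs) are not assets with a
# browser_download_url, so they are never selected.

pick_dist_zip_url() {
  grep -o '"browser_download_url"[[:space:]]*:[[:space:]]*"[^"]*-dist\.zip"' \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'   # keep only the quoted URL value
}
```

For example, `curl -s https://api.github.com/repos/elastic/support-diagnostics/releases/latest | pick_dist_zip_url` prints the asset URL, which you can then download with `curl -L -O <url>` and extract with `unzip`.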

[discrete]
[[diagnostic-capture]]
=== Capture diagnostic information

To capture an {es} diagnostic:

. In a terminal, verify that your network and user permissions are sufficient to connect to your {es}
cluster by polling the cluster's <<cluster-health,health>>.
+
For example, with the parameters `host:localhost`, `port:9200`, and `username:elastic`, you'd use the following curl request:
+
[source,sh]
----
curl -X GET -k -u elastic https://localhost:9200/_cluster/health
----
// NOTCONSOLE
+
If you receive an HTTP 200 `OK` response, then you can proceed to the next step. If you receive a different
response code, then <<diagnostic-non-200,diagnose the issue>> before proceeding.

. Using the same environment parameters, run the diagnostic tool script.
+
For information about the parameters that you can pass to the tool, refer to the https://github.com/elastic/support-diagnostics#standard-options[diagnostic
parameter reference].
+
The following command options are recommended:
+
**Unix-based systems**
+
[source,sh]
----
sudo ./diagnostics.sh --type local --host localhost --port 9200 -u elastic -p --bypassDiagVerify --ssl --noVerify
----
+
**Windows**
+
[source,sh]
----
.\diagnostics.bat --type local --host localhost --port 9200 -u elastic -p --bypassDiagVerify --ssl --noVerify
----
+
[TIP]
.Script execution modes
====
You can execute the script in three https://github.com/elastic/support-diagnostics#diagnostic-types[modes]:

* `local` (default, recommended): Polls the <<rest-apis,{es} API>>,
gathers operating system info, and captures cluster and GC logs.

* `remote`: Establishes an SSH session
to the applicable target server to pull the same information as `local`.

* `api`: Polls the <<rest-apis,{es} API>>. All other data must be
collected manually.
====

. When the script has completed, verify that no errors were logged to `diagnostic.log`.
If the log file contains errors, then refer to <<diagnostic-log-errors,Diagnose errors in `diagnostic.log`>>.

. If the script completed without errors, then an archive with the format `<diagnostic type>-diagnostics-<DateTimeStamp>.zip` is created in the working directory, or in the output directory that you specified. You can review or share the diagnostic archive as needed.
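The capture steps above can also be scripted. The following Bash sketch is not an official wrapper: it assumes that you run it from the directory where you extracted `diagnostics-X.X.X-dist.zip`, it reuses only the recommended Unix options from step 2, and the helpers `cluster_health_status`, `run_diagnostic_if_healthy`, and `latest_diagnostic_archive` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch of the capture procedure (hypothetical helpers): poll cluster health,
# run the diagnostic only on HTTP 200, then locate the newest output archive.
# Assumes the current directory contains the extracted diagnostics.sh script.

# Print only the HTTP status code of GET /_cluster/health.
# Because only a username is passed to -u, curl prompts for the password.
cluster_health_status() {
  local host="$1" port="$2" user="$3"
  curl -s -k -o /dev/null -w '%{http_code}' -u "$user" \
    "https://${host}:${port}/_cluster/health"
}

# Run the diagnostic (type local) only if the health poll returned 200.
run_diagnostic_if_healthy() {
  local host="$1" port="$2" user="$3" status
  status=$(cluster_health_status "$host" "$port" "$user")
  if [[ "$status" != "200" ]]; then
    echo "Cluster health returned HTTP ${status}; diagnose this before running the tool." >&2
    return 1
  fi
  sudo ./diagnostics.sh --type local --host "$host" --port "$port" \
    -u "$user" -p --bypassDiagVerify --ssl --noVerify
}

# Print the most recently modified <type>-diagnostics-<DateTimeStamp>.zip in a directory.
latest_diagnostic_archive() {
  local dir="${1:-.}" f latest=""
  for f in "$dir"/*-diagnostics-*.zip; do
    [[ -e "$f" ]] || continue   # skip the literal pattern when nothing matches
    if [[ -z "$latest" || "$f" -nt "$latest" ]]; then
      latest="$f"
    fi
  done
  [[ -n "$latest" ]] && printf '%s\n' "$latest"
}
```

For example, `run_diagnostic_if_healthy localhost 9200 elastic && latest_diagnostic_archive .` runs the tool only after a 200 health response and then prints the path of the newest archive. Because the password is entered at a prompt rather than on the command line, it isn't saved in your shell history.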

[discrete]
[[diagnostic-non-200]]
=== Diagnose a non-200 cluster health response

When you poll your cluster health, if you receive any response other than `200 OK`, then the diagnostic tool
might not work as intended. The following are possible error codes and their resolutions:

HTTP 401 `UNAUTHENTICATED`::
Additional information in the error usually indicates either
that your `username:password` pair is invalid, or that your `.security`
index is unavailable and you need to set up a temporary
<<file-realm,file-based realm>> user with `role:superuser` to authenticate.

HTTP 403 `UNAUTHORIZED`::
Your `username` is recognized but
has insufficient permissions to run the diagnostic. Either use a different
username or elevate the user's privileges.

HTTP 429 `TOO_MANY_REQUESTS` (for example, `circuit_breaking_exception`)::
Your user was authenticated and authorized, but the cluster is under
enough strain that it isn't responding to API calls. These
responses are usually intermittent. You can proceed with running the diagnostic,
but the diagnostic results might be incomplete.

HTTP 504 `GATEWAY_TIMEOUT`::
Your network is experiencing issues reaching the cluster. You might be using a proxy or firewall.
Consider running the diagnostic tool from a different location, confirming your port, or using an IP
address instead of a domain name.

HTTP 503 `SERVICE_UNAVAILABLE` (for example, `master_not_discovered_exception`)::
Your cluster does not currently have an elected master node, which is
required for it to respond to API calls. This might be temporary while a new master
node is elected. If the issue persists, then <<cluster-fault-detection,investigate the cause>>
before proceeding.
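If you poll cluster health from a script, you can branch on the status code instead of reading the response by eye. This Bash sketch maps the codes listed above to short resolutions. The function names are hypothetical, and treating `429` as safe to proceed follows the guidance above that the diagnostic can still run, with possibly incomplete results.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): map a cluster health HTTP status code to the
# resolutions in this section. Codes not listed here get a generic message.

health_status_advice() {
  case "$1" in
    200) echo "OK: proceed with running the diagnostic." ;;
    401) echo "Unauthenticated: check the username and password, or use a temporary file-realm superuser if the .security index is unavailable." ;;
    403) echo "Unauthorized: use a different user or elevate this user's privileges." ;;
    429) echo "Too many requests: the cluster is under strain; you can run the diagnostic, but results might be incomplete." ;;
    503) echo "Service unavailable: no elected master node; wait, and investigate the cause if the issue persists." ;;
    504) echo "Network issue reaching the cluster: check any proxy, firewall, and the port, or try an IP address instead of a domain name." ;;
    *)   echo "Unexpected HTTP status $1: review the response body before running the diagnostic." ;;
  esac
}

# Return 0 only for statuses where running the diagnostic is still worthwhile.
health_status_allows_diagnostic() {
  case "$1" in
    200|429) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `health_status_advice "$(curl -s -k -o /dev/null -w '%{http_code}' -u elastic https://localhost:9200/_cluster/health)"` prints the resolution for whatever status the health poll returns.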

[discrete]
[[diagnostic-log-errors]]
=== Diagnose errors in `diagnostic.log`

The following are common errors that you might encounter when running the diagnostic tool:

* `Error: Could not find or load main class com.elastic.support.diagnostics.DiagnosticApp`
+
This indicates that you accidentally downloaded the source code file
instead of `diagnostics-X.X.X-dist.zip` from the releases page.

* `Could not retrieve the Elasticsearch version due to a system or network error - unable to continue.`
+
This indicates that the diagnostic couldn't run commands against the cluster.
Poll the cluster's health again, and ensure that you're using the same parameters
when you run the diagnostic batch or shell file.

* A `security_exception` that includes `is unauthorized for user`:
+
The provided user has insufficient admin permissions to run the diagnostic tool. Use another
user, or grant the user `role:superuser` privileges.
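To check a completed run for these errors, you can search `diagnostic.log` for the messages above. This Bash sketch assumes that the messages appear verbatim in the log. `diagnose_log_errors` is a hypothetical helper that returns `0` when it recognizes one of these errors, `1` when it finds none of them, and `2` when the log can't be read.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): scan a diagnostic.log file for the common
# errors listed in this section and print the matching resolution.
# Only these specific messages are recognized; other errors are ignored.

diagnose_log_errors() {
  local log="${1:-diagnostic.log}" found=1
  if [[ ! -r "$log" ]]; then
    echo "Cannot read $log" >&2
    return 2
  fi
  if grep -qF 'Could not find or load main class com.elastic.support.diagnostics.DiagnosticApp' "$log"; then
    echo "Source code archive detected: download diagnostics-X.X.X-dist.zip from the releases page instead."
    found=0
  fi
  if grep -qF 'Could not retrieve the Elasticsearch version' "$log"; then
    echo "Cluster unreachable: poll cluster health again and reuse the same parameters."
    found=0
  fi
  # Both strings must appear somewhere in the file (not necessarily on one line).
  if grep -qF 'security_exception' "$log" && grep -qF 'is unauthorized for user' "$log"; then
    echo "Insufficient permissions: use another user or grant role:superuser."
    found=0
  fi
  return "$found"
}
```

For example, run `diagnose_log_errors ./diagnostic.log` in the directory where the tool ran. A return code of `1` only means that none of these specific messages were found, so still review the log for other errors.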
