|
| 1 | +[[diagnostic]] |
| 2 | +== Capturing diagnostics |
| 3 | +++++ |
| 4 | +<titleabbrev>Capture diagnostics</titleabbrev> |
| 5 | +++++ |
| 6 | +:keywords: Elasticsearch diagnostic, diagnostics |
| 7 | + |
| 8 | +The {es} https://github.com/elastic/support-diagnostics[Support Diagnostic] tool captures a point-in-time snapshot of cluster statistics and most settings. |
| 9 | +It works against all {es} versions. |
| 10 | + |
| 11 | +This information can be used to troubleshoot problems with your cluster. For examples of issues that you can troubleshoot using Support Diagnostic tool output, refer to https://www.elastic.co/blog/why-does-elastic-support-keep-asking-for-diagnostic-files[the Elastic blog]. |
| 12 | + |
| 13 | +You can generate diagnostic information using this tool before you contact https://support.elastic.co[Elastic Support] or |
| 14 | +https://discuss.elastic.co[Elastic Discuss] to minimize turnaround time. |
| 15 | + |
| 16 | +[discrete] |
| 17 | +[[diagnostic-tool-requirements]] |
| 18 | +=== Requirements |
| 19 | + |
| 20 | +- Java Runtime Environment or Java Development Kit v1.8 or higher |
| 21 | + |
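The Java requirement can be checked before you download the tool. The following is a minimal sketch, not part of the Support Diagnostic tool: the function name `java_version_ok` is hypothetical, and it assumes the version string follows either the legacy `1.x` scheme (for example, `1.8.0_292`) or the newer scheme that starts with the major version (for example, `17.0.2`).

```sh
# Hypothetical helper: succeeds if the given Java version string is 1.8 or higher.
java_version_ok() {
  version="$1"
  major="${version%%.*}"              # text before the first dot
  if [ "$major" = "1" ]; then
    # Legacy scheme: 1.8.0_292 -> compare the second component (8)
    rest="${version#1.}"
    minor="${rest%%[!0-9]*}"          # keep leading digits only
    [ "${minor:-0}" -ge 8 ]
  else
    # Newer scheme: 9, 11.0.2, 17.0.2, 21-ea, and so on
    major="${major%%[!0-9]*}"         # keep leading digits only
    [ "${major:-0}" -ge 9 ]
  fi
}

# Example usage (assumes `java -version` prints a line containing: version "..."):
#   v=$(java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1)
#   java_version_ok "$v" && echo "Java $v is sufficient"
```

The version parsing is kept separate from the `java -version` call so that it can be checked against sample strings without a Java installation.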
| 22 | +[discrete] |
| 23 | +[[diagnostic-tool-access]] |
| 24 | +=== Access the tool |
| 25 | + |
| 26 | +The Support Diagnostic tool is included as a sub-library in some Elastic deployments: |
| 27 | + |
| 28 | +* {ece}: Located under **{ece}** > **Deployment** > **Operations** > |
| 29 | +**Prepare Bundle** > **{es}**. |
| 30 | +* {eck}: Run as https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-take-eck-dump.html[`eck-diagnostics`]. |
| 31 | + |
| 32 | +You can also directly download the `diagnostics-X.X.X-dist.zip` file for the latest Support Diagnostic release |
| 33 | +from https://github.com/elastic/support-diagnostics/releases/latest[the `support-diagnostics` repo]. |
| 34 | + |
| 35 | + |
| 36 | +[discrete] |
| 37 | +[[diagnostic-capture]] |
| 38 | +=== Capture diagnostic information |
| 39 | + |
| 40 | +To capture an {es} diagnostic: |
| 41 | + |
| 42 | +. In a terminal, verify that your network and user permissions are sufficient to connect to your {es} |
| 43 | +cluster by polling the cluster's <<cluster-health,health>>. |
| 44 | ++ |
| 45 | +For example, with the parameters `host:localhost`, `port:9200`, and `username:elastic`, you'd use the following curl request: |
| 46 | ++ |
| 47 | +[source,sh] |
| 48 | +---- |
| 49 | +curl -X GET -k -u elastic https://localhost:9200/_cluster/health |
| 50 | +---- |
| 51 | +// NOTCONSOLE |
| 52 | ++ |
| 53 | +If you receive an HTTP 200 `OK` response, then you can proceed to the next step. If you receive a different |
| 54 | +response code, then <<diagnostic-non-200,diagnose the issue>> before proceeding. |
| 55 | + |
| 56 | +. Using the same environment parameters, run the diagnostic tool script. |
| 57 | ++ |
| 58 | +For information about the parameters that you can pass to the tool, refer to the https://github.com/elastic/support-diagnostics#standard-options[diagnostic |
| 59 | +parameter reference]. |
| 60 | ++ |
| 61 | +The following command options are recommended: |
| 62 | ++ |
| 63 | +**Unix-based systems** |
| 64 | ++ |
| 65 | +[source,sh] |
| 66 | +---- |
| 67 | +sudo ./diagnostics.sh --type local --host localhost --port 9200 -u elastic -p --bypassDiagVerify --ssl --noVerify |
| 68 | +---- |
| 69 | ++ |
| 70 | +**Windows** |
| 71 | ++ |
| 72 | +[source,sh] |
| 73 | +---- |
| 74 | +.\diagnostics.bat --type local --host localhost --port 9200 -u elastic -p --bypassDiagVerify --ssl --noVerify |
| 75 | +---- |
| 76 | ++ |
| 77 | +[TIP] |
| 78 | +.Script execution modes |
| 79 | +==== |
| 80 | +You can execute the script in three https://github.com/elastic/support-diagnostics#diagnostic-types[modes]: |
| 81 | + |
| 82 | +* `local` (default, recommended): Polls the <<rest-apis,{es} API>>, |
| 83 | +gathers operating system info, and captures cluster and GC logs. |
| 84 | + |
| 85 | +* `remote`: Establishes an SSH session |
| 86 | +to the applicable target server to pull the same information as `local`. |
| 87 | + |
| 88 | +* `api`: Polls the <<rest-apis,{es} API>>. All other data must be |
| 89 | +collected manually. |
| 90 | +==== |
| 91 | + |
| 92 | +. When the script has completed, verify that no errors were logged to `diagnostic.log`. |
| 93 | +If the log file contains errors, then refer to <<diagnostic-log-errors,Diagnose errors in `diagnostic.log`>>. |
| 94 | + |
| 95 | +. If the script completed without errors, then an archive with the format `<diagnostic type>-diagnostics-<DateTimeStamp>.zip` is created in the working directory, or an output directory you have specified. You can review or share the diagnostic archive as needed. |
| 96 | + |
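To locate the archive after a run, you can search the output directory for the file name pattern described in the last step. The following is a minimal sketch under stated assumptions: the function name `latest_diagnostic_archive` is hypothetical, the glob `*-diagnostics-*.zip` is derived from the documented archive name format, and the script relies on the widely supported (but not strictly POSIX) `-nt` file test.

```sh
# Hypothetical helper: prints the most recently modified diagnostic archive
# in the given directory (default: current directory). Fails if none exists.
latest_diagnostic_archive() {
  dir="${1:-.}"
  newest=""
  for f in "$dir"/*-diagnostics-*.zip; do
    [ -e "$f" ] || continue           # skip the unexpanded glob when nothing matches
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done
  if [ -n "$newest" ]; then
    printf '%s\n' "$newest"
  else
    return 1
  fi
}

# Example usage:
#   latest_diagnostic_archive . || echo "No diagnostic archive found"
```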
| 97 | +[discrete] |
| 98 | +[[diagnostic-non-200]] |
| 99 | +=== Diagnose a non-200 cluster health response |
| 100 | + |
| 101 | +When you poll your cluster health, if you receive any response other than `200 OK`, then the diagnostic tool |
| 102 | +might not work as intended. The following are possible error codes and their resolutions: |
| 103 | + |
| 104 | +HTTP 401 `UNAUTHENTICATED`:: |
| 105 | +Additional information in the error will usually indicate either |
| 106 | +that your `username:password` pair is invalid, or that your `.security` |
| 107 | +index is unavailable and you need to set up a temporary |
| 108 | +<<file-realm,file-based realm>> user with `role:superuser` to authenticate. |
| 109 | + |
| 110 | +HTTP 403 `UNAUTHORIZED`:: |
| 111 | +Your `username` is recognized but |
| 112 | +has insufficient permissions to run the diagnostic. Either use a different |
| 113 | +username or elevate the user's privileges. |
| 114 | + |
| 115 | +HTTP 429 `TOO_MANY_REQUESTS` (for example, `circuit_breaking_exception`):: |
| 116 | +Your username was authenticated and authorized, but the cluster is under |
| 117 | +sufficiently high strain that it's not responding to API calls. These |
| 118 | +responses are usually intermittent. You can proceed with running the diagnostic, |
| 119 | +but the diagnostic results might be incomplete. |
| 120 | + |
| 121 | +HTTP 504 `GATEWAY_TIMEOUT`:: |
| 122 | +Your network is experiencing issues reaching the cluster. You might be using a proxy or firewall. |
| 123 | +Consider running the diagnostic tool from a different location, confirming your port, or using an IP |
| 124 | +address instead of a domain name. |
| 125 | + |
| 126 | +HTTP 503 `SERVICE_UNAVAILABLE` (for example, `master_not_discovered_exception`):: |
| 127 | +Your cluster does not currently have an elected master node, which is |
| 128 | +required for it to be API-responsive. This might be temporary while the master |
| 129 | +node rotates. If the issue persists, then <<cluster-fault-detection,investigate the cause>> |
| 130 | +before proceeding. |
| 131 | + |
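The status codes above can be turned into a quick pre-flight check. The following is a minimal sketch, not part of the Support Diagnostic tool: the function name `explain_health_status` and its messages are hypothetical summaries of the resolutions listed in this section. It succeeds for 200 and for 429, because the diagnostic can still run in those cases, and fails for every other code.

```sh
# Hypothetical helper: prints a hint for a cluster health HTTP status code.
# Returns 0 when the diagnostic can be run (200, or 429 with possibly
# incomplete results) and 1 otherwise.
explain_health_status() {
  case "$1" in
    200) echo "200: OK, proceed to run the diagnostic tool"; return 0 ;;
    401) echo "401: check the username:password pair, or whether the .security index is unavailable" ;;
    403) echo "403: use a different username or elevate the user's privileges" ;;
    429) echo "429: cluster is under strain; results might be incomplete"; return 0 ;;
    503) echo "503: no elected master node; retry, then investigate if it persists" ;;
    504) echo "504: network issue; check proxy, firewall, and port, or run from another location" ;;
    *)   echo "$1: unexpected response; resolve it before running the diagnostic" ;;
  esac
  return 1
}

# Example usage (same example parameters as the health check; curl prompts for the password):
#   code=$(curl -s -k -u elastic -o /dev/null -w '%{http_code}' https://localhost:9200/_cluster/health)
#   explain_health_status "$code"
```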
| 132 | +[discrete] |
| 133 | +[[diagnostic-log-errors]] |
| 134 | +=== Diagnose errors in `diagnostic.log` |
| 135 | + |
| 136 | +The following are common errors that you might encounter when running the diagnostic tool: |
| 137 | + |
| 138 | +* `Error: Could not find or load main class com.elastic.support.diagnostics.DiagnosticApp` |
| 139 | ++ |
| 140 | +This indicates that you accidentally downloaded the source code file |
| 141 | +instead of `diagnostics-X.X.X-dist.zip` from the releases page. |
| 142 | + |
| 143 | +* `Could not retrieve the Elasticsearch version due to a system or network error - unable to continue.` |
| 144 | ++ |
| 145 | +This indicates that the diagnostic couldn't run commands against the cluster. |
| 146 | +Poll the cluster's health again, and ensure that you're using the same parameters |
| 147 | +when you run the diagnostic batch or shell file. |
| 148 | + |
| 149 | +* A `security_exception` that includes `is unauthorized for user`: |
| 150 | ++ |
| 151 | +The provided user has insufficient admin permissions to run the diagnostic tool. Use another |
| 152 | +user, or grant the user `role:superuser` privileges. |
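
Checking the log for the errors above can be scripted. The following is a minimal sketch, assuming the log is a plain text file that contains the error messages verbatim: the function name `scan_diagnostic_log` is hypothetical, and the hints it prints paraphrase the resolutions listed in this section.

```sh
# Hypothetical helper: prints a hint for each known error found in the log.
# Returns 0 if at least one known error was found, 1 otherwise.
scan_diagnostic_log() {
  log="${1:-diagnostic.log}"
  found=1
  if grep -q 'Could not find or load main class' "$log"; then
    echo "Download diagnostics-X.X.X-dist.zip from the releases page, not the source code file."
    found=0
  fi
  if grep -q 'Could not retrieve the Elasticsearch version' "$log"; then
    echo "Poll the cluster health again and rerun the tool with the same parameters."
    found=0
  fi
  if grep -q 'is unauthorized for user' "$log"; then
    echo "Use another user, or grant the user superuser privileges."
    found=0
  fi
  return "$found"
}

# Example usage:
#   scan_diagnostic_log diagnostic.log || echo "No known errors found"
```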