diff --git a/raw-migrated-files/docs-content/serverless/observability-synthetics-troubleshooting.md b/raw-migrated-files/docs-content/serverless/observability-synthetics-troubleshooting.md deleted file mode 100644 index 1fff72626b..0000000000 --- a/raw-migrated-files/docs-content/serverless/observability-synthetics-troubleshooting.md +++ /dev/null @@ -1,95 +0,0 @@ ---- -navigation_title: "Troubleshooting" ---- - -# Troubleshooting Synthetics [observability-synthetics-troubleshooting] - - - -## Local debugging [synthetics-troubleshooting-local-debugging] - -For debugging synthetic tests locally, you can set an environment variable, `DEBUG=synthetics`, to capture Synthetics agent logs when using the [Synthetics CLI](observability-synthetics-command-reference.md). - - -## Common issues [synthetics-troubleshooting-common-issues] - - -### No results from a monitor configured to run on a {{private-location}} [synthetics-troubleshooting-no-agent-running] - -If you have created a {{private-location}} and configured a monitor to run on that {{private-location}}, but don’t see any results for that monitor in the Synthetics UI, make sure there is an agent configured to run against the agent policy. - -::::{note} -If you attempt to assign an agent policy to a {{private-location}} *before* configuring an agent to run against the agent policy, you will see a note in the Synthetics UI that the selected agent policy has no agents. - -:::: - - -When creating a {{private-location}}, you have to: - -1. [Set up {{agent}}](observability-synthetics-private-location.md#synthetics-private-location-fleet-agent). -2. [Connect {{fleet}} to your Observability project](observability-synthetics-private-location.md#synthetics-private-location-connect) and enroll an {{agent}} in {{fleet}}. -3. [Add a {{private-location}}](observability-synthetics-private-location.md#synthetics-private-location-add) in the Synthetics UI. 
- -If you do not complete the second item, no agents will be configured to run against the agent policy, and any monitors configured to run on that {{private-location}} won’t be able to run so there will be no results in the Synthetics UI. - -To fix this, make sure there is an agent configured to run against the agent policy. - - -### No results from a monitor [synthetics-troubleshooting-no-direct-es-connection] - -If you have configured a monitor but don’t see any results for that monitor in the Synthetics UI, whether running them from Elastic’s global managed testing infrastructure or from {{private-location}}s, ensure Synthetics has a direct connection to {{es}}. - -Do not configure any ingest pipelines or output via Logstash as this will prevent Synthetics from working properly and is not supported. - - -### Browser monitor configured to run on a {{private-location}} not running to schedule [synthetics-troubleshooting-missing-browser-schedules] - -If you have browser monitors configured to run on a {{private-location}} but notice one or more of them are not running as scheduled, this could be because: - -* The time it takes for your monitor to run is longer than the frequency you have set -* There may be too many monitors trying to run concurrently, causing some of them to skip their scheduled run - -You may also see a message in the logs such as `2 tasks have missed their schedule deadlines by more than 1 second in the last 15s`. These will be visible from inside the Agent diagnostic ZIP file, and the numbers and time periods may be different in your logs. - -Start by identifying the cause of the issue. First, check if the time it takes the monitor to run is less than the scheduled frequency: - -1. Go to the Synthetics UI. -2. Click the monitor, then click **Go to monitor**. -3. Go to the [Overview tab](observability-synthetics-analyze.md#synthetics-analyze-overview) to see the *Avg. duration*. 
You can also view the duration for individual runs in the [History tab](observability-synthetics-analyze.md#synthetics-analyze-individual-monitors-history). -4. Compare the duration to the scheduled frequency. If the duration is *greater than* the scheduled frequency, for example if the monitor takes 90 seconds to run and its scheduled frequency is 1 minute, the next scheduled run will not occur because the current one is still running, so you may see results for every other scheduled run. - - To fix this, you can either: - - * Change the frequency so the monitor runs less often. - * Refactor the monitor so it can run in a shorter amount of time. - - -If the duration is *less than* the scheduled frequency or the suggestion above does not fix the issue, then there may be too many browser monitors attempting to run on the {{private-location}}. Due to the additional hardware overhead of running browser monitors, we limit each {{private-location}} to only run two browser monitors at the same time. Depending on how many browser monitors you have configured to run on the {{private-location}} and their schedule, the {{private-location}} may not be able to run them all because it would require more than two browser tests to be running simultaneously. - -To fix this issue, you can either: - -* Increase the number of concurrent browser monitors allowed (as described in [Scaling Private Locations](observability-synthetics-private-location.md#synthetics-private-location-scaling)), paying attention to the scaling and hardware requirements documented. -* Create multiple {{private-location}}s and spread your browser monitors across them more evenly (effectively horizontally scaling your {{private-location}}s). 
- - -### No locations are available [synthetics-troubleshooting-no-locations] - -When using {{ecloud}}, if there are no options available in the *Locations* dropdown when you try to create a monitor in the Synthetics UI *or* if no locations are listed when using the [`location` command](observability-synthetics-command-reference.md#elastic-synthetics-locations-command), it might be because you do not have permission to use Elastic managed locations *and* there are no [Private Locations](observability-synthetics-private-location.md#monitor-via-private-agent) available yet. - -There are a few ways to fix this: - -* If you have [Editor](observability-synthetics-feature-roles.md) access, you can [create a new Private Location](observability-synthetics-private-location.md#monitor-via-private-agent). Then try creating the monitor again. -* If you do *not* have the right privileges to create a Private Location, you can ask an [Admin](observability-synthetics-feature-roles.md) to create a Private Location or give you the necessary privileges so you can [create a new Private Location](observability-synthetics-private-location.md#monitor-via-private-agent). Then try creating the monitor again. - - -## Get help [synthetics-troubleshooting-get-help] - - -### Elastic Support [synthetics-troubleshooting-support] - -We offer a support experience unlike any other. Our team of professionals *speak human and code* and love making your day. [Learn more about subscriptions](https://www.elastic.co/subscriptions). - - -### Discussion forum [synthetics-troubleshooting-discussion] - -For other questions and feature requests, visit our [discussion forum](https://discuss.elastic.co//c/observability/synthetics/75). 
diff --git a/raw-migrated-files/docs-content/serverless/observability-troubleshoot-logs.md b/raw-migrated-files/docs-content/serverless/observability-troubleshoot-logs.md deleted file mode 100644 index 814007c939..0000000000 --- a/raw-migrated-files/docs-content/serverless/observability-troubleshoot-logs.md +++ /dev/null @@ -1,197 +0,0 @@ -# Troubleshoot logs [observability-troubleshoot-logs] - -Use this page to find possible solutions for errors you’re encountering with your logs. This troubleshooting page is broken into the following sections: - -* [Common onboarding issues](observability-troubleshoot-logs.md#logs-onboarding-troubleshooting) -* [Mapping and pipeline issues](observability-troubleshoot-logs.md#logs-common-mapping-troubleshooting) - - -## Common onboarding issues [logs-onboarding-troubleshooting] - -This section provides possible solutions for errors you might encounter while onboarding your logs. - - -### User does not have permissions to create API key [observability-troubleshoot-logs-user-does-not-have-permissions-to-create-api-key] - -When adding new data using the guided instructions in your project (**Add data** → **Collect and analyze logs** → **Stream log files**), if you don’t have the required privileges to create an API key, you’ll see the following error message: - -::::{note} -You need permission to manage API keys - -:::: - - - -#### Solution [observability-troubleshoot-logs-solution] - -You need to either: - -* Ask an administrator to update your user role to at least **Developer** by going to the user icon on the header bar and opening **Organization** → **Members**. Read more about user roles in [Assign user roles and privileges](general-manage-organization.md#general-assign-user-roles). After your user role is updated, restart the onboarding flow. -* Get an API key from an administrator and manually add the API key to the {{agent}} configuration. 
See [Configure the {{agent}}](observability-stream-log-files.md#observability-stream-log-files-step-3-configure-the-agent) for more on manually updating the configuration and adding the API key. - - -### Observability project not accessible from host [observability-troubleshoot-logs-observability-project-not-accessible-from-host] - -If your Observability project is not accessible from the host, you’ll see the following error message after pasting the **Install the {{agent}}** instructions into the host: - -```plaintext -Failed to connect to {host} port {port} after 0 ms: Connection refused -``` - - -#### Solution [observability-troubleshoot-logs-solution-1] - -The host needs access to your project. Port `443` must be open and the project’s {{es}} endpoint must be reachable. You can locate your project’s endpoint by clicking the help icon (![Help icon](../../../images/serverless-help.svg "")) and selecting **Endpoints**. Run the following command, replacing the URL with your endpoint, and you should get an authentication error with more details on resolving your issue: - -```shell -curl https://your-endpoint.elastic.cloud -``` - - -### Download {{agent}} failed [observability-troubleshoot-logs-download-agent-failed] - -If the host was able to download the installation script but cannot connect to the public artifact repository, you’ll see the following error message: - -```plaintext -Download Elastic Agent - -Failed to download Elastic Agent, see script for error. -``` - - -#### Solutions [observability-troubleshoot-logs-solutions] - -* If the combination of the {{agent}} version and operating system architecture is not available, you’ll see the following error message: - - ```plaintext - The requested URL returned error: 404 - ``` - - To fix this, update the {{agent}} version in the installation instructions to a known version of the {{agent}}. 
- -* If the {{agent}} was fully downloaded previously, you’ll see the following error message: - - ```plaintext - Error: cannot perform installation as Elastic Agent is already running from this directory - ``` - - To fix this, delete previous downloads and restart the onboarding. - -* You’re an Elastic Cloud Enterprise user without access to the Elastic downloads page. - - -### Install {{agent}} failed [observability-troubleshoot-logs-install-agent-failed] - -If an {{agent}} already exists on your host, you’ll see the following error message: - -```plaintext -Install Elastic Agent - -Failed to install Elastic Agent, see script for error. -``` - - -#### Solution [observability-troubleshoot-logs-solution-2] - -You can uninstall the current {{agent}} using the `elastic-agent uninstall` command, and run the script again. - -::::{warning} -Uninstalling the current {{agent}} removes the entire current setup, including the existing configuration. - -:::: - - - -### Waiting for Logs to be shipped…​ step never completes [observability-troubleshoot-logs-waiting-for-logs-to-be-shipped-step-never-completes] - -If the **Waiting for Logs to be shipped…​** step never completes, logs are not being shipped to your Observability project, and there is most likely an issue with your {{agent}} configuration. - - -#### Solution [observability-troubleshoot-logs-solution-3] - -Inspect the {{agent}} logs for errors. See the [Debug standalone {{agent}}s](https://www.elastic.co/guide/en/fleet/current/debug-standalone-agents.html#inspect-standalone-agent-logs) documentation for more on finding errors in {{agent}} logs. - - -## Mapping and pipeline issues [logs-common-mapping-troubleshooting] - -This section provides possible solutions for mapping and pipeline issues you might encounter with your logs. - - -### Keyword fields are too long [logs-mapping-troubleshooting-keyword-limit] - -The `keyword` field limit is 32,766 bytes. 
When indexing a document, if your `keyword` field length exceeds this limit, you’ll see an error similar to the following: - -```plaintext -max_bytes_length_exceeded_exception: bytes can be at most 32766 in length -``` - - -#### Solution [logs-mapping-troubleshooting-keyword-limit-solution] - -Avoid this error using one of the following options: - -**Stop indexing the field:** If you don’t need the `keyword` field for aggregation or search, set `"index":false` in the index template to stop indexing the field. - -**Convert the `keyword` field to a `text` field:** To continue indexing the field while avoiding length limits, you can convert the `keyword` field to a `text` field. - -::::{note} -Aggregations on this field would no longer be supported, but the contents would be searchable. -:::: - - -To convert the `keyword` field to a `text` field: - -1. Create a new index with the `text` field data type. -2. Reindex from the `_source` field of the source index using the [`_reindex` API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-reindex). - - -### Date format mismatch [logs-mapping-troubleshooting-date-mismatch] - -If the format of the `date` field in your document doesn’t match the format set in your index template, you’ll see an error similar to the following: - -```plaintext -failed to parse field [date] of type [date] in document with id 'KGcZb3cBqhj6kAxank_x'. -``` - - -#### Solution [logs-mapping-troubleshooting-date-solution] - -Add the format of the mismatched date to your index template. Multiple formats can be specified by separating them with `||` as a separator. Each format will be tried in turn until a matching format is found. 
For example: - -$$$date-format-example$$$ - -```console -PUT my-index-000001 -{ - "mappings": { - "properties": { - "date": { - "type": "date", - "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis" - } - } - } -} -``` - -Refer to the [`date` field type](https://www.elastic.co/guide/en/elasticsearch/reference/current/date.html) docs for more information. - - -### Grok or dissect pattern mismatch [logs-mapping-troubleshooting-grok-mismatch] - -If the pattern in your grok or dissect processor doesn’t match the format of your document, you’ll see an error similar to the following: - -```plaintext -Provided Grok patterns do not match field value... -``` - - -#### Solution [logs-mapping-troubleshooting-grok-solution] - -Make sure your [grok](https://www.elastic.co/guide/en/elasticsearch/reference/current/grok-processor.html) or [dissect](https://www.elastic.co/guide/en/elasticsearch/reference/current/dissect-processor.html) processor pattern matches your log document format. - -You can build and debug grok patterns in {{kib}} using the [Grok Debugger](../../../explore-analyze/query-filter/tools/grok-debugger.md). Find the **Grok Debugger** by navigating to the **Developer tools** page using the navigation menu or the global search field. - -From here, you can enter sample data representative of the log document you’re trying to ingest and the Grok pattern you want to apply to the data. - -If you don’t see any **Structured Data** when you simulate the grok pattern, iterate on the pattern until you find the error. 
diff --git a/raw-migrated-files/docs-content/serverless/slo-troubleshoot-slos.md b/raw-migrated-files/docs-content/serverless/slo-troubleshoot-slos.md deleted file mode 100644 index 0870ecc1cf..0000000000 --- a/raw-migrated-files/docs-content/serverless/slo-troubleshoot-slos.md +++ /dev/null @@ -1,180 +0,0 @@ ---- -navigation_title: "Troubleshoot SLOs" ---- - -# Troubleshoot service-level objectives (SLOs) [slo-troubleshoot-slos] - - -::::{warning} -Do not edit, delete, or tamper with any "internal" assets mentioned in this document, such as the transforms or ingest pipelines created by the SLO application. - -Do not attempt to edit the `.slo-observability.*` indices mentioned in this document by overriding index templates or editing the settings/mappings. - -The implementation details described here are subject to change. - -:::: - - -This document provides an overview of common issues encountered when working with service-level objectives (SLOs). It explores the relationships between SLOs and other core functionalities within the stack, such as [transforms](../../../explore-analyze/transforms.md) and [ingest pipelines](../../../manage-data/ingest/transform-enrich/ingest-pipelines.md), highlighting how these integrations can impact the functionality of SLOs. 
- -* [Understanding SLO internals](../../../troubleshoot/observability/troubleshoot-service-level-objectives-slos.md#slo-understanding-slos) -* [Common problems](../../../troubleshoot/observability/troubleshoot-service-level-objectives-slos.md#slo-common-problems) -* [SLO troubleshooting actions](../../../troubleshoot/observability/troubleshoot-service-level-objectives-slos.md#slo-troubleshoot-actions) - - -## Understanding SLO internals [slo-understanding-slos] - -::::{tip} -If you’re already familiar with how SLOs work and their relationship with other system components, such as transforms and ingest pipelines, you can jump directly to [Common problems](../../../troubleshoot/observability/troubleshoot-service-level-objectives-slos.md#slo-common-problems). - -:::: - - -An SLO is represented by several system resources: - -* **SLO Definition**: Stored as a Kibana Saved Object. -* **Transforms**: For each SLO, {{kib}} creates two transforms: - - * **Rolling-up transform**: `slo-{slo.id}-{slo.revision}`, rolls up the data into a smaller set of documents. The source indices of this transform are defined by the SLO. The target index will be `.slo-observability.sli-v{slo.internal-version}-{monthly date}`. - * **Rolling-up ingest pipeline**: `slo-observability.sli.pipeline-{slo.id}-{slo.revision}`, used by the rolling-up transform. - * **Summarizing transform**: `slo-summary-{slo.id}-{slo.revision}`, updates the latest values, such as the observed SLI or remaining error budget, for efficient searching and filtering of SLOs. The source of this transform is `.slo-observability.sli-v{slo.internal-version}*`. The target index is `.slo-observability.summary-v{slo.internal-version}`. - * **Summarizing ingest pipeline**: `slo-observability.summary.pipeline-{slo.id}-{slo.revision}`, used by the summarizing transform. 
- -* **Additional resources**: {{kib}} also installs and manages shared resources to the SLOs, including index templates, indices, and ingest pipelines, among others. - -When an **SLO update** changes any of the `SLI parameters`, the `SLO objective`, or the `time window`, a revision bump (`{slo.revision}`) and a full reinstallation of the associated assets (transforms and ingest pipelines) occur. In addition, the revision bump deletes any previously aggregated data for that SLO. Updates to fields like `name`, `description`, or `tags` do not trigger a revision bump or asset reinstallation. - -Ensuring that transforms are functioning correctly and that the cluster is healthy is crucial for maintaining accurate and reliable SLOs. - - -## Common problems [slo-common-problems] - -It’s common for SLO problems to arise when there are underlying problems in the cluster, such as unavailable shards or failed transforms. Because SLOs rely on transforms to aggregate and process data, any failure or misconfiguration in these components can lead to inaccurate or incomplete SLO calculations. Additionally, unavailable shards can affect the data retrieval process, further complicating the reliability of SLO metrics. - - -### No transform or ingest nodes [slo-no-transform-ingest-node] - -Because SLOs depend on both [ingest pipelines](../../../manage-data/ingest/transform-enrich/ingest-pipelines.md) and [transforms](../../../explore-analyze/transforms.md) to process the data, it’s essential to ensure that the cluster has nodes with the appropriate [roles](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html#node-roles). - -Ensure the cluster includes one or more nodes with both `ingest` and `transform` roles to support the data processing and transformations required for SLOs to function properly. The roles can exist on the same node or be distributed across separate nodes. 
- - -### Unhealthy or missing transforms [slo-transform-unhealthy] - -When working with SLOs, it is crucial to ensure that the associated transforms function correctly. Transforms are responsible for generating the data needed for SLOs, and two transforms are created for each SLO. If you notice that your SLOs are not displaying the expected data, it’s time to check the health of these associated transforms. - -{{kib}} shows the following message when any of the associated transforms is in an unexpected state: - -* `"The following transform is an unhealthy state"`, followed by a list of transforms. - -For detailed guidance on diagnosing and resolving transform-related issues, refer to the [troubleshooting transforms](../../../troubleshoot/elasticsearch/transform-troubleshooting.md) documentation. - -It’s also recommended that you perform the following transform checks: - -* Ensure the transforms needed for the SLOs haven’t been deleted or stopped. - - If a transform has been deleted, the easiest way to recreate it is using the [Reset SLO](../../../troubleshoot/observability/troubleshoot-service-level-objectives-slos.md#slo-troubleshoot-reset) action, forcing the recreation of the transforms. If a transform was stopped, try to start it, and then check the `health tab` of the transform. - -* [Inspect SLO assets](../../../troubleshoot/observability/troubleshoot-service-level-objectives-slos.md#slo-troubleshoot-inspect) to analyze the SLO definition and all associated resources. - - Use the direct links offered by the **Inspect UI** and check that all referenced resources exist, as that’s not verified by the inspect functionality. - - Use the `query composite` content to verify if the queries performed by the transforms are valid and return the expected data. - -* Check the source data and queries of the SLO. - - The most common cause of legitimate transform failures is issues with the source data, such as timestamp parsing errors or incorrect query structures. 
The following is an example of an unparsable timestamp causing a transform to fail: - - ```bash - "reason": """Failed to index documents into destination index due to permanent error: - [org.elasticsearch.xpack.transform.transforms.BulkIndexingException: Bulk index experienced [500] failures and at least 1 irrecoverable - [unable to parse date [1702842480000]]. Other failures: - [IngestProcessorException] message [org.elasticsearch.ingest.IngestProcessorException: - java.lang.IllegalArgumentException: unable to parse date [1702842480000]]; java.lang.IllegalArgumentException: unable to parse date [1702842480000]]""", - "issue": "Transform task state is [failed]" - ``` - -* As a last resort, consider [resetting the SLO](../../../troubleshoot/observability/troubleshoot-service-level-objectives-slos.md#slo-troubleshoot-reset). - - -### Missing Ingest Pipelines [slo-missing-pipeline] - -If any of the needed ingest pipelines are missing, try the [Reset SLO](../../../troubleshoot/observability/troubleshoot-service-level-objectives-slos.md#slo-troubleshoot-reset) action. - - -### Stack-related problems [slo-missing-template] - -As mentioned, maintaining a healthy cluster is crucial for SLOs to function correctly. The following examples show issues **unrelated to SLOs** that can still disrupt their proper operation. While troubleshooting these issues is outside the scope of this document, they are included for illustrative purposes. - -* Problems accessing the source data, causing the transform to fail: - - ```bash - Failed to execute phase [can_match], start; org.elasticsearch.action.search.SearchPhaseExecutionException: - Search rejected due to missing shards [[index_name_1][1], [index_name_2][1], [index_name_3][1]]. 
- ``` - -* Remote cluster not available, if for example an SLO is fetching data from a remote cluster called `remote-metrics`: - - ```sh - Validation Failed: 1: no such remote cluster: [remote-metrics] - ``` - -* [Circuit breaker exceptions](../../../troubleshoot/elasticsearch/circuit-breaker-errors.md) due to nodes being under memory pressure. - - -## SLO troubleshooting actions [slo-troubleshoot-actions] - - -### Inspect SLO assets [slo-troubleshoot-inspect] - -To be able to inspect SLOs you have to activate the corresponding feature in {{kib}}: - -1. Open **Advanced Settings**, by finding **Stack Management** in the main menu or using the [global search field](/explore-analyze/find-and-organize/find-apps-and-objects.md). -2. Enable `observability:enableInspectEsQueries` setting. - -Afterwards visit the **SLO edit page** and click **SLO Inspect**. - -The **SLO Inspect** option provides a detailed report of an SLO, including: - -* SLO configuration -* Rollup transform configuration -* Summary transform configuration -* Rollup ingest pipeline -* Summary ingest pipeline -* Temporary document -* Rollup transform query composite -* Summary transform query composite - -These resources are very useful for tasks such as trying out the queries performed by the transforms and checking the IDs of all associated resources. The view also includes direct links to transforms and ingest pipelines sections in {{kib}}. - - -### Reset SLO [slo-troubleshoot-reset] - -Resetting an SLO forces the deletion of all SLI data, summary data, and transforms, and then reinstalls and processes the data. Essentially, it recreates the SLO as if it had been deleted and re-created by the user. - -::::{note} -While resetting an SLO can help resolve certain issues, it may not always address the root cause of errors. Most errors related to transforms typically arise from improperly structured source data, such as unparsable timestamps, which prevent the transform from progressing. 
Additionally, incorrectly formatted SLO queries, and consequently transform queries, can also lead to failures. - -Before resetting the SLO, verify that the source data and queries are correctly formatted and validated. Resetting should only be used as a last resort when all other troubleshooting steps have been exhausted. - -:::: - - -Follow these steps to reset an SLO: - -1. Find **SLOs** in the main menu or use the [global search field](/explore-analyze/find-and-organize/find-apps-and-objects.md). -2. Click on the SLO to reset. -3. Select **Actions** → **Reset**. - -Alternatively, you can use the {{kib}} API for the reset action: - -```console -POST kbn:/api/observability/slos/{sloId}/_reset -``` - -Where `sloId` can be obtained from the [Inspect SLO assets](../../../troubleshoot/observability/troubleshoot-service-level-objectives-slos.md#slo-troubleshoot-inspect) action. - - -### Using API calls to retrieve SLO details [slo-api-calls] - -Refer to [SLO API calls](https://www.elastic.co/docs/api/doc/kibana/v8/operation/operation-findslosop) as an alternative to [using SLO Inspect](../../../troubleshoot/observability/troubleshoot-service-level-objectives-slos.md#slo-troubleshoot-inspect). diff --git a/raw-migrated-files/observability-docs/observability/logs-troubleshooting.md b/raw-migrated-files/observability-docs/observability/logs-troubleshooting.md deleted file mode 100644 index 898f92aaab..0000000000 --- a/raw-migrated-files/observability-docs/observability/logs-troubleshooting.md +++ /dev/null @@ -1,218 +0,0 @@ -# Troubleshoot logs [logs-troubleshooting] - -Use this page to find possible solutions for errors you’re encountering with your logs. 
This troubleshooting page is divided into the following sections: - -* [Common onboarding issues](../../../troubleshoot/observability/troubleshoot-logs.md#logs-onboarding-troubleshooting) -* [Mapping and pipeline issues](../../../troubleshoot/observability/troubleshoot-logs.md#logs-common-mapping-troubleshooting) - - -## Common onboarding issues [logs-onboarding-troubleshooting] - -This section provides possible solutions for errors you might encounter while onboarding your logs. - - -### User does not have permissions to create API key [logs-troubleshooting-insufficient-priv] - -If you don’t have the required privileges to create an API key, you’ll see the following error message: - -```plaintext -User does not have permissions to create API key. - -Required cluster privileges are [`monitor`, `manage_own_api_key`] and -required index privileges are [`auto_configure`, `create_doc`] for -indices [`logs-*-*`, `metrics-*-*`], please add all required privileges -to the role of the authenticated user. -``` - - -#### Solution [logs-troubleshooting-insufficient-priv-solution] - -You need to either: - -* Have an administrator give you the `monitor` and `manage_own_api_key` cluster privileges and the `auto_configure` and `create_doc` indices privileges. Once you have these privileges, restart the onboarding flow. -* Get an API key from an administrator and manually add the API key to the {{agent}} configuration. See [Configure the {{agent}}](../../../solutions/observability/logs/stream-any-log-file.md#logs-stream-agent-config) for more on manually updating the configuration and adding the API key. 
- - -### Failed to create API key [logs-troubleshooting-API-key-failed] - -If you don’t have the privileges to create `savedObjects` in {{kib}}, you’ll see the following error message: - -```plaintext -Failed to create API key - -Something went wrong: Unable to create observability-onboarding-state -``` - - -#### Solution [logs-troubleshooting-API-key-failed-solution] - -You need an administrator to give you the `Saved Objects Management` {{kib}} privilege to generate the required `observability-onboarding-state` flow state. Once you have the necessary privileges, restart the onboarding flow. - - -### {{kib}} not accessible from host [logs-troubleshooting-kib-not-accessible] - -If {{kib}} is not accessible from the host, you’ll see the following error message after pasting the **Install the {{agent}}** instructions into the host: - -```plaintext -Failed to connect to {host} port {port} after 0 ms: Connection refused -``` - - -#### Solution [logs-troubleshooting-kib-not-accessible-solution] - -The host needs access to {{kib}}. Port `443` must be open and the deployment’s {{es}} endpoint must be reachable. Locate your project’s endpoint from **Help menu (![help icon](../../../images/observability-help-icon.png "")) → Connection details**. - -Run the following command, replacing the URL with your endpoint, and you should get an authentication error with more details on resolving your issue: - -```shell -curl https://your-endpoint.elastic.cloud -``` - - -### Download {{agent}} failed [logs-troubleshooting-download-agent] - -If the host was able to download the installation script but cannot connect to the public artifact repository, you’ll see the following error message: - -```plaintext -Download Elastic Agent - -Failed to download Elastic Agent, see script for error. 
-``` - - -#### Solutions [logs-troubleshooting-download-agent-solution] - -* If the combination of the {{agent}} version and operating system architecture is not available, you’ll see the following error message: - - ```plaintext - The requested URL returned error: 404 - ``` - - To fix this, update the {{agent}} version in the installation instructions to a known version of the {{agent}}. - -* If the {{agent}} was fully downloaded previously, you’ll see the following error message: - - ```plaintext - Error: cannot perform installation as Elastic Agent is already running from this directory - ``` - - To fix this, delete previous downloads and restart the onboarding. - -* You’re an Elastic Cloud Enterprise user without access to the Elastic downloads page. - - -### Install {{agent}} failed [logs-troubleshooting-install-agent] - -If an {{agent}} already exists on your host, you’ll see the following error message: - -```plaintext -Install Elastic Agent - -Failed to install Elastic Agent, see script for error. -``` - - -#### Solution [logs-troubleshooting-install-agent-solution] - -You can uninstall the current {{agent}} using the `elastic-agent uninstall` command, and run the script again. - -::::{warning} -Uninstalling the current {{agent}} removes the entire current setup, including the existing configuration. -:::: - - - -### Waiting for Logs to be shipped…​ step never completes [logs-troubleshooting-wait-for-logs] - -If the **Waiting for Logs to be shipped…​** step never completes, logs are not being shipped to {{es}}, and there is most likely an issue with your {{agent}} configuration. - - -#### Solution [logs-troubleshooting-wait-for-logs-solution] - -Inspect the {{agent}} logs for errors. See the [Debug standalone {{agent}}s](https://www.elastic.co/guide/en/fleet/current/debug-standalone-agents.html#inspect-standalone-agent-logs) documentation for more on finding errors in {{agent}} logs. 
- - -## Mapping and pipeline issues [logs-common-mapping-troubleshooting] - -This section provides possible solutions for mapping and pipeline issues you might encounter with your logs. - - -### Keyword fields are too long [logs-mapping-troubleshooting-keyword-limit] - -The `keyword` field limit is 32,766 bytes. When indexing a document, if your `keyword` field length exceeds this limit, you’ll see an error similar to the following: - -```plaintext -max_bytes_length_exceeded_exception: bytes can be at most 32766 in length -``` - - -#### Solution [logs-mapping-troubleshooting-keyword-limit-solution] - -Avoid this error using one of the following options: - -**Stop indexing the field:** If you don’t need the `keyword` field for aggregation or search, set `"index":false` in the index template to stop indexing the field. - -**Convert the `keyword` field to a `text` field:** To continue indexing the field while avoiding length limits, you can convert the `keyword` field to a `text` field. - -::::{note} -Aggregations on this field would no longer be supported, but the contents would be searchable. -:::: - - -To convert the `keyword` field to a `text` field: - -1. Create a new index with the `text` field data type. -2. Reindex from the `_source` field of the source index using the [`_reindex` API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-reindex). - - -### Date format mismatch [logs-mapping-troubleshooting-date-mismatch] - -If the format of the `date` field in your document doesn’t match the format set in your index template, you’ll see an error similar to the following: - -```plaintext -failed to parse field [date] of type [date] in document with id 'KGcZb3cBqhj6kAxank_x'. -``` - - -#### Solution [logs-mapping-troubleshooting-date-solution] - -Add the format of the mismatched date to your index template. Multiple formats can be specified by separating them with `||` as a separator. 
Each format will be tried in turn until a matching format is found. For example: - -$$$date-format-example$$$ - -```console -PUT my-index-000001 -{ - "mappings": { - "properties": { - "date": { - "type": "date", - "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis" - } - } - } -} -``` - -Refer to the [`date` field type](https://www.elastic.co/guide/en/elasticsearch/reference/current/date.html) docs for more information. - - -### Grok or dissect pattern mismatch [logs-mapping-troubleshooting-grok-mismatch] - -If the pattern in your grok or dissect processor doesn’t match the format of your document, you’ll see an error similar to the following: - -```plaintext -Provided Grok patterns do not match field value... -``` - - -#### Solution [logs-mapping-troubleshooting-grok-solution] - -Make sure your [grok](https://www.elastic.co/guide/en/elasticsearch/reference/current/grok-processor.html) or [dissect](https://www.elastic.co/guide/en/elasticsearch/reference/current/dissect-processor.html) processor pattern matches your log document format. - -You can build and debug grok patterns in {{kib}} using the [Grok Debugger](../../../explore-analyze/query-filter/tools/grok-debugger.md). Find the **Grok Debugger** by navigating to the **Developer tools** page using the navigation menu or the global search field. - -From here, you can enter sample data representative of the log document you’re trying to ingest and the Grok pattern you want to apply to the data. - -If you don’t see any **Structured Data** when you simulate the grok pattern, iterate on the pattern until you find the error. 
- diff --git a/raw-migrated-files/observability-docs/observability/slo-troubleshoot-slos.md b/raw-migrated-files/observability-docs/observability/slo-troubleshoot-slos.md deleted file mode 100644 index ed9cb9d1b2..0000000000 --- a/raw-migrated-files/observability-docs/observability/slo-troubleshoot-slos.md +++ /dev/null @@ -1,238 +0,0 @@ ---- -navigation_title: "Troubleshoot SLOs" ---- - -# Troubleshoot service-level objectives (SLOs) [slo-troubleshoot-slos] - - -::::{important} -To create and manage SLOs, you need an [appropriate license](https://www.elastic.co/subscriptions), an {{es}} cluster with both `transform` and `ingest` [node roles](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html#node-roles) present, and [SLO access](../../../solutions/observability/incident-management/configure-service-level-objective-slo-access.md) must be configured. - -:::: - - -::::{warning} -Do not edit, delete, or tamper with any "internal" assets mentioned in this document, such as the transforms or ingest pipelines created by the SLO application. - -Do not attempt to edit the `.slo-observability.*` indices mentioned in this document by overriding index templates or editing the settings/mappings. - -The implementation details described here are subject to change. - -:::: - - -This document provides an overview of common issues encountered when working with service-level objectives (SLOs). It explores the relationships between SLOs and other core functionalities within the stack, such as [transforms](../../../explore-analyze/transforms.md) and [ingest pipelines](../../../manage-data/ingest/transform-enrich/ingest-pipelines.md), highlighting how these integrations can impact the functionality of SLOs. 
- -* [Understanding SLO internals](../../../troubleshoot/observability/troubleshoot-service-level-objectives-slos.md#slo-understanding-slos) -* [Common problems](../../../troubleshoot/observability/troubleshoot-service-level-objectives-slos.md#slo-common-problems) -* [SLO troubleshooting actions](../../../troubleshoot/observability/troubleshoot-service-level-objectives-slos.md#slo-troubleshoot-actions) -* [Upgrade from beta to GA](../../../troubleshoot/observability/troubleshoot-service-level-objectives-slos.md#slo-troubleshoot-beta) - - -## Understanding SLO internals [slo-understanding-slos] - -::::{tip} -If you’re already familiar with how SLOs work and their relationship with other system components, such as transforms and ingest pipelines, you can jump directly to [Common problems](../../../troubleshoot/observability/troubleshoot-service-level-objectives-slos.md#slo-common-problems). - -:::: - - -An SLO is represented by several system resources: - -* **SLO Definition**: Stored as a Kibana Saved Object. -* **Transforms**: For each SLO, {{kib}} creates two transforms: - - * **Rolling-up transform**: `slo-{slo.id}-{slo.revision}`, rolls up the data into a smaller set of documents. The source indices of this transform are defined by the SLO. The target index will be `.slo-observability.sli-v{slo.internal-version}-{monthly date}`. - * **Rolling-up ingest pipeline**: `slo-observability.sli.pipeline-{slo.id}-{slo.revision}`, used by the rolling-up transform. - * **Summarizing transform**: `slo-summary-{slo.id}-{slo.revision}`, updates the latest values, such as the observed SLI or remaining error budget, for efficient searching and filtering of SLOs. The source of this transform is `.slo-observability.sli-v{slo.internal-version}*`. The target index is `.slo-observability.summary-v{slo.internal-version}`. - * **Summarizing ingest pipeline**: `slo-observability.summary.pipeline-{slo.id}-{slo.revision}`, used by the summarizing transform. 
- -* **Additional resources**: {{kib}} also installs and manages shared resources to the SLOs, including index templates, indices, and ingest pipelines, among others. - -When an **SLO update** changes any of the `SLI parameters`, the `SLO objective`, or the `time window`, a revision bump (`{slo.revision}`) and a full reinstallation of the associated assets (transforms and ingest pipelines) occur. In addition, the revision bump deletes any previously aggregated data for that SLO. Updates to fields like `name`, `description`, or `tags` do not trigger a revision bump or asset reinstallation. - -Ensuring that transforms are functioning correctly and that the cluster is healthy is crucial for maintaining accurate and reliable SLOs. - - -## Common problems [slo-common-problems] - -It’s common for SLO problems to arise when there are underlying problems in the cluster, such as unavailable shards or failed transforms. Because SLOs rely on transforms to aggregate and process data, any failure or misconfiguration in these components can lead to inaccurate or incomplete SLO calculations. Additionally, unavailable shards can affect the data retrieval process, further complicating the reliability of SLO metrics. - - -### No transform or ingest nodes [slo-no-transform-ingest-node] - -Because SLOs depend on both [ingest pipelines](../../../manage-data/ingest/transform-enrich/ingest-pipelines.md) and [transforms](../../../explore-analyze/transforms.md) to process the data, it’s essential to ensure that the cluster has nodes with the appropriate [roles](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html#node-roles). - -Ensure the cluster includes one or more nodes with both `ingest` and `transform` roles to support the data processing and transformations required for SLOs to function properly. The roles can exist on the same node or be distributed across separate nodes. 
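As a rough illustration of the role check described above, the following Python sketch inspects abbreviated role strings of the kind printed in the `node.role` column of `GET _cat/nodes?h=name,node.role`, where `i` stands for `ingest` and `t` stands for `transform`. The function name and the sample node names are hypothetical; this is a sketch under those assumptions, not part of the SLO tooling.

```python
def missing_slo_roles(node_roles):
    """Return which of the roles SLOs need are absent from the cluster.

    node_roles maps a node name to its abbreviated role string, in the form
    printed by the `node.role` column of `GET _cat/nodes`
    (`i` = ingest, `t` = transform).
    """
    required = {"i": "ingest", "t": "transform"}
    # Collect every role letter seen on any node; the roles may be
    # on the same node or spread across separate nodes.
    present = set("".join(node_roles.values()))
    return sorted(name for abbrev, name in required.items() if abbrev not in present)


# Hypothetical sample data: roles split across two nodes are sufficient.
print(missing_slo_roles({"node-1": "dim", "node-2": "rt"}))  # []
print(missing_slo_roles({"node-1": "dhm"}))  # ['ingest', 'transform']
```

An empty list means both roles are present somewhere in the cluster, which matches the requirement that the roles can live on the same node or on separate nodes.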
- - -### Unhealthy or missing transforms [slo-transform-unhealthy] - -When working with SLOs, it is crucial to ensure that the associated transforms function correctly. Transforms are responsible for generating the data needed for SLOs, and two transforms are created for each SLO. If you notice that your SLOs are not displaying the expected data, it’s time to check the health of these associated transforms. - -{{kib}} shows the following message when any of the associated transforms is in an unexpected state: - -* `"The following transform is an unhealthy state"`, followed by a list of transforms. - -For detailed guidance on diagnosing and resolving transform-related issues, refer to the [troubleshooting transforms](../../../troubleshoot/elasticsearch/transform-troubleshooting.md) documentation . - -It’s also recommended that you perform the following transform checks: - -* Ensure the transforms needed for the SLOs haven’t been deleted or stopped. - - If a transform has been deleted, the easiest way to recreate it is using the [Reset SLO](../../../troubleshoot/observability/troubleshoot-service-level-objectives-slos.md#slo-troubleshoot-reset) action, forcing the recreation of the transforms. If a transform was stopped, try to start it, and then check the `health tab` of the transform. - -* [Inspect SLO assets](../../../troubleshoot/observability/troubleshoot-service-level-objectives-slos.md#slo-troubleshoot-inspect) to analyze the SLO definition and all associated resources. - - Use the direct links offered by the **Inspect UI** and check that all referenced resources exist, as that’s not verified by the inspect functionality. - - Use the `query composite` content to verify if the queries performed by the transforms are valid and return the expected data. - -* Check the source data and queries of the SLO. - - The most common cause of legitimate transform failures is issues with the source data, such as timestamp parsing errors or incorrect query structures. 
The following is an example of an unparsable timestamp causing a transform to fail: - - ```bash - "reason": """Failed to index documents into destination index due to permanent error: - [org.elasticsearch.xpack.transform.transforms.BulkIndexingException: Bulk index experienced [500] failures and at least 1 irrecoverable - [unable to parse date [1702842480000]]. Other failures: - [IngestProcessorException] message [org.elasticsearch.ingest.IngestProcessorException: - java.lang.IllegalArgumentException: unable to parse date [1702842480000]]; java.lang.IllegalArgumentException: unable to parse date [1702842480000]]""", - "issue": "Transform task state is [failed]" - ``` - -* As a last resort, consider [resetting the SLO](../../../troubleshoot/observability/troubleshoot-service-level-objectives-slos.md#slo-troubleshoot-reset). - - -### Missing Ingest Pipelines [slo-missing-pipeline] - -If any of the needed ingest pipelines are missing, try the [Reset SLO](../../../troubleshoot/observability/troubleshoot-service-level-objectives-slos.md#slo-troubleshoot-reset) action. - - -### Stack-related problems [slo-missing-template] - -As mentioned, maintaining a healthy cluster is crucial for SLOs to function correctly. The following examples show issues **unrelated to SLOs** that can still disrupt their proper operation. While troubleshooting these issues is outside the scope of this document, they are included for illustrative purposes. - -* Problems accessing the source data, causing the transform to fail: - - ```bash - Failed to execute phase [can_match], start; org.elasticsearch.action.search.SearchPhaseExecutionException: - Search rejected due to missing shards [[index_name_1][1], [index_name_2][1], [index_name_3][1]]. 
- ``` - -* Remote cluster not available, if for example an SLO is fetching data from a remote cluster called `remote-metrics`: - - ```sh - Validation Failed: 1: no such remote cluster: [remote-metrics] - ``` - -* [Circuit breaker exceptions](../../../troubleshoot/elasticsearch/circuit-breaker-errors.md) due to nodes being under memory pressure. - - -## SLO troubleshooting actions [slo-troubleshoot-actions] - - -### Inspect SLO assets [slo-troubleshoot-inspect] - -To be able to inspect SLOs you have to activate the corresponding feature in {{kib}}: - -1. Open **Advanced Settings**, by finding **Stack Management** in the main menu or using the [global search field](/explore-analyze/find-and-organize/find-apps-and-objects.md). -2. Enable `observability:enableInspectEsQueries` setting. - -Afterwards visit the **SLO edit page** and click **SLO Inspect**. - -The **SLO Inspect** option provides a detailed report of an SLO, including: - -* SLO configuration -* Rollup transform configuration -* Summary transform configuration -* Rollup ingest pipeline -* Summary ingest pipeline -* Temporary document -* Rollup transform query composite -* Summary transform query composite - -These resources are very useful for tasks such as trying out the queries performed by the transforms and checking the IDs of all associated resources. The view also includes direct links to transforms and ingest pipelines sections in {{kib}}. - - -### Reset SLO [slo-troubleshoot-reset] - -Resetting an SLO forces the deletion of all SLI data, summary data, and transforms, and then reinstalls and processes the data. Essentially, it recreates the SLO as if it had been deleted and re-created by the user. - -::::{note} -While resetting an SLO can help resolve certain issues, it may not always address the root cause of errors. Most errors related to transforms typically arise from improperly structured source data, such as unparsable timestamps, which prevent the transform from progressing. 
Additionally, incorrectly formatted SLO queries, and consequently transform queries, can also lead to failures. - -Before resetting the SLO, verify that the source data and queries are correctly formatted and validated. Resetting should only be used as a last resort when all other troubleshooting steps have been exhausted. - -:::: - - -Follow these steps to reset an SLO: - -1. Find **SLOs** in the main menu or use the [global search field](/explore-analyze/find-and-organize/find-apps-and-objects.md). -2. Click on the SLO to reset. -3. Select **Actions** → **Reset**. - -Alternatively, you can use the {{kib}} API for the reset action: - -```console -POST kbn:/api/observability/slos/{sloId}/_reset -``` - -Where `sloId` can be obtained from the [Inspect SLO assets](../../../troubleshoot/observability/troubleshoot-service-level-objectives-slos.md#slo-troubleshoot-inspect) action. - - -### Using API calls to retrieve SLO details [slo-api-calls] - -Refer to [SLO API calls](https://www.elastic.co/docs/api/doc/kibana/v8/operation/operation-findslosop) as an alternative to [using SLO Inspect](../../../troubleshoot/observability/troubleshoot-service-level-objectives-slos.md#slo-troubleshoot-inspect). - - -## Upgrade from beta to GA [slo-troubleshoot-beta] - -Starting in version 8.12.0, SLOs are generally available (GA). If you’re upgrading from a beta version of SLOs (available in 8.11.0 and earlier), you must migrate your SLO definitions to a new format. Otherwise, SLOs won’t show up. - -::::{dropdown} Migrate your SLO definitions -To migrate your SLO definitions, open the SLO overview. A banner will display the number of outdated SLOs detected. For each outdated SLO, click **Reset**. If you no longer need the SLO, select **Delete**. - -If you have a large number of SLO definitions, it is possible to automate this process.
To do this, you’ll need to use two Elastic APIs: - -* [SLO Definitions Find API](https://github.com/elastic/kibana/blob/9cb830fe9a021cda1d091effbe3e0cd300220969/x-pack/plugins/observability/docs/openapi/slo/bundled.yaml#L453-L514) (`/api/observability/slos/_definitions`) -* [SLO Reset API](https://www.elastic.co/docs/api/doc/kibana/v8/operation/operation-resetsloop) - -Pass in `includeOutdatedOnly=1` as a query parameter to the Definitions Find API. This will display your outdated SLO definitions. Loop through this list, one by one, calling the Reset API on each outdated SLO definition. The Reset API loads the outdated SLO definition and resets it to the new format required for GA. Once an SLO is reset, it will start to regenerate SLIs and summary data. - -:::: - - -::::{dropdown} Remove legacy summary transforms -After migrating to 8.12 or later, you might have some legacy SLO summary transforms running. You can safely delete the following legacy summary transforms: - -```sh -# Stop all legacy summary transforms -POST _transform/slo-summary-occurrences-30d-rolling/_stop?force=true -POST _transform/slo-summary-occurrences-7d-rolling/_stop?force=true -POST _transform/slo-summary-occurrences-90d-rolling/_stop?force=true -POST _transform/slo-summary-occurrences-monthly-aligned/_stop?force=true -POST _transform/slo-summary-occurrences-weekly-aligned/_stop?force=true -POST _transform/slo-summary-timeslices-30d-rolling/_stop?force=true -POST _transform/slo-summary-timeslices-7d-rolling/_stop?force=true -POST _transform/slo-summary-timeslices-90d-rolling/_stop?force=true -POST _transform/slo-summary-timeslices-monthly-aligned/_stop?force=true -POST _transform/slo-summary-timeslices-weekly-aligned/_stop?force=true - -# Delete all legacy summary transforms -DELETE _transform/slo-summary-occurrences-30d-rolling?force=true -DELETE _transform/slo-summary-occurrences-7d-rolling?force=true -DELETE _transform/slo-summary-occurrences-90d-rolling?force=true -DELETE 
_transform/slo-summary-occurrences-monthly-aligned?force=true -DELETE _transform/slo-summary-occurrences-weekly-aligned?force=true -DELETE _transform/slo-summary-timeslices-30d-rolling?force=true -DELETE _transform/slo-summary-timeslices-7d-rolling?force=true -DELETE _transform/slo-summary-timeslices-90d-rolling?force=true -DELETE _transform/slo-summary-timeslices-monthly-aligned?force=true -DELETE _transform/slo-summary-timeslices-weekly-aligned?force=true -``` - -Do not delete any new summary transforms used by your migrated SLOs. - -:::: diff --git a/raw-migrated-files/observability-docs/observability/synthetics-troubleshooting.md b/raw-migrated-files/observability-docs/observability/synthetics-troubleshooting.md deleted file mode 100644 index 7835d07840..0000000000 --- a/raw-migrated-files/observability-docs/observability/synthetics-troubleshooting.md +++ /dev/null @@ -1,121 +0,0 @@ ---- -navigation_title: "Troubleshooting" ---- - -# Troubleshooting Synthetics [synthetics-troubleshooting] - - - -## Local debugging [synthetics-troubleshooting-local-debugging] - -For debugging synthetic tests locally, you can set an environment variable, `DEBUG=synthetics`, to capture Synthetics agent logs when using the [Synthetics CLI](../../../solutions/observability/apps/use-synthetics-cli.md). - - -## Common issues [synthetics-troubleshooting-common-issues] - - -### Monitors stopped running after upgrading to 8.8.0 or above [synthetics-troubleshooting-missing-api-key] - -Synthetic monitors will stop running if you have gone through this workflow: - -1. Enabled Monitor Management (in the {{uptime-app}}) prior to 8.6.0. -2. Created a synthetic monitor that is configured to run on Elastic’s global managed infrastructure. -3. Upgraded to 8.8.0 or above. - -This happens because the permissions granted by clicking **Enable Monitor Management** in versions prior to 8.6.0 are not sufficient in versions 8.8.0 and above. 
- -To fix this, a user with [admin permissions](../../../solutions/observability/apps/setup-role.md) needs to visit the {{synthetics-app}} in {{kib}}. In 8.8.0 and above, the equivalent of "enabling monitor management" happens automatically in the background when a user with [admin permissions](../../../solutions/observability/apps/setup-role.md) visits the {{synthetics-app}}. - -If a user *without* [admin permissions](../../../solutions/observability/apps/setup-role.md) visits the {{synthetics-app}} before an admin has visited it, the user will see a note that says "Only administrators can enable this feature". That note will persist until an admin user visits the {{synthetics-app}}. - - -### No results from a monitor configured to run on a {{private-location}} [synthetics-troubleshooting-no-agent-running] - -If you have created a {{private-location}} and configured a monitor to run on that {{private-location}}, but don’t see any results for that monitor in the {{synthetics-app}}, make sure there is an agent configured to run against the agent policy. - -::::{note} -If you attempt to assign an agent policy to a {{private-location}} *before* configuring an agent to run against the agent policy, you will see a note in the {{synthetics-app}} UI that the selected agent policy has no agents. - -:::: - - -When creating a {{private-location}}, you have to: - -1. [Set up {{fleet-server}} and {{agent}}](../../../solutions/observability/apps/monitor-resources-on-private-networks.md#synthetics-private-location-fleet-agent). -2. [Connect {{fleet}} to the {{stack}}](../../../solutions/observability/apps/monitor-resources-on-private-networks.md#synthetics-private-location-connect) and enroll an {{agent}} in {{fleet}}. -3. [Add a {{private-location}}](../../../solutions/observability/apps/monitor-resources-on-private-networks.md#synthetics-private-location-add) in the {{synthetics-app}}. 
- -If you do not complete the second item, no agents will be configured to run against the agent policy, and any monitors configured to run on that {{private-location}} won’t be able to run so there will be no results in the {{synthetics-app}}. - -To fix this, make sure there is an agent configured to run against the agent policy. - - -### No results from a monitor [synthetics-troubleshooting-no-direct-es-connection] - -If you have configured a monitor but don’t see any results for that monitor in the {{synthetics-app}}, whether running them from Elastic’s global managed testing infrastructure or from {{private-location}}s, ensure Synthetics has a direct connection to {{es}}. - -Do not configure any ingest pipelines or output via Logstash as this will prevent Synthetics from working properly and is not [supported](../../../solutions/observability/apps/synthetics-support-matrix.md). - - -### Browser monitor configured to run on a {{private-location}} not running to schedule [synthetics-troubleshooting-missing-browser-schedules] - -If you have browser monitors configured to run on a {{private-location}} but notice one or more of them are not running as scheduled, this could be because: - -* The time it takes for your monitor to run is longer than the frequency you have set -* There may be too many monitors trying to run concurrently, causing some of them to skip their scheduled run - -You may also see a message in the logs such as `2 tasks have missed their schedule deadlines by more than 1 second in the last 15s`. These will be visible from inside the Agent diagnostic ZIP file, and the numbers and time periods may be different in your logs. - -Start by identifying the cause of the issue. First, check if the time it takes the monitor to run is less than the scheduled frequency: - -1. Go to the {{synthetics-app}}. -2. Click the monitor, then click **Go to monitor**. -3. 
Go to the [Overview tab](../../../solutions/observability/apps/analyze-data-from-synthetic-monitors.md#synthetics-analyze-individual-monitors-overview) to see the *Avg. duration*. You can also view the duration for individual runs in the [History tab](../../../solutions/observability/apps/analyze-data-from-synthetic-monitors.md#synthetics-analyze-individual-monitors-history). -4. Compare the duration to the scheduled frequency. If the duration is *greater than* the scheduled frequency, for example if a monitor takes 90 seconds to run and its scheduled frequency is 1 minute, the next scheduled run will not occur because the current one is still running, so you may see results for only every other scheduled run. - - To fix this, you can either: - - * Change the frequency so the monitor runs less often. - * Refactor the monitor so it can run in a shorter amount of time. - - -If the duration is *less than* the scheduled frequency or the suggestion above does not fix the issue, then there may be too many browser monitors attempting to run on the {{private-location}}. Due to the additional hardware overhead of running browser monitors, we limit each {{private-location}} to only run two browser monitors at the same time. Depending on how many browser monitors you have configured to run on the {{private-location}} and their schedule, the {{private-location}} may not be able to run them all because it would require more than two browser tests to be running simultaneously. - -To fix this issue, you can either: - -* Increase the number of concurrent browser monitors allowed (as described in [Scaling Private Locations](../../../solutions/observability/apps/monitor-resources-on-private-networks.md#synthetics-private-location-scaling)), paying attention to the scaling and hardware requirements documented. -* Create multiple {{private-location}}s and spread your browser monitors across them more evenly (effectively horizontally scaling your {{private-location}}s).
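The comparison between run duration and scheduled frequency described above comes down to simple arithmetic. The Python sketch below assumes a simplified model in which a scheduled run is skipped whenever the previous run of the same monitor is still in progress. The function name is hypothetical, and the model ignores queueing and the limit on concurrent browser monitors.

```python
import math


def effective_interval_seconds(duration_s, frequency_s):
    """Estimate the real gap between run starts for a single monitor.

    Simplified model: a scheduled run is skipped while the previous run
    is still in progress, so runs start only on multiples of the frequency.
    """
    if duration_s <= 0 or frequency_s <= 0:
        raise ValueError("duration and frequency must be positive")
    return math.ceil(duration_s / frequency_s) * frequency_s


# The example from the text: a 90-second monitor scheduled every minute.
print(effective_interval_seconds(90, 60))  # 120, i.e. every other scheduled run
```

When the result equals the configured frequency, duration alone does not explain the missed runs, which points toward the concurrency limit as the more likely cause.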
- - -### No locations are available [synthetics-troubleshooting-no-locations] - -When using {{ecloud}}, if there are no options available in the *Locations* dropdown when you try to create a monitor in the {{synthetics-app}} *or* if no locations are listed when using the [`location` command](../../../solutions/observability/apps/use-synthetics-cli.md#elastic-synthetics-locations-command), it might be because you do not have permission to use Elastic managed locations *and* there are no [Private Locations](../../../solutions/observability/apps/monitor-resources-on-private-networks.md#monitor-via-private-agent) available yet. - -There are a few ways to fix this: - -* If you have [write access](../../../solutions/observability/apps/writer-role.md) including the privileges for [creating new Private Locations](../../../solutions/observability/apps/writer-role.md#synthetics-role-write-private-locations), you can [create a new Private Location](../../../solutions/observability/apps/monitor-resources-on-private-networks.md#monitor-via-private-agent). Then try creating the monitor again. -* If you do *not* have the right privileges to create a Private Location, you can ask someone with the [necessary privileges](../../../solutions/observability/apps/writer-role.md#synthetics-role-write-private-locations) to create a Private Location or ask an administrator with a [setup role](../../../solutions/observability/apps/setup-role.md) to give you the necessary privileges and [create a new Private Location](../../../solutions/observability/apps/monitor-resources-on-private-networks.md#monitor-via-private-agent). Then try creating the monitor again. 
-* If you want to create a monitor to run on Elastic’s global managed infrastructure, ask an administrator with a [setup role](../../../solutions/observability/apps/setup-role.md) to update [`Synthetics and Uptime` sub-feature privileges](../../../solutions/observability/apps/writer-role.md#disable-managed-locations) for the role you’re currently assigned. Then try creating the monitor again. - - -### You do not have permission to use Elastic managed locations [synthetics-troubleshooting-public-locations-disabled] - -If you try to create or edit a monitor hosted on Elastic’s global managed infrastructure but see a note that you do not have permission to use Elastic managed locations, an administrator has restricted the use of public locations. - -To fix this you can either: - -* Ask an administrator with a [setup role](../../../solutions/observability/apps/setup-role.md) to update [`Synthetics and Uptime` sub-feature privileges](../../../solutions/observability/apps/writer-role.md#disable-managed-locations) for the role you’re currently assigned or assign you a role that allows using Elastic’s global managed infrastructure. -* Use a [Private Location](../../../solutions/observability/apps/monitor-resources-on-private-networks.md#monitor-via-private-agent). - - -## Get help [synthetics-troubleshooting-get-help] - - -### Elastic Support [synthetics-troubleshooting-support] - -We offer a support experience unlike any other. Our team of professionals *speak human and code* and love making your day. [Learn more about subscriptions](https://www.elastic.co/subscriptions). - - -### Discussion forum [synthetics-troubleshooting-discussion] - -For other questions and feature requests, visit our [discussion forum](https://discuss.elastic.co//c/observability/synthetics/75). 
diff --git a/raw-migrated-files/toc.yml b/raw-migrated-files/toc.yml index b549997c2a..1a3a956fc1 100644 --- a/raw-migrated-files/toc.yml +++ b/raw-migrated-files/toc.yml @@ -354,10 +354,8 @@ toc: - file: docs-content/serverless/observability-synthetics-scale-and-architect.md - file: docs-content/serverless/observability-synthetics-security-encryption.md - file: docs-content/serverless/observability-synthetics-settings.md - - file: docs-content/serverless/observability-synthetics-troubleshooting.md - file: docs-content/serverless/observability-triage-slo-burn-rate-breaches.md - file: docs-content/serverless/observability-triage-threshold-breaches.md - - file: docs-content/serverless/observability-troubleshoot-logs.md - file: docs-content/serverless/observability-view-alerts.md - file: docs-content/serverless/observability-view-infrastructure-metrics.md - file: docs-content/serverless/project-and-management-settings.md @@ -474,7 +472,6 @@ toc: - file: docs-content/serverless/security-vuln-management-faq.md - file: docs-content/serverless/security-vuln-management-get-started.md - file: docs-content/serverless/security-vuln-management-overview.md - - file: docs-content/serverless/slo-troubleshoot-slos.md - file: docs-content/serverless/spaces.md - file: docs-content/serverless/what-is-observability-serverless.md - file: elasticsearch-hadoop/elasticsearch-hadoop/index.md @@ -673,7 +670,6 @@ toc: - file: observability-docs/observability/logs-plaintext.md - file: observability-docs/observability/logs-send-application.md - file: observability-docs/observability/logs-stream.md - - file: observability-docs/observability/logs-troubleshooting.md - file: observability-docs/observability/manage-cases-settings.md - file: observability-docs/observability/manage-cases.md - file: observability-docs/observability/monitor-datasets.md @@ -690,7 +686,6 @@ toc: - file: observability-docs/observability/rate-aggregation.md - file: observability-docs/observability/slo-burn-rate-alert.md - 
file: observability-docs/observability/slo-create.md - - file: observability-docs/observability/slo-troubleshoot-slos.md - file: observability-docs/observability/slo.md - file: observability-docs/observability/synthetics-analyze.md - file: observability-docs/observability/synthetics-command-reference.md @@ -712,7 +707,6 @@ toc: - file: observability-docs/observability/synthetics-scale-and-architect.md - file: observability-docs/observability/synthetics-security-encryption.md - file: observability-docs/observability/synthetics-settings.md - - file: observability-docs/observability/synthetics-troubleshooting.md - file: observability-docs/observability/triage-slo-burn-rate-breaches.md - file: observability-docs/observability/triage-threshold-breaches.md - file: observability-docs/observability/view-infrastructure-metrics.md diff --git a/solutions/observability/apps/common-problems.md b/solutions/observability/apps/common-problems.md index a7ab638673..044cc150b9 100644 --- a/solutions/observability/apps/common-problems.md +++ b/solutions/observability/apps/common-problems.md @@ -4,6 +4,9 @@ mapped_urls: - https://www.elastic.co/guide/en/serverless/current/observability-apm-troubleshooting.html --- +% This page exists in the Troubleshoot section (troubleshoot/observability/apm/common-problems.md), so probably OK to delete from Solutions? 
+ + # Common problems % What needs to be done: Align serverless/stateful diff --git a/troubleshoot/observability/apm/common-problems.md b/troubleshoot/observability/apm/common-problems.md index c8f663c237..120632a44f 100644 --- a/troubleshoot/observability/apm/common-problems.md +++ b/troubleshoot/observability/apm/common-problems.md @@ -4,43 +4,310 @@ mapped_pages: - https://www.elastic.co/guide/en/serverless/current/observability-apm-troubleshooting.html --- -# Common problems +# Common problems [apm-common-problems] -% What needs to be done: Align serverless/stateful +This section describes common problems you might encounter when using APM Server and the Applications UI in {{kib}}. -% Use migrated content from existing pages that map to this page: +**APM Server**: -% - [ ] ./raw-migrated-files/observability-docs/observability/apm-common-problems.md -% - [ ] ./raw-migrated-files/docs-content/serverless/observability-apm-troubleshooting.md +* [No data is indexed](#apm-no-data-indexed) +* [Common SSL-related problems](#apm-common-ssl-problems) +* [I/O Timeout](#apm-io-timeout) +* [Field limit exceeded](#apm-field-limit-exceeded) +* [Tail-based sampling causing high system memory usage and high disk IO](#apm-tail-based-sampling-memory-disk-io) -% Internal links rely on the following IDs being on this page (e.g. 
as a heading ID, paragraph ID, etc): +**Applications UI**: -$$$apm-no-data-indexed$$$ +* [Too many unique transaction names](#troubleshooting-too-many-transactions) +* [Unknown route](#troubleshooting-unknown-route) +* [Fields are not searchable](#troubleshooting-fields-unsearchable) +* [Service Maps: no connection between client and server](#service-map-rum-connections) +* [No data shown in the infrastructure tab](#troubleshooting-apm-infra-data) -$$$apm-common-ssl-problems$$$ -$$$apm-io-timeout$$$ +## No data is indexed [apm-no-data-indexed] +:::{applies} +:stack: all +::: -$$$apm-field-limit-exceeded$$$ +If no data shows up in {{es}}, first make sure that your APM components are properly connected. -$$$apm-tail-based-sampling-memory-disk-io$$$ +:::::::{tab-set} -$$$troubleshooting-too-many-transactions$$$ +::::::{tab-item} Fleet-managed +**Is {{agent}} healthy?** -$$$troubleshooting-unknown-route$$$ +In {{kib}} open **{{fleet}}** and find the host that is running the APM integration; confirm that its status is **Healthy**. If it isn’t, check the {{agent}} logs to diagnose potential causes. See [Monitor {{agent}}s](https://www.elastic.co/guide/en/fleet/current/monitor-elastic-agent.html) to learn more. -$$$troubleshooting-fields-unsearchable$$$ +**Is APM Server happy?** -$$$service-map-rum-connections$$$ +In {{kib}}, open **{{fleet}}** and select the host that is running the APM integration. Open the **Logs** tab and select the `elastic_agent.apm_server` dataset. Look for any APM Server errors that could help diagnose the problem. -$$$troubleshooting-apm-infra-data$$$ +**Can the {{apm-agent}} connect to APM Server** -$$$apm-ssl-client-fails$$$ +To determine if the {{apm-agent}} can connect to the APM Server, send requests to the instrumented service and look for lines containing `[request]` in the APM Server logs. -$$$apm-cannot-validate-certificate$$$ +If no requests are logged, confirm that: -$$$apm-getsockopt-no-route-to-host$$$ +1. 
SSL isn’t [misconfigured](#apm-ssl-client-fails). +2. The host is correct. For example, if you’re using Docker, ensure a bind to the right interface (for example, set `apm-server.host = 0.0.0.0:8200` to match any IP) and set the `SERVER_URL` setting in the {{apm-agent}} accordingly. -$$$apm-getsockopt-connection-refused$$$ +If you see requests coming through the APM Server but they are not accepted (a response code other than `202`), see [APM Server response codes](apm-server-response-codes.md) to narrow down the possible causes. + +**Instrumentation gaps** + +APM agents provide auto-instrumentation for many popular frameworks and libraries. If the {{apm-agent}} is not auto-instrumenting something that you were expecting, data won’t be sent to the {{stack}}. Reference the relevant [{{apm-agent}} documentation](https://www.elastic.co/guide/en/apm/agent/index.html) for details on what is automatically instrumented. +:::::: + +::::::{tab-item} APM Server binary +If no data shows up in {{es}}, first check that the APM components are properly connected. + +To ensure that APM Server configuration is valid and it can connect to the configured output, {{es}} by default, run the following commands: + +```sh +apm-server test config +apm-server test output +``` + +To see if the agent can connect to the APM Server, send requests to the instrumented service and look for lines containing `[request]` in the APM Server logs. + +If no requests are logged, it might be that SSL is [misconfigured](#apm-ssl-client-fails) or that the host is wrong. Particularly, if you are using Docker, ensure to bind to the right interface (for example, set `apm-server.host = 0.0.0.0:8200` to match any IP) and set the `SERVER_URL` setting in the agent accordingly. + +If you see requests coming through the APM Server but they are not accepted (response code other than `202`), consider the response code to narrow down the possible causes (see sections below). 
+ +Another reason for data not showing up is that the agent is not auto-instrumenting something you were expecting. Check the [agent documentation](https://www.elastic.co/guide/en/apm/agent/index.html) for details on what is automatically instrumented. + +APM Server currently relies on {{es}} to create indices that do not exist. As a result, {{es}} must be configured to allow [automatic index creation](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-create) for APM indices. +:::::: + +::::::: + +## Common SSL-related problems [apm-common-ssl-problems] +:::{applies} +:stack: all +::: + +* [SSL client fails to connect](#apm-ssl-client-fails) +* [x509: cannot validate certificate](#apm-cannot-validate-certificate) +* [getsockopt: no route to host](#apm-getsockopt-no-route-to-host) +* [getsockopt: connection refused](#apm-getsockopt-connection-refused) +* [No connection could be made because the target machine actively refused it](#apm-target-machine-refused-connection) + + +### SSL client fails to connect [apm-ssl-client-fails] + +The target host might be unreachable or the certificate may not be valid. To fix this problem: + +1. Make sure that the APM Server process on the target host is running and you can connect to it. Try to ping the target host to verify that you can reach it from the host running APM Server. Then use either `nc` or `telnet` to make sure that the port is available. For example: + + ```shell + ping <hostname or IP> + telnet <hostname or IP> 5044 + ``` + +2. Verify that the certificate is valid and that the hostname and IP match. +3. Use OpenSSL to test connectivity to the target server and diagnose problems. See the [OpenSSL documentation](https://www.openssl.org/docs/manmaster/man1/openssl-s_client.md) for more info. + + +### x509: cannot validate certificate for `<IP address>` because it doesn’t contain any IP SANs [apm-cannot-validate-certificate] + +This happens because your certificate is only valid for the hostname present in the Subject field. 
To resolve this problem, try one of these solutions: + +* Create a DNS entry for the hostname, mapping it to the server’s IP. +* Create an entry in `/etc/hosts` for the hostname. Or, on Windows, add an entry to `C:\Windows\System32\drivers\etc\hosts`. +* Re-create the server certificate and add a Subject Alternative Name (SAN) for the IP address of the server. This makes the server’s certificate valid for both the hostname and the IP address. + + +### getsockopt: no route to host [apm-getsockopt-no-route-to-host] + +This is not an SSL problem. It’s a networking problem. Make sure the two hosts can communicate. + + +### getsockopt: connection refused [apm-getsockopt-connection-refused] + +This is not an SSL problem. Make sure that {{ls}} is running and that there is no firewall blocking the traffic. + + +### No connection could be made because the target machine actively refused it [apm-target-machine-refused-connection] + +A firewall is refusing the connection. Check if a firewall is blocking the traffic on the client, the network, or the destination host. + + +## I/O Timeout [apm-io-timeout] +:::{applies} +:stack: all +::: + +I/O Timeouts can occur when your timeout settings across the stack are not configured correctly, especially when using a load balancer. + +You may see an error like the one below in the {{apm-agent}} logs, and/or a similar error on the APM Server side: + +```txt +[ElasticAPM] APM Server responded with an error: +"read tcp 123.34.22.313:8200->123.34.22.40:41602: i/o timeout" +``` + +To fix this, ensure timeouts are incrementing from the {{apm-agent}}, through your load balancer, to the APM Server. + +By default, the agent timeouts are set at 10 seconds, and the server timeout is set at 3600 seconds. Your load balancer should be set somewhere between these numbers. 
+ +For example: + +```txt +APM agent --> Load Balancer --> APM Server + 10s 15s 3600s +``` + +The APM Server timeout can be configured by updating the [maximum duration for reading an entire request](../../../solutions/observability/apps/general-configuration-options.md#apm-read_timeout). + + +## Field limit exceeded [apm-field-limit-exceeded] + +When adding too many distinct tag keys on a transaction or span, you risk creating a [mapping explosion](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html#mapping-limit-settings). + +For example, you should avoid that user-specified data, like URL parameters, is used as a tag key. Likewise, using the current timestamp or a user ID as a tag key is not a good idea. However, tag **values** with a high cardinality are not a problem. Just try to keep the number of distinct tag keys at a minimum. + +The symptom of a mapping explosion is that transactions and spans are not indexed anymore after a certain time. Usually, on the next day, the spans and transactions will be indexed again because a new index is created each day. But as soon as the field limit is reached, indexing stops again. + +In the agent logs, you won’t see a sign of failures as the APM server asynchronously sends the data it received from the agents to {{es}}. However, the APM server and {{es}} log a warning like this: + +```txt +{\"type\":\"illegal_argument_exception\",\"reason\":\"Limit of total fields [1000] in [INDEX_NAME] has been exceeded\"} +``` + + +## Tail-based sampling causing high system memory usage and high disk IO [apm-tail-based-sampling-memory-disk-io] +:::{applies} +:stack: all +::: + +Tail-based sampling requires minimal memory to run, and there should not be a noticeable increase in RSS memory usage. However, since tail-based sampling writes data to disk, it is possible to see a significant increase in OS page cache memory usage due to disk IO. 
If you see a drop in throughput and excessive disk activity after enabling tail-based sampling, please ensure that there is enough memory headroom in the system for OS page cache to perform disk IO efficiently. + + +## Too many unique transaction names [troubleshooting-too-many-transactions] +:::{applies} +:stack: all +::: + +Transaction names are defined in each APM agent; when an APM agent supports a framework, it includes logic for naming the transactions that the framework creates. In some cases though, like when using an APM agent’s API to create custom transactions, it is up to the user to define a pattern for transaction naming. When transactions are named incorrectly, each unique URL can be associated with a unique transaction group—causing an explosion in the number of transaction groups per service, and leading to inaccuracies in the Applications UI. + +To fix a large number of unique transaction names, you need to change how you are using the APM agent API to name your transactions. To do this, ensure you are **not** naming based on parameters that can change. For example, user ids, product ids, order numbers, query parameters, etc., should be stripped away, and commonality should be found between your unique URLs. + +Let’s look at an example from the RUM agent documentation. Here are a few URLs you might find on Elastic.co: + +```yaml +// Blog Posts +https://www.elastic.co/blog/reflections-on-three-years-in-the-elastic-public-sector +https://www.elastic.co/blog/say-heya-to-the-elastic-search-awards +https://www.elastic.co/blog/and-the-winner-of-the-elasticon-2018-training-subscription-drawing-is + +// Documentation +https://www.elastic.co/guide/en/elastic-stack/current/index.html +https://www.elastic.co/guide/en/apm/get-started/current/index.html +https://www.elastic.co/guide/en/infrastructure/guide/current/index.html +``` + +These URLs, like most, include unique names. 
If we named transactions based on each unique URL, we’d end up with the problem described above—a very large number of different transaction names. Instead, we should strip away the unique information and group our transactions based on common information. In this case, that means naming all blog transactions, `/blog`, and all documentation transactions, `/guide`. + +If you feel like you’d be losing valuable information by following this naming convention, don’t fret! You can always add additional metadata to your transactions using [labels](/solutions/observability/apps/metadata.md#apm-data-model-labels) (indexed) or [custom context](/solutions/observability/apps/metadata.md#apm-data-model-custom) (non-indexed). + +After ensuring you’ve correctly named your transactions, you might still see errors in the Applications UI related to transaction group limit reached: + +`The number of transaction groups has been reached. Current APM server capacity for handling unique transaction groups has been reached. There are at least X transactions missing in this list. Please decrease the number of transaction groups in your service or increase the memory allocated to APM server.` + +You will see this warning if an agent is creating too many transaction groups. This could indicate incorrect instrumentation which will have to be fixed in your application. Alternatively you can increase the memory of the APM server. + +`Number of transaction groups exceed the allowed maximum(1,000) that are displayed. The maximum number of transaction groups displayed in Kibana has been reached. Try narrowing down results by using the query bar..` + +You will see this warning if your results have more than `1000` unique transaction groups. Alternatively you can use the query bar to reduce the number of unique transaction groups in your results. + +**More information** + +While this can happen with any APM agent, it typically occurs with the RUM agent. 
For more information on how to correctly set `transaction.name` in the RUM agent, see [custom initial page load transaction names](https://www.elastic.co/guide/en/apm/agent/rum-js/current/custom-transaction-name.html). + +The RUM agent can also set the `transaction.name` when observing for transaction events. See [`apm.observe()`](https://www.elastic.co/guide/en/apm/agent/rum-js/current/agent-api.html#observe) for more information. + +If your problem is occurring in a different APM agent, the tips above still apply. See the relevant [Agent API documentation](https://www.elastic.co/guide/en/apm/agent) to adjust how you’re naming your transactions. + + +## Unknown route [troubleshooting-unknown-route] +:::{applies} +:stack: all +::: + +The [transaction overview](../../../solutions/observability/apps/transactions-2.md) will only display helpful information when the transactions in your services are named correctly. If you’re seeing "GET unknown route" or "unknown route" in the Applications UI, it could be a sign that something isn’t working as it should. + +Elastic APM agents come with built-in support for popular frameworks out-of-the-box. This means, among other things, that the APM agent will try to automatically name HTTP requests. As an example, the Node.js agent uses the route that handled the request, while the Java agent uses the Servlet name. + +"Unknown route" indicates that the APM agent can’t determine what to name the request, perhaps because the technology you’re using isn’t supported, the agent has been installed incorrectly, or because something is happening to the request that the agent doesn’t understand. + +To resolve this, you’ll need to head over to the relevant [APM agent documentation](https://www.elastic.co/guide/en/apm/agent). Specifically, view the agent’s supported technologies page. You can also use the agent’s public API to manually set a name for the transaction. 
+ + +## Fields are not searchable [troubleshooting-fields-unsearchable] +:::{applies} +:stack: all +::: + +In Elasticsearch, index templates are used to define settings and mappings that determine how fields should be analyzed. The recommended index templates for APM come from the built-in {{es}} apm-data plugin. These templates, by default, enable and disable indexing on certain fields. + +As an example, some APM agents store cookie values in `http.request.cookies`. Since `http.request` has disabled dynamic indexing, and `http.request.cookies` is not declared in a custom mapping, the values in `http.request.cookies` are not indexed and thus not searchable. + +**Ensure an APM data view exists** As a first step, you should ensure the correct data view exists. In {{kib}}, go to **Stack Management** > **Data views**. You should see the APM data view; the default is `traces-apm*,apm-*,logs-apm*,apm-*,metrics-apm*,apm-*`. If you don’t, the data view doesn’t exist. To fix this, navigate to the Applications UI in {{kib}} and select **Add data**. In the APM tutorial, click **Load Kibana objects** to create the APM data view. + +**Ensure a field is searchable** There are two things you can do to ensure a field is searchable: + +1. Index your additional data as [labels](/solutions/observability/apps/metadata.md) instead. These are dynamic by default, which means they will be indexed and become searchable and aggregatable. +2. Create a custom mapping for the field. + + +## Service Maps: no connection between client and server [service-map-rum-connections] +:::{applies} +:stack: all +::: + +If the service map is not showing an expected connection between the client and server, it’s likely because you haven’t configured [`distributedTracingOrigins`](https://www.elastic.co/guide/en/apm/agent/rum-js/current/distributed-tracing-guide.html). + +This setting is necessary, for example, for cross-origin requests. 
If you have a basic web application that provides data via an API on `localhost:4000`, and serves HTML from `localhost:4001`, you’d need to set `distributedTracingOrigins: ['https://localhost:4000']` to ensure the origin is monitored as a part of distributed tracing. In other words, `distributedTracingOrigins` is consulted prior to the APM agent adding the distributed tracing `traceparent` header to each request. + + +## No data shown in the infrastructure tab [troubleshooting-apm-infra-data] +:::{applies} +:stack: all +::: + +If you don’t see any data in the **Infrastructure** tab for a selected service in the Applications UI, there are a few possible causes and solutions. + +**If you also do *not* see the data in the** [**Infrastructure inventory**](../../../solutions/observability/infra-and-hosts/view-infrastructure-metrics-by-resource-type.md) + +Refer to the [Infrastructure troubleshooting docs](../troubleshooting-infrastructure-monitoring.md). + +**If you *do* see the data in the** [**Infrastructure inventory**](../../../solutions/observability/infra-and-hosts/view-infrastructure-metrics-by-resource-type.md) + +It’s likely that there is a problem correlating APM and infrastructure data. The `host.hostname` field value in the APM data and the `host.name` field value in infrastructure data are used to correlate data, and the queries used to correlate the data are case sensitive. + +To fix this, make sure these two fields match exactly. + +For example, if the APM agent is not configured to use the correct host name, the host name might be set to the container name or the Kubernetes pod name. To get the correct host name, you need to set some additional configuration options, specifically `system.kubernetes.node.name` as described in [Kubernetes data](../../../solutions/observability/apps/elastic-apm-events-intake-api.md#apm-api-kubernetes-data). 
+ + +## Common response codes [observability-apm-troubleshooting-common-response-codes] +:::{applies} +:serverless: all +::: + + +### HTTP 400: Data decoding error / Data validation error [bad-request] + +The most likely cause for this error is using an incompatible version of an {{apm-agent}}. See [minimum supported APM agent versions](../../../solutions/observability/apps/elastic-apm-agents.md#observability-apm-agents-elastic-apm-agents-minimum-supported-versions) to verify compatibility. + + +### HTTP 400: Event too large [event-too-large] + +APM agents communicate with the Managed intake service by sending events in an HTTP request. Each event is sent as its own line in the HTTP request body. If events are too large, you can reduce the size of the events that your APM agents send by [enabling span compression](../../../solutions/observability/apps/spans.md) or [reducing collected stack trace information](../../../solutions/observability/apps/reduce-storage.md#observability-apm-reduce-stacktrace). + + +### HTTP 401: Invalid token [unauthorized] + +The API key is invalid. -$$$apm-target-machine-refused-connection$$$ \ No newline at end of file diff --git a/troubleshoot/observability/troubleshoot-logs.md b/troubleshoot/observability/troubleshoot-logs.md index d754c242e8..24b1767afc 100644 --- a/troubleshoot/observability/troubleshoot-logs.md +++ b/troubleshoot/observability/troubleshoot-logs.md @@ -5,17 +5,244 @@ mapped_pages: - https://www.elastic.co/guide/en/serverless/current/observability-troubleshoot-logs.html --- -# Troubleshoot logs +# Troubleshoot logs [logs-troubleshooting] -% What needs to be done: Align serverless/stateful +Use this page to find possible solutions for errors you’re encountering with your logs. 
This troubleshooting page is divided into the following sections: -% Use migrated content from existing pages that map to this page: +* [Common onboarding issues](#logs-onboarding-troubleshooting) +* [Mapping and pipeline issues](#logs-common-mapping-troubleshooting) -% - [ ] ./raw-migrated-files/observability-docs/observability/logs-troubleshooting.md -% - [ ] ./raw-migrated-files/docs-content/serverless/observability-troubleshoot-logs.md -% Internal links rely on the following IDs being on this page (e.g. as a heading ID, paragraph ID, etc): +## Common onboarding issues [logs-onboarding-troubleshooting] -$$$logs-common-mapping-troubleshooting$$$ +This section provides possible solutions for errors you might encounter while onboarding your logs. -$$$logs-onboarding-troubleshooting$$$ \ No newline at end of file + +### User does not have permissions to create API key [logs-troubleshooting-insufficient-priv] + +:::::{tab-set} + +::::{tab-item} {{serverless-short}} +When adding new data using the guided instructions in your project (**Add data** → **Collect and analyze logs** → **Stream log files**), if you don’t have the required privileges to create an API key, you’ll see the following error message: + +:::{note} +You need permission to manage API keys +::: + + +#### Solution [observability-troubleshoot-logs-solution] + +You need to either: + +* Ask an administrator to update your user role to at least **Developer** by going to the user icon on the header bar and opening **Organization** → **Members**. Read more about user roles in [Assign user roles and privileges](../../raw-migrated-files/docs-content/serverless/general-manage-organization.md#general-assign-user-roles). After your user role is updated, restart the onboarding flow. +* Get an API key from an administrator and manually add the API key to the {{agent}} configuration. 
See [Configure the {{agent}}](../../raw-migrated-files/docs-content/serverless/observability-stream-log-files.md#observability-stream-log-files-step-3-configure-the-agent) for more on manually updating the configuration and adding the API key. +:::: + +::::{tab-item} {{stack}} 9.0.0+ +If you don’t have the required privileges to create an API key, you’ll see the following error message: + +```plaintext +User does not have permissions to create API key. + +Required cluster privileges are [`monitor`, `manage_own_api_key`] and +required index privileges are [`auto_configure`, `create_doc`] for +indices [`logs-*-*`, `metrics-*-*`], please add all required privileges +to the role of the authenticated user. +``` + + +#### Solution [logs-troubleshooting-insufficient-priv-solution] + +You need to either: + +* Have an administrator give you the `monitor` and `manage_own_api_key` cluster privileges and the `auto_configure` and `create_doc` indices privileges. Once you have these privileges, restart the onboarding flow. +* Get an API key from an administrator and manually add the API to the {{agent}} configuration. See [Configure the {{agent}}](../../solutions/observability/logs/stream-any-log-file.md#logs-stream-agent-config) for more on manually updating the configuration and adding the API key. +:::: + +::::: + + +### Failed to create API key [logs-troubleshooting-API-key-failed] +:::{applies} +:stack: all +::: + +If you don’t have the privileges to create `savedObjects` in {{kib}}, you’ll see the following error message: + +```plaintext +Failed to create API key + +Something went wrong: Unable to create observability-onboarding-state +``` + + +#### Solution [logs-troubleshooting-API-key-failed-solution] + +You need an administrator to give you the `Saved Objects Management` {{kib}} privilege to generate the required `observability-onboarding-state` flow state. Once you have the necessary privileges, restart the onboarding flow. 
+ + +### {{kib}} or Observability project not accessible from host [logs-troubleshooting-kib-not-accessible] + +If {{kib}} or your Observability project is not accessible from the host, you’ll see the following error message after pasting the **Install the {{agent}}** instructions into the host: + +```plaintext +Failed to connect to {host} port {port} after 0 ms: Connection refused +``` + + +#### Solution [logs-troubleshooting-kib-not-accessible-solution] + +The host needs access to {{kib}} or your project. Port `443` must be open and the deployment’s {{es}} endpoint must be reachable. Locate your project’s endpoint from **Help menu (![help icon](../../images/observability-help-icon.png "")) → Connection details**. + +Run the following command, replacing the URL with your endpoint, and you should get an authentication error with more details on resolving your issue: + +```shell +curl https://your-endpoint.elastic.cloud +``` + +### Download {{agent}} failed [logs-troubleshooting-download-agent] + +If the host was able to download the installation script but cannot connect to the public artifact repository, you’ll see the following error message: + +```plaintext +Download Elastic Agent + +Failed to download Elastic Agent, see script for error. +``` + + +#### Solutions [logs-troubleshooting-download-agent-solution] + +* If the combination of the {{agent}} version and operating system architecture is not available, you’ll see the following error message: + + ```plaintext + The requested URL returned error: 404 + ``` + + To fix this, update the {{agent}} version in the installation instructions to a known version of the {{agent}}. + +* If the {{agent}} was fully downloaded previously, you’ll see the following error message: + + ```plaintext + Error: cannot perform installation as Elastic Agent is already running from this directory + ``` + + To fix this, delete previous downloads and restart the onboarding. 
+ +* You’re an Elastic Cloud Enterprise user without access to the Elastic downloads page. + + +### Install {{agent}} failed [logs-troubleshooting-install-agent] + +If an {{agent}} already exists on your host, you’ll see the following error message: + +```plaintext +Install Elastic Agent + +Failed to install Elastic Agent, see script for error. +``` + + +#### Solution [logs-troubleshooting-install-agent-solution] + +You can uninstall the current {{agent}} using the `elastic-agent uninstall` command, and run the script again. + +::::{warning} +Uninstalling the current {{agent}} removes the entire current setup, including the existing configuration. +:::: + + + +### Waiting for Logs to be shipped…​ step never completes [logs-troubleshooting-wait-for-logs] + +If the **Waiting for Logs to be shipped…​** step never completes, logs are not being shipped to {{es}} or your Observability project, and there is most likely an issue with your {{agent}} configuration. + + +#### Solution [logs-troubleshooting-wait-for-logs-solution] + +Inspect the {{agent}} logs for errors. See the [Debug standalone {{agent}}s](https://www.elastic.co/guide/en/fleet/current/debug-standalone-agents.html#inspect-standalone-agent-logs) documentation for more on finding errors in {{agent}} logs. + + +## Mapping and pipeline issues [logs-common-mapping-troubleshooting] + +This section provides possible solutions for mapping and pipeline issues you might encounter with your logs. + + +### Keyword fields are too long [logs-mapping-troubleshooting-keyword-limit] + +The `keyword` field limit is 32,766 bytes. 
When indexing a document, if your `keyword` field length exceeds this limit, you’ll see an error similar to the following: + +```plaintext +max_bytes_length_exceeded_exception: bytes can be at most 32766 in length +``` + + +#### Solution [logs-mapping-troubleshooting-keyword-limit-solution] + +Avoid this error using one of the following options: + +**Stop indexing the field:** If you don’t need the `keyword` field for aggregation or search, set `"index":false` in the index template to stop indexing the field. + +**Convert the `keyword` field to a `text` field:** To continue indexing the field while avoiding length limits, you can convert the `keyword` field to a `text` field. + +::::{note} +Aggregations on this field would no longer be supported, but the contents would be searchable. +:::: + + +To convert the `keyword` field to a `text` field: + +1. Create a new index with the `text` field data type. +2. Reindex from the `_source` field of the source index using the [`_reindex` API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-reindex). + + +### Date format mismatch [logs-mapping-troubleshooting-date-mismatch] + +If the format of the `date` field in your document doesn’t match the format set in your index template, you’ll see an error similar to the following: + +```plaintext +failed to parse field [date] of type [date] in document with id 'KGcZb3cBqhj6kAxank_x'. +``` + + +#### Solution [logs-mapping-troubleshooting-date-solution] + +Add the format of the mismatched date to your index template. Multiple formats can be specified by separating them with `||` as a separator. Each format will be tried in turn until a matching format is found. 
For example: + +$$$date-format-example$$$ + +```console +PUT my-index-000001 +{ + "mappings": { + "properties": { + "date": { + "type": "date", + "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis" + } + } + } +} +``` + +Refer to the [`date` field type](https://www.elastic.co/guide/en/elasticsearch/reference/current/date.html) docs for more information. + + +### Grok or dissect pattern mismatch [logs-mapping-troubleshooting-grok-mismatch] + +If the pattern in your grok or dissect processor doesn’t match the format of your document, you’ll see an error similar to the following: + +```plaintext +Provided Grok patterns do not match field value... +``` + + +#### Solution [logs-mapping-troubleshooting-grok-solution] + +Make sure your [grok](https://www.elastic.co/guide/en/elasticsearch/reference/current/grok-processor.html) or [dissect](https://www.elastic.co/guide/en/elasticsearch/reference/current/dissect-processor.html) processor pattern matches your log document format. + +You can build and debug grok patterns in {{kib}} using the [Grok Debugger](../../explore-analyze/query-filter/tools/grok-debugger.md). Find the **Grok Debugger** by navigating to the **Developer tools** page using the navigation menu or the global search field. + +From here, you can enter sample data representative of the log document you’re trying to ingest and the Grok pattern you want to apply to the data. + +If you don’t see any **Structured Data** when you simulate the grok pattern, iterate on the pattern until you find the error. 
diff --git a/troubleshoot/observability/troubleshoot-service-level-objectives-slos.md b/troubleshoot/observability/troubleshoot-service-level-objectives-slos.md index 19793e8bb1..af2a79d347 100644 --- a/troubleshoot/observability/troubleshoot-service-level-objectives-slos.md +++ b/troubleshoot/observability/troubleshoot-service-level-objectives-slos.md @@ -5,25 +5,240 @@ mapped_pages: - https://www.elastic.co/guide/en/serverless/current/slo-troubleshoot-slos.html --- -# Troubleshoot service-level objectives (SLOs) +# Troubleshoot service-level objectives (SLOs) [slo-troubleshoot-slos] -% What needs to be done: Align serverless/stateful -% Use migrated content from existing pages that map to this page: +::::{important} +In {{stack}}, to create and manage SLOs, you need an [appropriate license](https://www.elastic.co/subscriptions), an {{es}} cluster with both `transform` and `ingest` [node roles](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html#node-roles) present, and [SLO access](../../solutions/observability/incident-management/configure-service-level-objective-slo-access.md) must be configured. -% - [ ] ./raw-migrated-files/observability-docs/observability/slo-troubleshoot-slos.md -% - [ ] ./raw-migrated-files/docs-content/serverless/slo-troubleshoot-slos.md +:::: -% Internal links rely on the following IDs being on this page (e.g. as a heading ID, paragraph ID, etc): -$$$slo-common-problems$$$ +::::{warning} +Do not edit, delete, or tamper with any "internal" assets mentioned in this document, such as the transforms or ingest pipelines created by the SLO application. -$$$slo-troubleshoot-actions$$$ +Do not attempt to edit the `.slo-observability.*` indices mentioned in this document by overriding index templates or editing the settings/mappings. -$$$slo-troubleshoot-beta$$$ +The implementation details described here are subject to change. 
-$$$slo-troubleshoot-inspect$$$ +:::: -$$$slo-troubleshoot-reset$$$ -$$$slo-understanding-slos$$$ \ No newline at end of file +This document provides an overview of common issues encountered when working with service-level objectives (SLOs). It explores the relationships between SLOs and other core functionalities within the stack, such as [transforms](../../explore-analyze/transforms.md) and [ingest pipelines](../../manage-data/ingest/transform-enrich/ingest-pipelines.md), highlighting how these integrations can impact the functionality of SLOs. + +* [Understanding SLO internals](#slo-understanding-slos) +* [Common problems](#slo-common-problems) +* [SLO troubleshooting actions](#slo-troubleshoot-actions) +* [Upgrade from beta to GA](#slo-troubleshoot-beta) + + +## Understanding SLO internals [slo-understanding-slos] + +::::{tip} +If you’re already familiar with how SLOs work and their relationship with other system components, such as transforms and ingest pipelines, you can jump directly to [Common problems](#slo-common-problems). + +:::: + + +An SLO is represented by several system resources: + +* **SLO Definition**: Stored as a Kibana Saved Object. +* **Transforms**: For each SLO, {{kib}} creates two transforms: + + * **Rolling-up transform**: `slo-{slo.id}-{slo.revision}`, rolls up the data into a smaller set of documents. The source indices of this transform are defined by the SLO. The target index will be `.slo-observability.sli-v{slo.internal-version}-{monthly date}`. + * **Rolling-up ingest pipeline**: `slo-observability.sli.pipeline-{slo.id}-{slo.revision}`, used by the rolling-up transform. + * **Summarizing transform**: `slo-summary-{slo.id}-{slo.revision}`, updates the latest values, such as the observed SLI or remaining error budget, for efficient searching and filtering of SLOs. The source of this transform is `.slo-observability.sli-v{slo.internal-version}*`. The target index is `.slo-observability.summary-v{slo.internal-version}`. 
+ * **Summarizing ingest pipeline**: `slo-observability.summary.pipeline-{slo.id}-{slo.revision}`, used by the summarizing transform. + +* **Additional resources**: {{kib}} also installs and manages shared resources to the SLOs, including index templates, indices, and ingest pipelines, among others. + +When an **SLO update** changes any of the `SLI parameters`, the `SLO objective`, or the `time window`, a revision bump (`{slo.revision}`) and a full reinstallation of the associated assets (transforms and ingest pipelines) occur. In addition, the revision bump deletes any previously aggregated data for that SLO. Updates to fields like `name`, `description`, or `tags` do not trigger a revision bump or asset reinstallation. + +Ensuring that transforms are functioning correctly and that the cluster is healthy is crucial for maintaining accurate and reliable SLOs. + + +## Common problems [slo-common-problems] + +It’s common for SLO problems to arise when there are underlying problems in the cluster, such as unavailable shards or failed transforms. Because SLOs rely on transforms to aggregate and process data, any failure or misconfiguration in these components can lead to inaccurate or incomplete SLO calculations. Additionally, unavailable shards can affect the data retrieval process, further complicating the reliability of SLO metrics. + + +### No transform or ingest nodes [slo-no-transform-ingest-node] + +Because SLOs depend on both [ingest pipelines](../../manage-data/ingest/transform-enrich/ingest-pipelines.md) and [transforms](../../explore-analyze/transforms.md) to process the data, it’s essential to ensure that the cluster has nodes with the appropriate [roles](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html#node-roles). + +Ensure the cluster includes one or more nodes with both `ingest` and `transform` roles to support the data processing and transformations required for SLOs to function properly. 
The roles can exist on the same node or be distributed across separate nodes. + + +### Unhealthy or missing transforms [slo-transform-unhealthy] + +When working with SLOs, it is crucial to ensure that the associated transforms function correctly. Transforms are responsible for generating the data needed for SLOs, and two transforms are created for each SLO. If you notice that your SLOs are not displaying the expected data, it’s time to check the health of these associated transforms. + +{{kib}} shows the following message when any of the associated transforms is in an unexpected state: + +* `"The following transform is an unhealthy state"`, followed by a list of transforms. + +For detailed guidance on diagnosing and resolving transform-related issues, refer to the [troubleshooting transforms](../elasticsearch/transform-troubleshooting.md) documentation. + +It’s also recommended that you perform the following transform checks: + +* Ensure the transforms needed for the SLOs haven’t been deleted or stopped. + + If a transform has been deleted, the easiest way to recreate it is using the [Reset SLO](#slo-troubleshoot-reset) action, forcing the recreation of the transforms. If a transform was stopped, try to start it, and then check the `health tab` of the transform. + +* [Inspect SLO assets](#slo-troubleshoot-inspect) to analyze the SLO definition and all associated resources. + + Use the direct links offered by the **Inspect UI** and check that all referenced resources exist, as that’s not verified by the inspect functionality. + + Use the `query composite` content to verify if the queries performed by the transforms are valid and return the expected data. + +* Check the source data and queries of the SLO. + + The most common cause of legitimate transform failures is issues with the source data, such as timestamp parsing errors or incorrect query structures.
The following is an example of an unparsable timestamp causing a transform to fail: + + ```bash + "reason": """Failed to index documents into destination index due to permanent error: + [org.elasticsearch.xpack.transform.transforms.BulkIndexingException: Bulk index experienced [500] failures and at least 1 irrecoverable + [unable to parse date [1702842480000]]. Other failures: + [IngestProcessorException] message [org.elasticsearch.ingest.IngestProcessorException: + java.lang.IllegalArgumentException: unable to parse date [1702842480000]]; java.lang.IllegalArgumentException: unable to parse date [1702842480000]]""", + "issue": "Transform task state is [failed]" + ``` + +* As a last resort, consider [resetting the SLO](#slo-troubleshoot-reset). + + +### Missing Ingest Pipelines [slo-missing-pipeline] + +If any of the needed ingest pipelines are missing, try the [Reset SLO](#slo-troubleshoot-reset) action. + + +### Stack-related problems [slo-missing-template] + +As mentioned, maintaining a healthy cluster is crucial for SLOs to function correctly. The following examples show issues **unrelated to SLOs** that can still disrupt their proper operation. While troubleshooting these issues is outside the scope of this document, they are included for illustrative purposes. + +* Problems accessing the source data, causing the transform to fail: + + ```bash + Failed to execute phase [can_match], start; org.elasticsearch.action.search.SearchPhaseExecutionException: + Search rejected due to missing shards [[index_name_1][1], [index_name_2][1], [index_name_3][1]]. + ``` + +* Remote cluster not available, if for example an SLO is fetching data from a remote cluster called `remote-metrics`: + + ```sh + Validation Failed: 1: no such remote cluster: [remote-metrics] + ``` + +* [Circuit breaker exceptions](../elasticsearch/circuit-breaker-errors.md) due to nodes being under memory pressure. 
+ + +## SLO troubleshooting actions [slo-troubleshoot-actions] + + +### Inspect SLO assets [slo-troubleshoot-inspect] + +To be able to inspect SLOs you have to activate the corresponding feature in {{kib}}: + +1. Open **Advanced Settings**, by finding **Stack Management** in the main menu or using the [global search field](/explore-analyze/find-and-organize/find-apps-and-objects.md). +2. Enable the `observability:enableInspectEsQueries` setting. + +Afterwards, visit the **SLO edit page** and click **SLO Inspect**. + +The **SLO Inspect** option provides a detailed report of an SLO, including: + +* SLO configuration +* Rollup transform configuration +* Summary transform configuration +* Rollup ingest pipeline +* Summary ingest pipeline +* Temporary document +* Rollup transform query composite +* Summary transform query composite + +These resources are very useful for tasks such as trying out the queries performed by the transforms and checking the IDs of all associated resources. The view also includes direct links to transforms and ingest pipelines sections in {{kib}}. + + +### Reset SLO [slo-troubleshoot-reset] + +Resetting an SLO forces the deletion of all SLI data, summary data, and transforms, and then reinstalls and processes the data. Essentially, it recreates the SLO as if it had been deleted and re-created by the user. + +::::{note} +While resetting an SLO can help resolve certain issues, it may not always address the root cause of errors. Most errors related to transforms typically arise from improperly structured source data, such as unparsable timestamps, which prevent the transform from progressing. Additionally, incorrectly formatted SLO queries, and consequently transform queries, can also lead to failures. + +Before resetting the SLO, verify that the source data and queries are correctly formatted and validated. Resetting should only be used as a last resort when all other troubleshooting steps have been exhausted.
+ +:::: + + +Follow these steps to reset an SLO: + +1. Find **SLOs** in the main menu or use the [global search field](/explore-analyze/find-and-organize/find-apps-and-objects.md). +2. Click on the SLO to reset. +3. Select **Actions** → **Reset**. + +Alternatively, you can use the {{kib}} API for the reset action: + +```console +POST kbn:/api/observability/slos/{sloId}/_reset +``` + +Where `sloId` can be obtained from the [Inspect SLO assets](#slo-troubleshoot-inspect) action. + + +### Using API calls to retrieve SLO details [slo-api-calls] + +Refer to [SLO API calls](https://www.elastic.co/docs/api/doc/kibana/v8/operation/operation-findslosop) as an alternative to [using SLO Inspect](#slo-troubleshoot-inspect). + + +## Upgrade from beta to GA [slo-troubleshoot-beta] +:::{applies} +:stack: all +::: + +Starting in version 8.12.0, SLOs are generally available (GA). If you’re upgrading from a beta version of SLOs (available in 8.11.0 and earlier), you must migrate your SLO definitions to a new format. Otherwise, SLOs won’t show up. + +::::{dropdown} Migrate your SLO definitions +To migrate your SLO definitions, open the SLO overview. A banner will display the number of outdated SLOs detected. For each outdated SLO, click **Reset**. If you no longer need the SLO, select **Delete**. + +If you have a large number of SLO definitions, it is possible to automate this process. To do this, you’ll need to use two Elastic APIs: + +* [SLO Definitions Find API](https://github.com/elastic/kibana/blob/9cb830fe9a021cda1d091effbe3e0cd300220969/x-pack/plugins/observability/docs/openapi/slo/bundled.yaml#L453-L514) (`/api/observability/slos/_definitions`) +* [SLO Reset API](https://www.elastic.co/docs/api/doc/kibana/v8/operation/operation-resetsloop) + +Pass in `includeOutdatedOnly=1` as a query parameter to the Definitions Find API. This will display your outdated SLO definitions. Loop through this list, one by one, calling the Reset API on each outdated SLO definition.
The Reset API loads the outdated SLO definition and resets it to the new format required for GA. Once an SLO is reset, it will start to regenerate SLIs and summary data. + +:::: + + +::::{dropdown} Remove legacy summary transforms +After migrating to 8.12 or later, you might have some legacy SLO summary transforms running. You can safely delete the following legacy summary transforms: + +```sh +# Stop all legacy summary transforms +POST _transform/slo-summary-occurrences-30d-rolling/_stop?force=true +POST _transform/slo-summary-occurrences-7d-rolling/_stop?force=true +POST _transform/slo-summary-occurrences-90d-rolling/_stop?force=true +POST _transform/slo-summary-occurrences-monthly-aligned/_stop?force=true +POST _transform/slo-summary-occurrences-weekly-aligned/_stop?force=true +POST _transform/slo-summary-timeslices-30d-rolling/_stop?force=true +POST _transform/slo-summary-timeslices-7d-rolling/_stop?force=true +POST _transform/slo-summary-timeslices-90d-rolling/_stop?force=true +POST _transform/slo-summary-timeslices-monthly-aligned/_stop?force=true +POST _transform/slo-summary-timeslices-weekly-aligned/_stop?force=true + +# Delete all legacy summary transforms +DELETE _transform/slo-summary-occurrences-30d-rolling?force=true +DELETE _transform/slo-summary-occurrences-7d-rolling?force=true +DELETE _transform/slo-summary-occurrences-90d-rolling?force=true +DELETE _transform/slo-summary-occurrences-monthly-aligned?force=true +DELETE _transform/slo-summary-occurrences-weekly-aligned?force=true +DELETE _transform/slo-summary-timeslices-30d-rolling?force=true +DELETE _transform/slo-summary-timeslices-7d-rolling?force=true +DELETE _transform/slo-summary-timeslices-90d-rolling?force=true +DELETE _transform/slo-summary-timeslices-monthly-aligned?force=true +DELETE _transform/slo-summary-timeslices-weekly-aligned?force=true +``` + +Do not delete any new summary transforms used by your migrated SLOs. 
+ +:::: diff --git a/troubleshoot/observability/troubleshooting-synthetics.md b/troubleshoot/observability/troubleshooting-synthetics.md index 661384a6f3..0483a4c424 100644 --- a/troubleshoot/observability/troubleshooting-synthetics.md +++ b/troubleshoot/observability/troubleshooting-synthetics.md @@ -5,17 +5,149 @@ mapped_pages: - https://www.elastic.co/guide/en/serverless/current/observability-synthetics-troubleshooting.html --- -# Troubleshooting Synthetics +# Troubleshoot Synthetics [synthetics-troubleshooting] -% What needs to be done: Align serverless/stateful -% Use migrated content from existing pages that map to this page: -% - [ ] ./raw-migrated-files/observability-docs/observability/synthetics-troubleshooting.md -% - [ ] ./raw-migrated-files/docs-content/serverless/observability-synthetics-troubleshooting.md +## Local debugging [synthetics-troubleshooting-local-debugging] -% Internal links rely on the following IDs being on this page (e.g. as a heading ID, paragraph ID, etc): +For debugging synthetic tests locally, you can set an environment variable, `DEBUG=synthetics`, to capture Synthetics agent logs when using the [Synthetics CLI](/solutions/observability/apps/use-synthetics-cli.md). -$$$synthetics-troubleshooting-no-locations$$$ -$$$synthetics-troubleshooting-public-locations-disabled$$$ \ No newline at end of file +## Common issues [synthetics-troubleshooting-common-issues] + + +### Monitors stopped running after upgrading to 8.8.0 or above [synthetics-troubleshooting-missing-api-key] +:::{applies} +:stack: all +::: + +Synthetic monitors will stop running if you have gone through this workflow: + +1. Enabled Monitor Management (in the {{uptime-app}}) prior to 8.6.0. +2. Created a synthetic monitor that is configured to run on Elastic’s global managed infrastructure. +3. Upgraded to 8.8.0 or above. 
+ +This happens because the permissions granted by clicking **Enable Monitor Management** in versions prior to 8.6.0 are not sufficient in versions 8.8.0 and above. + +To fix this, a user with [admin permissions](/solutions/observability/apps/setup-role.md) needs to visit the {{synthetics-app}} in {{kib}}. In 8.8.0 and above, the equivalent of "enabling monitor management" happens automatically in the background when a user with [admin permissions](/solutions/observability/apps/setup-role.md) visits the {{synthetics-app}}. + +If a user *without* [admin permissions](/solutions/observability/apps/setup-role.md) visits the {{synthetics-app}} before an admin has visited it, the user will see a note that says "Only administrators can enable this feature". That note will persist until an admin user visits the {{synthetics-app}}. + + +### No results from a monitor configured to run on a {{private-location}} [synthetics-troubleshooting-no-agent-running] + +If you have created a {{private-location}} and configured a monitor to run on that {{private-location}}, but don’t see any results for that monitor in the {{synthetics-app}}, make sure there is an agent configured to run against the agent policy. + +::::{note} +If you attempt to assign an agent policy to a {{private-location}} *before* configuring an agent to run against the agent policy, you will see a note in the {{synthetics-app}} UI that the selected agent policy has no agents. + +:::: + + +When creating a {{private-location}}, you have to: + +::::{tab-set} + +:::{tab-item} {{serverless-short}} +1. [Set up {{agent}}](/solutions/observability/apps/monitor-resources-on-private-networks.md#synthetics-private-location-fleet-agent). +2. [Connect {{fleet}} to your Observability project](/solutions/observability/apps/monitor-resources-on-private-networks.md#synthetics-private-location-connect) and enroll an {{agent}} in {{fleet}}. +3. 
[Add a {{private-location}}](/solutions/observability/apps/monitor-resources-on-private-networks.md#synthetics-private-location-add) in the Synthetics UI. +::: + +:::{tab-item} {{stack}} 9.0.0+ +1. [Set up {{fleet-server}} and {{agent}}](/solutions/observability/apps/monitor-resources-on-private-networks.md#synthetics-private-location-fleet-agent). +2. [Connect {{fleet}} to the {{stack}}](/solutions/observability/apps/monitor-resources-on-private-networks.md#synthetics-private-location-connect) and enroll an {{agent}} in {{fleet}}. +3. [Add a {{private-location}}](/solutions/observability/apps/monitor-resources-on-private-networks.md#synthetics-private-location-add) in the {{synthetics-app}}. +::: + +:::: + +If you do not complete the second item, no agents will be configured to run against the agent policy, and any monitors configured to run on that {{private-location}} won’t be able to run so there will be no results in the {{synthetics-app}}. + +To fix this, make sure there is an agent configured to run against the agent policy. + + +### No results from a monitor [synthetics-troubleshooting-no-direct-es-connection] + +If you have configured a monitor but don’t see any results for that monitor in the {{synthetics-app}}, whether running them from Elastic’s global managed testing infrastructure or from {{private-location}}s, ensure Synthetics has a direct connection to {{es}}. + +Do not configure any ingest pipelines or output via Logstash as this will prevent Synthetics from working properly and is not [supported](/solutions/observability/apps/synthetics-support-matrix.md). 
+ + +### Browser monitor configured to run on a {{private-location}} not running to schedule [synthetics-troubleshooting-missing-browser-schedules] + +If you have browser monitors configured to run on a {{private-location}} but notice one or more of them are not running as scheduled, this could be because: + +* The time it takes for your monitor to run is longer than the frequency you have set +* Too many monitors are trying to run concurrently, causing some of them to skip their scheduled run + +You may also see a message in the logs such as `2 tasks have missed their schedule deadlines by more than 1 second in the last 15s`. These will be visible from inside the Agent diagnostic ZIP file, and the numbers and time periods may be different in your logs. + +Start by identifying the cause of the issue. First, check if the time it takes the monitor to run is less than the scheduled frequency: + +1. Go to the {{synthetics-app}}. +2. Click the monitor, then click **Go to monitor**. +3. Go to the [Overview tab](/solutions/observability/apps/analyze-data-from-synthetic-monitors.md#synthetics-analyze-individual-monitors-overview) to see the *Avg. duration*. You can also view the duration for individual runs in the [History tab](/solutions/observability/apps/analyze-data-from-synthetic-monitors.md#synthetics-analyze-individual-monitors-history). +4. Compare the duration to the scheduled frequency. If the duration is *greater than* the scheduled frequency, for example, if the monitor takes 90 seconds to run and its scheduled frequency is 1 minute, the next scheduled run will not occur because the current one is still running, so you may only see results for every other scheduled run. + + To fix this, you can either: + + * Change the frequency so the monitor runs less often. + * Refactor the monitor so it can run in a shorter amount of time.
+ + +If the duration is *less than* the scheduled frequency or the suggestion above does not fix the issue, then there may be too many browser monitors attempting to run on the {{private-location}}. Due to the additional hardware overhead of running browser monitors, we limit each {{private-location}} to only run two browser monitors at the same time. Depending on how many browser monitors you have configured to run on the {{private-location}} and their schedule, the {{private-location}} may not be able to run them all because it would require more than two browser tests to be running simultaneously. + +To fix this issue, you can either: + +* Increase the number of concurrent browser monitors allowed (as described in [Scaling Private Locations](/solutions/observability/apps/monitor-resources-on-private-networks.md#synthetics-private-location-scaling)), paying attention to the scaling and hardware requirements documented. +* Create multiple {{private-location}}s and spread your browser monitors across them more evenly (effectively horizontally scaling your {{private-location}}s). + + +### No locations are available [synthetics-troubleshooting-no-locations] + +When using {{ecloud}}, if there are no options available in the *Locations* dropdown when you try to create a monitor in the {{synthetics-app}} *or* if no locations are listed when using the [`location` command](/solutions/observability/apps/use-synthetics-cli.md#elastic-synthetics-locations-command), it might be because you do not have permission to use Elastic managed locations *and* there are no [Private Locations](/solutions/observability/apps/monitor-resources-on-private-networks.md#monitor-via-private-agent) available yet. 
+ +There are a few ways to fix this: + +::::{tab-set} + +:::{tab-item} {{serverless-short}} +* If you have [Editor](/solutions/observability/apps/grant-users-access-to-secured-resources.md) access, you can [create a new Private Location](/solutions/observability/apps/monitor-resources-on-private-networks.md#monitor-via-private-agent). Then try creating the monitor again. +* If you do *not* have the right privileges to create a Private Location, you can ask an [Admin](/solutions/observability/apps/grant-users-access-to-secured-resources.md) to create a Private Location or give you the necessary privileges so you can [create a new Private Location](/solutions/observability/apps/monitor-resources-on-private-networks.md#monitor-via-private-agent). Then try creating the monitor again. + +::: + +:::{tab-item} {{stack}} 9.0.0+ +* If you have [write access](/solutions/observability/apps/writer-role.md) including the privileges for [creating new Private Locations](/solutions/observability/apps/writer-role.md#synthetics-role-write-private-locations), you can [create a new Private Location](/solutions/observability/apps/monitor-resources-on-private-networks.md#monitor-via-private-agent). Then try creating the monitor again. +* If you do *not* have the right privileges to create a Private Location, you can ask someone with the [necessary privileges](/solutions/observability/apps/writer-role.md#synthetics-role-write-private-locations) to create a Private Location or ask an administrator with a [setup role](/solutions/observability/apps/setup-role.md) to give you the necessary privileges and [create a new Private Location](/solutions/observability/apps/monitor-resources-on-private-networks.md#monitor-via-private-agent). Then try creating the monitor again. 
+* If you want to create a monitor to run on Elastic’s global managed infrastructure, ask an administrator with a [setup role](/solutions/observability/apps/setup-role.md) to update [`Synthetics and Uptime` sub-feature privileges](/solutions/observability/apps/writer-role.md#disable-managed-locations) for the role you’re currently assigned. Then try creating the monitor again. +::: + +:::: + +### You do not have permission to use Elastic managed locations [synthetics-troubleshooting-public-locations-disabled] +:::{applies} +:stack: all +::: + +If you try to create or edit a monitor hosted on Elastic’s global managed infrastructure but see a note that you do not have permission to use Elastic managed locations, an administrator has restricted the use of public locations. + +To fix this you can either: + +* Ask an administrator with a [setup role](/solutions/observability/apps/setup-role.md) to update [`Synthetics and Uptime` sub-feature privileges](/solutions/observability/apps/writer-role.md#disable-managed-locations) for the role you’re currently assigned or assign you a role that allows using Elastic’s global managed infrastructure. +* Use a [Private Location](/solutions/observability/apps/monitor-resources-on-private-networks.md#monitor-via-private-agent). + + +## Get help [synthetics-troubleshooting-get-help] + + +### Elastic Support [synthetics-troubleshooting-support] + +We offer a support experience unlike any other. Our team of professionals *speak human and code* and love making your day. [Learn more about subscriptions](https://www.elastic.co/subscriptions). + + +### Discussion forum [synthetics-troubleshooting-discussion] + +For other questions and feature requests, visit our [discussion forum](https://discuss.elastic.co//c/observability/synthetics/75).