diff --git a/_images/get-started/o11y_onboardingGuideFlow_1-onboarding.svg b/_images/get-started/o11y_onboardingGuideFlow_1-onboarding.svg
new file mode 100644
index 000000000..1f28b91ae
--- /dev/null
+++ b/_images/get-started/o11y_onboardingGuideFlow_1-onboarding.svg
@@ -0,0 +1 @@
+ 
\ No newline at end of file
diff --git a/_images/get-started/o11y_onboardingGuideFlow_2-initial.svg b/_images/get-started/o11y_onboardingGuideFlow_2-initial.svg
new file mode 100644
index 000000000..105c7afb8
--- /dev/null
+++ b/_images/get-started/o11y_onboardingGuideFlow_2-initial.svg
@@ -0,0 +1 @@
+ 
\ No newline at end of file
diff --git a/_images/get-started/o11y_onboardingGuideFlow_3-scaled.svg b/_images/get-started/o11y_onboardingGuideFlow_3-scaled.svg
new file mode 100644
index 000000000..6f1242d68
--- /dev/null
+++ b/_images/get-started/o11y_onboardingGuideFlow_3-scaled.svg
@@ -0,0 +1 @@
+ 
\ No newline at end of file
diff --git a/_images/get-started/o11y_onboardingGuideFlow_full-flow.svg b/_images/get-started/o11y_onboardingGuideFlow_full-flow.svg
new file mode 100644
index 000000000..49529c395
--- /dev/null
+++ b/_images/get-started/o11y_onboardingGuideFlow_full-flow.svg
@@ -0,0 +1 @@
+ 
\ No newline at end of file
diff --git a/_images/get-started/onboarding-guide-2point0-flowonly.svg b/_images/get-started/onboarding-guide-2point0-flowonly.svg
deleted file mode 100644
index 418d2431f..000000000
--- a/_images/get-started/onboarding-guide-2point0-flowonly.svg
+++ /dev/null
@@ -1 +0,0 @@
- 
\ No newline at end of file
diff --git a/_images/get-started/onboarding-guide-2point0-initial.svg b/_images/get-started/onboarding-guide-2point0-initial.svg
deleted file mode 100644
index 2c50cc1d2..000000000
--- a/_images/get-started/onboarding-guide-2point0-initial.svg
+++ /dev/null
@@ -1 +0,0 @@
- 
\ No newline at end of file
diff --git a/_images/get-started/onboarding-guide-2point0-readiness.svg b/_images/get-started/onboarding-guide-2point0-readiness.svg
deleted file mode 100644
index b7ee89661..000000000
--- a/_images/get-started/onboarding-guide-2point0-readiness.svg
+++ /dev/null
@@ -1 +0,0 @@
- 
\ No newline at end of file
diff --git a/_images/get-started/onboarding-guide-2point0-scaled.svg b/_images/get-started/onboarding-guide-2point0-scaled.svg
deleted file mode 100644
index eaa255241..000000000
--- a/_images/get-started/onboarding-guide-2point0-scaled.svg
+++ /dev/null
@@ -1 +0,0 @@
- 
\ No newline at end of file
diff --git a/gdi/monitors-databases/exec-input.rst b/gdi/monitors-databases/exec-input.rst
index 3bbd292d9..135fdcb1b 100644
--- a/gdi/monitors-databases/exec-input.rst
+++ b/gdi/monitors-databases/exec-input.rst
@@ -10,7 +10,7 @@ Exec Input (deprecated)
 
    The Exec Input monitor is now deprecated and will reach End of Support on February 3, 2025. During this period only critical security and bug fixes are provided. When End of Support is reached, the monitor will be removed and no longer be supported, and you won't be able to use it to send data to Splunk Observability Cloud.
 
-   To monitor your system with Telegraf Exec you can use native OpenTelemetry instead. See :ref:`telegraf-generic` to learn how.
+   To collect exec file data, use the OpenTelemetry Collector and the :new-page:`Telegraf Exec Input plugin `. See how in :ref:`telegraf-generic`.
 
 The Splunk Distribution of the OpenTelemetry Collector uses the Smart Agent receiver with the Exec Input monitor type, an embedded form of the Telegraf Exec plugin, to receive metrics or logs from exec files.
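For orientation, the Telegraf Exec Input plugin that the rewritten caution points to is configured in plain Telegraf TOML. The following is a minimal sketch of the input side only, where the script path is a hypothetical placeholder and the script is assumed to print metrics in InfluxDB line protocol:

.. code:: toml

   [[inputs.exec]]
     ## Commands that Telegraf runs on every collection interval
     ## (hypothetical script path, shown for illustration only)
     commands = ["/opt/scripts/collect_stats.sh"]
     ## Give up on a command that runs longer than this
     timeout = "5s"
     ## Parse the command output as InfluxDB line protocol
     data_format = "influx"

The forwarding side, pushing what this plugin collects to the Collector over OTLP, is sketched after the telegraf.rst hunk below.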
diff --git a/gdi/monitors-databases/logparser.rst b/gdi/monitors-databases/logparser.rst
index df6627bcd..6fca7e7ac 100644
--- a/gdi/monitors-databases/logparser.rst
+++ b/gdi/monitors-databases/logparser.rst
@@ -7,6 +7,8 @@ Logparser
    :description: Use this Splunk Observability Cloud integration for the telegraf/logparser plugin monitor. See benefits, install, configuration, and metrics.
 
+.. caution:: Smart Agent monitors are being deprecated. To tail log files, use the OpenTelemetry Collector and the :new-page:`Telegraf Tail Input plugin `. See how in :ref:`telegraf-generic`.
+
 The Splunk Distribution of the OpenTelemetry Collector uses the Smart
 Agent receiver with the ``telegraf/logparser`` monitor type to tail log
 files. This integration is based on the Telegraf logparser plugin, and all
diff --git a/gdi/monitors-databases/sql.rst b/gdi/monitors-databases/sql.rst
index 49b5b5788..219079bcc 100644
--- a/gdi/monitors-databases/sql.rst
+++ b/gdi/monitors-databases/sql.rst
@@ -6,7 +6,7 @@ SQL
 .. meta::
    :description: Use this Splunk Observability Cloud integration for the SQL monitor. See benefits, install, configuration, and metrics
 
-.. note:: If you're using the Splunk Distribution of the OpenTelemetry Collector and want to collect SQL metrics, use the native OTel :ref:`sqlquery-receiver` component.
+.. caution:: Smart Agent monitors are being deprecated. To collect SQL metrics, use the native OpenTelemetry :ref:`sqlquery-receiver` component.
 
 The SQL monitor gathers database usage metrics from SQL queries on your databases. It's available for Kubernetes, Windows, and Linux.
diff --git a/gdi/monitors-hosts/procstat.rst b/gdi/monitors-hosts/procstat.rst
index f4028e69c..4e0a2f17b 100644
--- a/gdi/monitors-hosts/procstat.rst
+++ b/gdi/monitors-hosts/procstat.rst
@@ -6,7 +6,9 @@ procstat
 .. meta::
    :description: Use this Splunk Observability Cloud integration for the procstat monitor. See benefits, install, configuration, and metrics
 
-The Splunk Distribution of OpenTelemetry Collector uses the Smart Agent receiver with the
+.. caution:: Smart Agent monitors are being deprecated. To collect metrics about processes, use the OpenTelemetry Collector and the :new-page:`Telegraf Procstat Input plugin `. See how in :ref:`telegraf-generic`.
+
+The Splunk Distribution of the OpenTelemetry Collector uses the Smart Agent receiver with the
 ``procstat`` monitor type to collect metrics about processes.
 
 This integration is available for Kubernetes, Linux, and Windows.
diff --git a/gdi/monitors-hosts/win-services.rst b/gdi/monitors-hosts/win-services.rst
index 8e7a10e30..5dceedb21 100644
--- a/gdi/monitors-hosts/win-services.rst
+++ b/gdi/monitors-hosts/win-services.rst
@@ -6,7 +6,9 @@ Windows Services
 .. meta::
    :description: Use this Splunk Observability Cloud integration for the Telegraf Win_services monitor. See benefits, install, configuration, and metrics
 
-The Splunk Distribution of OpenTelemetry Collector uses the Smart Agent receiver with the
+.. caution:: Smart Agent monitors are being deprecated. To collect Windows service data, use the OpenTelemetry Collector and the :new-page:`Telegraf Windows Services Input plugin `. See how in :ref:`telegraf-generic`.
+
+The Splunk Distribution of the OpenTelemetry Collector uses the Smart Agent receiver with the
 ``telegraf/win_services`` monitor type to ingest metrics about Windows
 services.
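The replacements named in these cautions follow the same pattern: each Telegraf Input plugin is a short TOML block in the Telegraf configuration. A minimal sketch covering the Tail, Procstat, and Windows Services Input plugins, where the log path, process pattern, and service name are illustrative assumptions rather than values from these docs:

.. code:: toml

   [[inputs.tail]]
     ## Tail these log files from the current end of file,
     ## parsing each line with a grok pattern as the logparser monitor did
     files = ["/var/log/myapp/*.log"]
     from_beginning = false
     data_format = "grok"
     grok_patterns = ["%{COMBINED_LOG_FORMAT}"]

   [[inputs.procstat]]
     ## Collect per-process CPU and memory metrics for processes
     ## whose name matches this regular expression
     pattern = "nginx"

   [[inputs.win_services]]
     ## On Windows, report the state and startup mode of these services
     service_names = ["LanmanServer"]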
diff --git a/gdi/monitors-languages/asp-dot-net.rst b/gdi/monitors-languages/asp-dot-net.rst
index 7157d7076..a134e5d6b 100644
--- a/gdi/monitors-languages/asp-dot-net.rst
+++ b/gdi/monitors-languages/asp-dot-net.rst
@@ -12,6 +12,8 @@ ASP.NET (deprecated)
 
    To forward data from a .NET application to Splunk Observability Cloud use the :ref:`Splunk Distribution of OpenTelemetry .NET ` instead.
 
+   To monitor Windows Performance Counters with native OpenTelemetry, refer to :ref:`windowsperfcounters-receiver`.
+
 The Splunk Distribution of the OpenTelemetry Collector uses the Smart Agent receiver with the
 ``aspdotnet`` monitor type to retrieve metrics for requests, errors, sessions, and worker processes from ASP.NET applications.
 
diff --git a/gdi/monitors-languages/microsoft-dotnet.rst b/gdi/monitors-languages/microsoft-dotnet.rst
index e7e439ea2..b9759c0b0 100644
--- a/gdi/monitors-languages/microsoft-dotnet.rst
+++ b/gdi/monitors-languages/microsoft-dotnet.rst
@@ -6,9 +6,11 @@ Microsoft .NET (deprecated)
 .. meta::
    :description: Use this Splunk Observability Cloud integration for the .NET (dotnet) apps monitor. See benefits, install, configuration, and metrics
 
-.. note:: This integration is deprecated and will be removed in February 2025. To forward data to Splunk Observability Cloud, use the Splunk Distribution of OpenTelemetry .NET. For a full list of collected metrics, refer to :ref:`dotnet-otel-metrics-attributes`.
+.. caution::
+
+   This integration is deprecated and will be removed in February 2025. To forward data to Splunk Observability Cloud, use the Splunk Distribution of OpenTelemetry .NET. For a full list of collected metrics, refer to :ref:`dotnet-otel-metrics-attributes`.
 
-The Splunk Distribution of OpenTelemetry Collector uses the Smart Agent receiver with the
+The Splunk Distribution of the OpenTelemetry Collector uses the Smart Agent receiver with the
 ``dotnet`` monitor type to report metrics for .NET applications.
 
 This integration is only available on Windows.
diff --git a/gdi/monitors-network/dns.rst b/gdi/monitors-network/dns.rst
index 4e0e2c382..0136fa670 100644
--- a/gdi/monitors-network/dns.rst
+++ b/gdi/monitors-network/dns.rst
@@ -6,9 +6,9 @@ DNS Query Input
 .. meta::
    :description: Use this Splunk Observability Cloud integration for the Telegraf DNS monitor. See benefits, install, configuration, and metrics
 
-The Splunk Distribution of OpenTelemetry Collector uses the Smart Agent receiver with the
-DNS Query Input monitor type (an embedded form of the Telegraf DNS Query
-plugin) to collect DNS data.
+.. caution:: Smart Agent monitors are being deprecated. To collect DNS data, use the OpenTelemetry Collector and the :new-page:`Telegraf DNS Query Input plugin `. See how in :ref:`telegraf-generic`.
+
+You can use the Splunk Distribution of the OpenTelemetry Collector's Smart Agent receiver with the DNS Query Input monitor type (an embedded form of the Telegraf DNS Query plugin) to collect DNS data.
 
 Benefits
 --------
diff --git a/gdi/monitors-network/snmp.rst b/gdi/monitors-network/snmp.rst
index b4410117b..aad1ad94c 100644
--- a/gdi/monitors-network/snmp.rst
+++ b/gdi/monitors-network/snmp.rst
@@ -6,7 +6,9 @@ SNMP agent
 .. meta::
    :description: Use this Splunk Observability Cloud integration for the SNMP agent monitor. See benefits, install, configuration, and metrics
 
-The Splunk Distribution of OpenTelemetry Collector uses the Smart Agent receiver with the
+.. caution:: Smart Agent monitors are being deprecated. To collect data from SNMP agents, use the OpenTelemetry Collector and the :new-page:`Telegraf SNMP Input plugin `. See how in :ref:`telegraf-generic`.
+
+The Splunk Distribution of the OpenTelemetry Collector uses the Smart Agent receiver with the
 ``snmp`` monitor type to collect metrics from SNMP agents.
 
 This integration is available for Kubernetes, Windows, and Linux.
diff --git a/gdi/opentelemetry/otel-other/telegraf.rst b/gdi/opentelemetry/otel-other/telegraf.rst
index 94fe139d3..89bb9962c 100644
--- a/gdi/opentelemetry/otel-other/telegraf.rst
+++ b/gdi/opentelemetry/otel-other/telegraf.rst
@@ -1,14 +1,13 @@
 .. _telegraf:
 .. _telegraf-generic:
 
-Monitor services with Telegraf and OpenTelemetry
-========================================================
+Monitor services with Telegraf Input plugins and OpenTelemetry
+=====================================================================
 
 .. meta::
    :description: Use this Splunk Observability Cloud integration for the Telegraf monitor. See benefits, install, configuration, and metrics.
 
-To monitor your service with Telegraf using native OpenTelemetry in Splunk Observability Cloud, install the service's Telegraf plugin then push metrics to the Splunk Opentelemetry Collector
-via OTLP.
+To monitor your service with Telegraf using native OpenTelemetry in Splunk Observability Cloud, install the service's Telegraf Input plugin, then push metrics to the Splunk OpenTelemetry Collector via OTLP.
 
 .. note:: This setup is designed for a Linux Ubuntu OS but should be replicable on any machines running Linux OS with Debian flavor. These instructions might not work on other OS (MacOS/Windows).
 
@@ -45,9 +44,9 @@ Run the following commands to install Telegraf from the InfluxData repository:
 
 2. Set up your service's Telegraf Input plugin
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Next, install the Telegraf Input plugin for the service you want to monitor. Available plugins include Chrony, Consul, Docker, Elasticsearch, Fluentd, GitHub, Jenkins, RabbitMQ or SQL. Find a complete list of Input plugins at :new-page:`Telegraf Input plugins ` in GitHub.
+Next, install the Telegraf Input plugin for the service you want to monitor. Available plugins include Chrony, Consul, Docker, Elasticsearch, Fluentd, GitHub, Jenkins, RabbitMQ, or SQL. Find a complete list of available plugins at :new-page:`Telegraf Input plugins ` in GitHub.
 
-For example, if you want to monitor execute commands on every interval and parse metrics from their output with the exec input plugin, use a setup like:
+For example, if you want to execute commands on every interval and parse metrics from their output with the Exec Input plugin, use a setup like:
 
 .. code::
 
diff --git a/gdi/opentelemetry/smart-agent/smart-agent-migration-to-otel-collector.rst b/gdi/opentelemetry/smart-agent/smart-agent-migration-to-otel-collector.rst
index 922384b8a..4e25cf1d3 100644
--- a/gdi/opentelemetry/smart-agent/smart-agent-migration-to-otel-collector.rst
+++ b/gdi/opentelemetry/smart-agent/smart-agent-migration-to-otel-collector.rst
@@ -25,7 +25,7 @@ Migrate from SignalFx Smart Agent to the Splunk Distribution of OpenTelemetry Co
 
   Smart Agent monitors are also being deprecated and will no longer be available to send data to Splunk Observability Cloud when they reach End of Support. Instead, you can use native OpenTelemetry receivers to gather data with the OTel Collector. See :ref:`migration-monitors-native`.
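Whichever Input plugin you choose, the push side that telegraf.rst describes stays the same: Telegraf's OpenTelemetry output plugin forwards everything the inputs collect to the Collector over OTLP gRPC. A minimal end-to-end sketch, assuming a local Collector exposing an OTLP receiver on the default gRPC port 4317 and reusing the hypothetical exec script from the earlier sketch:

.. code:: toml

   [agent]
     ## How often Telegraf runs its input plugins
     interval = "10s"

   [[inputs.exec]]
     ## Run this command on every interval and parse its output
     ## (hypothetical script path, for illustration only)
     commands = ["/opt/scripts/collect_stats.sh"]
     timeout = "5s"
     data_format = "influx"

   [[outputs.opentelemetry]]
     ## Address of the Collector's OTLP gRPC receiver
     ## (assumes the default local OTLP port)
     service_address = "localhost:4317"

From there, the Collector's standard pipeline exports the metrics to Splunk Observability Cloud.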
-The Splunk Distribution of the :new-page:`OpenTelemetry Collector ` provides a unified way to receive, process, and export metrics, traces, and logs to Splunk Observability Cloud. If you're using the SignalFx Smart Agent (deprecated) you can easily transition to the Collector without losing any functionality.
+The Splunk Distribution of the OpenTelemetry Collector provides a unified way to receive, process, and export metrics, traces, and logs to Splunk Observability Cloud. If you're using the SignalFx Smart Agent (End of Support), you must transition to the Collector.
 
 .. raw:: html
 
diff --git a/get-started/get-started-guide/get-started-guide.rst b/get-started/get-started-guide/get-started-guide.rst
index ab69115c2..0272e2cf7 100644
--- a/get-started/get-started-guide/get-started-guide.rst
+++ b/get-started/get-started-guide/get-started-guide.rst
@@ -27,9 +27,9 @@ The journey for getting started with Splunk Observability Cloud has 3 phases: on
 
 .. note:: This guide is for Splunk Observability Cloud users with the admin role.
 
-.. image:: /_images/get-started/onboarding-guide-2point0-flowonly.svg
+.. image:: /_images/get-started/o11y_onboardingGuideFlow_full-flow.svg
    :width: 100%
-   :alt: .
+   :alt: Flow showing the 3 phases of the get started journey: onboarding, initial rollout, and scaled rollout.
 
 .. list-table::
    :header-rows: 1
 
diff --git a/get-started/get-started-guide/initial-rollout.rst b/get-started/get-started-guide/initial-rollout.rst
index 0f4e4a4ef..9ae599014 100644
--- a/get-started/get-started-guide/initial-rollout.rst
+++ b/get-started/get-started-guide/initial-rollout.rst
@@ -9,9 +9,9 @@ To get a high-level overview of the entire getting started journey for Splunk Ob
 
 .. note:: This guide is for Splunk Observability Cloud users with the admin role.
 
-.. image:: /_images/get-started/onboarding-guide-2point0-initial.svg
+.. image:: /_images/get-started/o11y_onboardingGuideFlow_2-initial.svg
    :width: 100%
-   :alt:
+   :alt: Flow showing the 3 phases of the get started journey: onboarding, initial rollout, and scaled rollout. The initial rollout phase is highlighted in this initial rollout topic.
 
 To configure Splunk Observability Cloud solutions for initial rollout, complete the following tasks if they are relevant to your organization:
diff --git a/get-started/get-started-guide/onboarding-readiness.rst b/get-started/get-started-guide/onboarding-readiness.rst
index 49baea1c1..7f478f8e0 100644
--- a/get-started/get-started-guide/onboarding-readiness.rst
+++ b/get-started/get-started-guide/onboarding-readiness.rst
@@ -10,9 +10,9 @@ To get a high-level overview of the entire getting started journey, see :ref:`ge
 
 .. note:: This guide is for Splunk Observability Cloud users with the admin role.
 
-.. image:: /_images/get-started/onboarding-guide-2point0-readiness.svg
+.. image:: /_images/get-started/o11y_onboardingGuideFlow_1-onboarding.svg
    :width: 100%
-   :alt:
+   :alt: Flow showing the 3 phases of the get started journey: onboarding, initial rollout, and scaled rollout. The onboarding phase is highlighted in this onboarding topic.
 
 To configure your users, teams, and tokens, complete the following primary tasks:
diff --git a/get-started/get-started-guide/scaled-rollout.rst b/get-started/get-started-guide/scaled-rollout.rst
index 1e45fbd7c..5a0ae26c1 100644
--- a/get-started/get-started-guide/scaled-rollout.rst
+++ b/get-started/get-started-guide/scaled-rollout.rst
@@ -10,9 +10,9 @@ To get a high-level overview of the entire getting started journey for Splunk Ob
 
 ..
note:: This guide is for Splunk Observability Cloud users with the admin role. -.. image:: /_images/get-started/onboarding-guide-2point0-scaled.svg +.. image:: /_images/get-started/o11y_onboardingGuideFlow_3-scaled.svg :width: 100% - :alt: + :alt: Flow showing the 3 phases of the get started journey: onboarding, initial rollout, and scaled rollout. The scaled rollout phase is highlighted in this scaled rollout topic. To increase usage across all user teams and establish repeatable observability practices through automation, data management, detectors, and dashboards, complete the following tasks: diff --git a/scenarios-tutorials/scenario-collector.rst b/scenarios-tutorials/scenario-collector.rst index f681df8fd..5c9857e1b 100644 --- a/scenarios-tutorials/scenario-collector.rst +++ b/scenarios-tutorials/scenario-collector.rst @@ -13,15 +13,22 @@ PonyBank uses Splunk Observability Cloud, which brings data in through the open- To instrument their infrastructure using the Splunk OTel Collector, Kai takes the following steps: -#. :ref:`set-up-eks-monitoring` -#. :ref:`instrument-ec2-instances` -#. :ref:`instrument-java-svc` -#. :ref:`related-content-use-case` - -.. _set-up-eks-monitoring: - -Enable EKS monitoring using custom Helm charts -============================================================ +.. raw:: html + + +
+  <embed>
+    <ol>
+      <li>Enable EKS monitoring using custom Helm charts</li>
+      <li>Use the Collector to instrument all EC2 instances</li>
+      <li>Instrument the Java service for Splunk APM</li>
+      <li>Explore links between telemetry using Related Content</li>
+    </ol>
+  </embed>
+
+
+.. raw:: html
+
+  <embed>
+    <h2>Enable EKS monitoring using custom Helm charts</h2>
+  </embed>
+ Since their migration to the cloud, the PonyBank application has been running in EKS. Kai starts by setting up the cloud integration from Splunk Observability Cloud using the guided setup, which they access from the home page. Guided setups allow Kai to select the relevant ingest token, and generate installation commands and configuration snippets from the selected options, which Kai can use to quickly deploy instrumentation. @@ -35,10 +42,11 @@ At the end of the guided setup, Kai enters the Kubernetes map of Infrastructure .. image:: /_images/collector/image1.png :alt: Cluster view of the Kubernetes infrastructure in Infrastructure Monitoring -.. _instrument-ec2-instances: - -Use the Collector to instrument all EC2 instances -============================================================ +.. raw:: html + + +
+  <embed>
+    <h2>Use the Collector to instrument all EC2 instances</h2>
+  </embed>
+ For the hosts managed by IT as Elastic Compute Cloud (EC2) instances, Kai decides to deploy the Splunk OTel Collector using the existing Puppet setup at PonyBank. They open the guided setup for Linux monitoring in Splunk Observability Cloud and select the Puppet tab. After filling out the required information, Kai only has to follow two steps: @@ -66,10 +74,11 @@ At the same time, Kai can also see logs coming from each host and node in Splunk .. image:: /_images/collector/image6.png :alt: Log Observer showing host logs -.. _instrument-java-svc: - -Instrument the Java service for Splunk APM -====================================================================================== +.. raw:: html + + +
+  <embed>
+    <h2>Instrument the Java service for Splunk APM</h2>
+  </embed>
+ Kai's final goal is to instrument the corporate Java service of PonyBank for Splunk APM, so that the team can analyze spans and traces in Splunk Observability Cloud, as well as use AlwaysOn Profiling to quickly identify inefficient code that's using too much CPU or memory. @@ -85,10 +94,11 @@ For the EC2 instances that also contain Java services, Kai uses the same guided .. image:: /_images/collector/install-java-agent.gif :alt: Console output of the Java agent install -.. _related-content-use-case: - -Explore links between telemetry using Related Content -===================================================================================== +.. raw:: html + + +
+  <embed>
+    <h2>Explore links between telemetry using Related Content</h2>
+  </embed>
+ Thanks to the Related Content feature, when Kai selects the node running the checkout service of the application, the service appears as a link to Splunk APM in the related content bar. @@ -100,13 +110,19 @@ The same happens when Kai opens Splunk APM and selects the checkout service in t .. image:: /_images/collector/image4.png :alt: Application Monitoring showing the Related Content bar -Summary -================== +.. raw:: html + + +
+  <embed>
+    <h2>Summary</h2>
+  </embed>
+ Kai used Splunk OTel Collector to instrument PonyBank's entire cloud infrastructure, quickly obtaining configuration files and commands for each environment and situation. Through the Java instrumentation for APM, they also retrieved traces from the Java services running on the EKS clusters with related content available to access. -Learn more -================= +.. raw:: html + + +
+  <embed>
+    <h2>Learn more</h2>
+  </embed>
+ - Learn about sending data to Splunk Observability Cloud in :ref:`get-started-get-data-in`. - To collect infrastructure metrics and logs from multiple platforms, see :ref:`otel-intro`. diff --git a/scenarios-tutorials/scenario-landing.rst b/scenarios-tutorials/scenario-landing.rst index d5ce89f77..dac8326ff 100644 --- a/scenarios-tutorials/scenario-landing.rst +++ b/scenarios-tutorials/scenario-landing.rst @@ -27,6 +27,8 @@ This is the collection of scenarios available for Splunk Observability Cloud. Us - :ref:`scenario-security` * - :ref:`OpenTelemetry ` - :ref:`otel-collector-scenario` + * - :ref:`OpenTelemetry ` + - :ref:`deployments-fargate-java` * - :ref:`Alerts and detectors ` - :ref:`monitor-server-latency` * - :ref:`Alerts and detectors ` @@ -35,8 +37,16 @@ This is the collection of scenarios available for Splunk Observability Cloud. Us - :ref:`find-detectors` * - :ref:`Alerts and detectors ` - :ref:`troubleshoot-noisy-detectors` + * - :ref:`Alerts and detectors ` + - :ref:`min-delay-detectors-scenario` + * - :ref:`Alerts and detectors ` + - :ref:`max-delay-detectors-scenario` + * - :ref:`SLOs ` + - :ref:`custom-metric-slo-scenario` * - :ref:`APM ` - :ref:`service-map` + * - :ref:`APM ` + - :ref:`specific-trace` * - :ref:`APM ` - :ref:`services-impact-business-workflows` * - :ref:`APM ` @@ -45,6 +55,8 @@ This is the collection of scenarios available for Splunk Observability Cloud. Us - :ref:`troubleshoot-business-workflows` * - :ref:`APM ` - :ref:`apm-scenario-trace-analyzer` + * - :ref:`APM ` + - :ref:`apm-scenario-trace-analyzer-trace-duration` * - :ref:`APM ` - :ref:`monitor-services` * - :ref:`APM ` @@ -65,10 +77,12 @@ This is the collection of scenarios available for Splunk Observability Cloud. Us - :ref:`profiling-scenario` * - :ref:`APM Profiling ` - :ref:`memory-profiling-scenario` - * - :ref:`Infrastructure Monitoring ` + * - :ref:`Infrastructure Monitoring ` - :ref:`troubleshoot-k8s-nav-scenario` - * - :ref:`Infrastructure Monitoring ` + * - :ref:`Infrastructure: Metrics pipeline management ` - :ref:`aggregate-drop-use-case` + * - :ref:`Infrastructure: Metrics pipeline management ` + - :ref:`use-case-archive` * - :ref:`Infrastructure Monitoring\: Network Explorer ` - :ref:`find-network-error` * - :ref:`Infrastructure Monitoring\: Network Explorer ` @@ -83,8 +97,6 @@ This is the collection of scenarios available for Splunk Observability Cloud. Us - :ref:`scenario-monitoring` * - :ref:`RUM ` - :ref:`spa-custom-event` - * - :ref:`RUM ` - - :ref:`rum-identify-span-problems` * - :ref:`RUM ` - :ref:`rum-mobile-scenario` * - :ref:`Synthetic Monitoring ` diff --git a/scenarios-tutorials/scenario.rst b/scenarios-tutorials/scenario.rst index 028e64c63..a40d4e8be 100644 --- a/scenarios-tutorials/scenario.rst +++ b/scenarios-tutorials/scenario.rst @@ -13,38 +13,40 @@ Buttercup Games site reliability engineers (SREs) and service owners work togeth This scenario describes how Kai, an SRE, and Deepu, a service owner, perform the following tasks using Splunk Observability Cloud to troubleshoot and identify the root cause of a recent Buttercup Games site incident: -#. :ref:`receive-alerts-xpuc` - -#. :ref:`assess-user-impact-xpuc` using Splunk Real User Monitoring (RUM) - -#. :ref:`investigate-root-cause-xpuc` using Splunk Application Performance Monitoring (APM) - -#. :ref:`check-infra-health-xpuc` using Splunk Infrastructure Monitoring - -#. :ref:`look-for-patterns-xpuc` in Splunk APM - -#. :ref:`review-logs-xpuc` using Splunk Log Observer - -#. 
:ref:`monitor-a-fix-xpuc` using Splunk Log Observer - -#. :ref:`take-preventative-action-xpuc` +.. raw:: html + + +
+  <embed>
+    <ol>
+      <li>Receive alerts about outlier behavior</li>
+      <li>Assess user impact using Splunk Real User Monitoring (RUM)</li>
+      <li>Investigate the root cause of a business workflow error using Splunk Application Performance Monitoring (APM)</li>
+      <li>Check on infrastructure health in Splunk Infrastructure Monitoring</li>
+      <li>Look for patterns in application errors in Splunk APM</li>
+      <li>Examine error logs for meaningful messages and patterns using Splunk Log Observer Connect</li>
+      <li>Monitor a fix using Splunk Log Observer Connect</li>
+      <li>Take preventative action and create metrics from logs to power dashboards and alerts</li>
+    </ol>
+  </embed>
+ For a video version of this scenario, watch :new-page:`the Splunk Observability Cloud Demo `. -.. _receive-alerts-xpuc: - -Receive alerts about outlier behavior -======================================== +.. raw:: html + + +
+  <embed>
+    <h2>Receive alerts about outlier behavior</h2>
+  </embed>
+ #. Kai, the SRE on call, receives an alert showing that the number of purchases on the Buttercup Games site has dropped significantly in the past hour and the checkout completion rate is too low. Kai trusts that these are true outlier behaviors because the alert rule their team set up in Splunk Observability Cloud takes into account the time and day of week as a dynamic baseline, rather than using a static threshold. #. Kai logs in to Splunk Observability Cloud on their laptop to investigate. - -.. _assess-user-impact-xpuc: - -Assess user impact -=========================== +.. raw:: html + + +
+  <embed>
+    <h2>Assess user impact</h2>
+  </embed>
+ The first thing Kai wants to know about the alert they received is: What's the user impact? @@ -80,11 +82,11 @@ The first thing Kai wants to know about the alert they received is: What's the u Kai decides to take a look at the end-to-end transaction workflow. - -.. _investigate-root-cause-xpuc: - -Investigate the root cause of a business workflow error -=============================================================== +.. raw:: html + + +
+  <embed>
+    <h2>Investigate the root cause of a business workflow error</h2>
+  </embed>
+ #. In Splunk RUM, Kai selects the :strong:`frontend:/cart/checkout` business workflow link to display its service map in Splunk Application Performance Monitoring (APM). A business workflow is a group of logically related traces, such as a group of traces that reflect an end-to-end transaction in your system. @@ -106,12 +108,11 @@ Investigate the root cause of a business workflow error Kai decides to take a look at the Kubernetes cluster to see if the errors are based on an infrastructure issue. - - -.. _check-infra-health-xpuc: - -Check on infrastructure health -=============================================================== +.. raw:: html + + +
+  <embed>
+    <h2>Check on infrastructure health</h2>
+  </embed>
+ #. Kai selects the :strong:`K8s cluster(s) for paymentservice` Related Content tile in Splunk APM to display the Kubernetes navigator in Splunk Infrastructure Monitoring, where their view is automatically narrowed down to the :strong:`paymentservice` to preserve the context they were just looking at in Splunk APM. @@ -129,12 +130,11 @@ Check on infrastructure health 3. Now that Kai can rule out the Kubernetes infrastructure as the source of the issue, they decide to return to their investigation in Splunk APM. Kai selects the :strong:`paymentservice in map` Related Content tile in their current view of Splunk Infrastructure Monitoring. - - -.. _look-for-patterns-xpuc: - -Look for patterns in application errors -=============================================================== +.. raw:: html + + +
+  <embed>
+    <h2>Look for patterns in application errors</h2>
+  </embed>
+ 1. In Splunk APM, Kai selects :strong:`Tag Spotlight`` to look for correlations in tag values for the errors they're seeing. @@ -142,7 +142,7 @@ Look for patterns in application errors .. image:: /_images/get-started/tenant-level.png :width: 60% - :alt: This screenshot shows the tenant.level module in Splunk APM displaying errors evenly spread across gold, silver, and bronze tenant levels. + :alt: This screenshot shows the tenant.level module in Splunk APM displaying errors evenly distributed across gold, silver, and bronze tenant levels. However, when Kai looks at the :strong:`version module`, they see an interesting pattern: errors are happening on version :strong:`v350.10` only and not on the lower :strong:`v350.9` version. @@ -153,20 +153,19 @@ Look for patterns in application errors 2. This seems like a strong lead, so Kai decides to dig into the log details. They select the :strong:`Logs for paymentservice` Related Content tile. +.. raw:: html + + +
+  <embed>
+    <h2>Examine error logs for meaningful messages and patterns</h2>
+  </embed>
+ +Now, in Splunk Log Observer Connect, Kai's view is automatically narrowed to display log data coming in for the :strong:`paymentservice` only. -.. _review-logs-xpuc: - -Examine error logs for meaningful messages and patterns -=============================================================== - -Now, in Splunk Log Observer, Kai's view is automatically narrowed to display log data coming in for the :strong:`paymentservice` only. - -1. Kai sees some error logs, so they select one to see more details in a structured view. As Kai looks at the log details, they see this error message: "Failed payment processing through ButtercupPayments: Invalid API Token (test-20e26e90-356b-432e-a2c6-956fc03f5609)". +1. Kai sees some error logs, so they select 1 to see more details in a structured view. As Kai looks at the log details, they see this error message: "Failed payment processing through ButtercupPayments: Invalid API Token (test-20e26e90-356b-432e-a2c6-956fc03f5609)". .. image:: /_images/get-started/error-log.png :width: 100% - :alt: This screenshot shows the details of an error log in Splunk Log Observer, including the error severity and an error message. + :alt: This screenshot shows the details of an error log in Splunk Log Observer Connect, including the error severity and an error message. 2. In the error message, Kai sees what they think is a clear indication of the error. The API token starts with "test". It seems that a team pushed v350.10 live with a test token that doesn't work in production. @@ -178,56 +177,60 @@ Now, in Splunk Log Observer, Kai's view is automatically narrowed to display log .. image:: /_images/get-started/group-by-version.png :width: 100% - :alt: This screenshot shows the Log Observer page with events filtered down by the error message and grouped by a version of version 350.10. All of the logs that display are error logs. + :alt: This screenshot shows the Log Observer Connect page with events filtered down by the error message and grouped by a version of version 350.10. All of the logs that display are error logs. -4. To be sure, Kai selects the eye icon for the message filter value to temporarily exclude the filter. Now there are logs that show up for version v350.9 too, but they don't include the error message. +4. To be sure, Kai selects the eye icon for the message filter value to temporarily exclude the filter. Now there are logs that appear for version v350.9 too, but they don't include the error message. This exploration convinces Kai that the test API token in v350.10 is the most likely source of the issue. Kai notifies Deepu, the :strong:`paymentservice` owner about their findings. +.. raw:: html + + +
+  <embed>
+    <h2>Monitor a fix</h2>
+  </embed>
+ 
 
-
-.. _monitor-a-fix-xpuc:
-
-Monitor a fix
-=====================================================================================================================
-
-Based on Kai's findings, Deepu, the :strong:`paymentservice` owner, looks at the error logs in Splunk Log Observer. They agree with Kai's assessment that the test API token is the likely cause of the problem.
+Based on Kai's findings, Deepu, the :strong:`paymentservice` owner, looks at the error logs in Splunk Log Observer Connect. They agree with Kai's assessment that the test API token is the likely cause of the problem.
 
 1. Deepu decides to implement a temporary fix by reverting back to version v350.9 to try to bring the Buttercup Games site back into a known good state, while the team works on a fix to v350.10.
 
-2. As one way to see if reverting to version v350.9 fixes the issue, Deepu opens the time picker in the upper left corner of Splunk Log Observer and selects :strong:`Live Tail`. Live Tail provides Deepu with a real-time streaming view of a sample of incoming logs.
+2. As one way to see if reverting to version v350.9 fixes the issue, Deepu opens the time picker in the upper left corner of Splunk Log Observer Connect and selects :strong:`Live Tail`. Live Tail provides Deepu with a real-time streaming view of a sample of incoming logs.
 
 .. image:: /_images/get-started/live-tail-verify.gif
    :width: 100%
-   :alt: This animated GIF shows Deepu opening the time picker of Splunk Log Observer and selecting Live Tail. Once Deepu selects Live Tail, the error logs with the failed payment messages are cleared and no new logs with the with error message are received.
+   :alt: This animated GIF shows Deepu opening the time picker of Splunk Log Observer Connect and selecting Live Tail. Once Deepu selects Live Tail, the error logs with the failed payment messages are cleared and no new logs with the error message are received.
 
 3. Deepu watches the Live Tail view and sure enough, the failed payment messages have stopped appearing in :strong:`paymentservice` logs. Reassured that the Buttercup Games site is back in a stable state, Deepu moves on to helping their team fix v350.10.
 
+.. raw:: html
+
+ 
+  <embed>
+    <h2>Take preventative action and create metrics from logs to power dashboards and alerts</h2>
+  </embed>
+ - -.. _take-preventative-action-xpuc: - -Take preventative action and create metrics from logs to power dashboards and alerts -============================================================================================================== - -Now that Kai knows that this particular issue can cause a problem on the Buttercup Games site, they decide to do some preventative work for their SRE team. Kai takes the query they created in Splunk Log Observer and saves it as a metric. +Now that Kai knows that this particular issue can cause a problem on the Buttercup Games site, they decide to do some preventative work for their SRE team. Kai takes the query they created in Splunk Log Observer Connect and saves it as a metric. .. image:: /_images/get-started/save-as-metric.png :width: 50% - :alt: This screenshot shows the Save as Metric option in the More menu in Log Observer. + :alt: This screenshot shows the Save as Metric option in the More menu in Log Observer Connect. Doing this defines log metricization rules that create a log-derived metric that shows aggregate counts. Kai's team can embed this log-derived metric in charts, dashboards, and alerts that can help them identify this issue faster if it comes up again in the future. +.. raw:: html + + +
+  <embed>
+    <h2>Summary</h2>
+  </embed>
+ 
 
-Summary
-==========
 
-Kai was able to respond to and resolve front-end issues with the Buttercup Games website that were preventing users from completing their purchases. Kai used RUM to begin troubleshooting the errors, isolating spikes in front-end errors and back-end latency as possible causes. Digging into the :strong:`/cart/checkout` endpoint, Kai used the Tag Spotlight view in RUM to investigate the full trace. Based on this, Kai realized the latency wasn't a front-end issue. Next, Kai viewed a performance summary and the end-to-end transaction workflow in APM. Looking at the service map, Kai noted that Splunk APM identified the :strong:`paymentservice` as the root cause of the errors. After ruling out Kubernetes issues, Kai used Tag Spotlight to look for correlations in tag values for the errors. Kai noticed that the errors were only happening on a specific version and decided to look into the log details. Using Log Observer, Kai looked at the log details and noticed that the error messages for the API token started with "test".
+Kai was able to respond to and resolve front-end issues with the Buttercup Games website that were preventing users from completing their purchases. Kai used RUM to begin troubleshooting the errors, isolating spikes in front-end errors and back-end latency as possible causes. Digging into the :strong:`/cart/checkout` endpoint, Kai used the Tag Spotlight view in RUM to investigate the full trace. Based on this, Kai realized the latency wasn't a front-end issue. Next, Kai viewed a performance summary and the end-to-end transaction workflow in APM. Looking at the service map, Kai noted that Splunk APM identified the :strong:`paymentservice` as the root cause of the errors. After ruling out Kubernetes issues, Kai used Tag Spotlight to look for correlations in tag values for the errors. Kai noticed that the errors were only happening on a specific version and decided to look into the log details. Using Log Observer Connect, Kai looked at the log details and noticed that the error messages for the API token started with "test".
 
-Consulting with Deepu, the :strong:`paymentservice` owner, they agreed that the test API token was the likely cause of the problem. After implementing a fix, Deepu used Log Observer Long Tail reports to monitor a real-time streaming view of the incoming logs. Deepu confirmed that the payment errors were no longer occurring. As a final step, Kai saved the Splunk Log Observer query as a metric in order to alert the team and help resolve similar issues faster in the future.
+Consulting with Deepu, the :strong:`paymentservice` owner, they agreed that the test API token was the likely cause of the problem. After implementing a fix, Deepu used Log Observer Connect Live Tail reports to monitor a real-time streaming view of the incoming logs. Deepu confirmed that the payment errors were no longer occurring. As a final step, Kai saved the Splunk Log Observer
+Connect query as a metric in order to alert the team and help resolve similar issues faster in the future.
 
-Learn more
-####################
 
+.. raw:: html
+
+ 
+  <embed>
+    <h2>Learn more</h2>
+  </embed>
+ * For details about creating detectors to issue alerts based on charts or metrics, see :ref:`create-detectors`.