Commit fe3daa2

Merge pull request #2690 from splunk/johannahp-O11YDOCS-6608-histogram-changes
Histogram docs changes
2 parents 289ca17 + 8985091 commit fe3daa2

1 file changed

+149
-139
lines changed

apm/span-tags/metricsets.rst

Lines changed: 149 additions & 139 deletions
Original file line number | Diff line number | Diff line change
@@ -9,172 +9,151 @@ Learn about MetricSets in APM
99

1010
MetricSets are key performance indicators, like request rate, error rate, and request duration, that are calculated from traces and spans in Splunk APM. There are 2 categories of MetricSets: Troubleshooting MetricSets (TMS), used for high-cardinality troubleshooting, and Monitoring MetricSets (MMS), used for real-time monitoring. MetricSets are similar to the metric time series (MTS) used in Splunk Infrastructure Monitoring to populate charts and generate alerts. See :ref:`metric-time-series` to learn more. MetricSets are MTS that are specific to Splunk APM.
1111

12-
.. _troubleshooting-metricsets:
13-
14-
Troubleshooting MetricSets
15-
==========================
16-
17-
Troubleshooting MetricSets (TMS) are metric time series (MTS) you can use for troubleshooting high-cardinality identities in APM. You can also use TMS to make historical comparisons across spans and workflows.
18-
19-
Splunk APM indexes and creates Troubleshooting MetricSets for several span tags by default. For more details about each of these tags, see :ref:`apm-default-span-tags`. You can't modify or stop APM from indexing these span tags.
20-
21-
You can also create custom TMS by indexing additional span tags and processes. To learn how to index span tags and processes to create new Troubleshooting MetricSets, see :ref:`apm-index-span-tags`.
22-
23-
Available TMS metrics
24-
-----------------------
25-
Every TMS creates the following metrics, known as request, error, and duration (RED) metrics. RED metrics appear when you select a service in the service map. See :ref:`service-map` to learn more about using RED metrics in the service map.
26-
27-
- Request rate
28-
- Error rate
29-
- Root cause error rate
30-
- p50, p90, and p99 latency
31-
32-
The measurement precision of Troubleshooting MetricSets is 10 seconds. Splunk APM reports quantiles from a distribution of metrics for each 10-second reporting window.
33-
34-
Use TMS within Splunk APM
35-
----------------------------------------
36-
37-
TMS appear on the service map and in Tag Spotlight. Use TMS to filter the service map and create breakdowns across the values of a given indexed span tag or process.
38-
39-
See :ref:`apm-service-map` and :ref:`apm-tag-spotlight`.
40-
41-
TMS retention period
42-
-----------------------------------
43-
44-
Splunk Observability Cloud retains TMS for the same amount of time as raw traces. By default, the retention period is 8 days.
45-
46-
For more details about Troubleshooting MetricSets, see :ref:`apm-index-tag-tips`.
47-
4812
.. _monitoring-metricsets:
4913

5014
Monitoring MetricSets
5115
=====================
5216

53-
Monitoring MetricSets (MMS) are metric time series (MTS) that power the real-time monitoring capabilities in Splunk APM, including charts and dashboards. MMS power the APM landing page and the dashboard view. MMS are also the metrics that detectors monitor to generate alerts.
17+
Monitoring MetricSets (MMS) are metric time series (MTS) that power the monitoring capabilities in Splunk APM, including charts and dashboards. MMS power the APM landing page and the dashboard view. MMS are also the metrics that detectors monitor to generate alerts.
5418

5519
MMS are available for a specific endpoint or for the aggregate of all endpoints in a service.
5620

57-
Endpoint-level MMS reflect the activity of a single endpoint in a service, while service-level MMS aggregate the activity of all of the endpoints in the service. MMS are limited to spans where the ``span.kind`` has a value of ``SERVER`` or ``CONSUMER``.
21+
Endpoint-level MMS reflect the activity of a single endpoint in a service, while service-level MMS aggregate the activity of all of the endpoints in the service. MMS are created for spans where the ``span.kind`` has a value of ``SERVER`` or ``CONSUMER``.
5822

5923
Spans might lack a ``kind`` value, or have a different ``kind`` value, in the following situations:
6024

6125
* The span originates in self-initiating operations or inferred services.
6226
* An error in instrumentation occurs.
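To make the ``span.kind`` rule concrete, the following Python sketch checks whether a span would count toward MMS. It only models the rule stated above: the dictionary layout and the ``contributes_to_mms`` name are hypothetical and are not part of any Splunk API.

```python
# Span kinds for which MMS are created, per the rule described above.
MMS_SPAN_KINDS = {"SERVER", "CONSUMER"}


def contributes_to_mms(span: dict) -> bool:
    """Return True if the span's kind is SERVER or CONSUMER.

    Spans with no kind value, or with any other kind value, return False.
    The "span.kind" key is an assumed layout used for illustration only.
    """
    return span.get("span.kind") in MMS_SPAN_KINDS


# Hypothetical spans: only the first two would count toward MMS.
example_spans = [
    {"name": "GET /checkout", "span.kind": "SERVER"},
    {"name": "process-order", "span.kind": "CONSUMER"},
    {"name": "call-payment", "span.kind": "CLIENT"},
    {"name": "cron-job"},  # no kind value at all
]
mms_spans = [s["name"] for s in example_spans if contributes_to_mms(s)]
```

Spans filtered out this way are one reason why MMS and TMS values for the same service can differ.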
6327

64-
In addition to the following default MMS, you can create custom MMS. See :ref:`cmms`.
28+
MMS retention period
29+
-----------------------------------
30+
31+
Splunk Observability Cloud stores MMS for 13 months by default.
6532

6633
.. _default-mms:
6734

35+
6836
Available default MMS metrics and dimensions
6937
-----------------------------------------------
7038

71-
MMS are available for the following APM components:
72-
73-
- service.request
74-
- spans
75-
- inferred.services
76-
- traces
77-
- workflows (Workflow metrics are created by default when you create a Business Workflow. Custom MMS are not available for Business Workflows.)
78-
79-
Each MMS includes 6 metrics for each component. For histogram MMS, there is a single metric for each component. Use the histogram functions to access the specific histogram bucket you want to use.
80-
81-
For each metric, there is 1 metric time series (MTS) with responses ``sf_error: true`` or ``sf_error: false``.
82-
83-
.. list-table::
84-
:widths: 33 33 33
85-
:width: 100
86-
:header-rows: 1
87-
88-
* - Description
89-
- Histogram MMS
90-
- MMS (deprecated)
91-
* - Request count
92-
- ``<component>`` with a ``count`` function
93-
- ``<component>.count``
94-
* - Minimum request duration
95-
- ``<component>`` with a ``min`` function
96-
- ``<component>.duration.ns.min``
97-
* - Maximum request duration
98-
- ``<component>`` with a ``max`` function
99-
- ``<component>.duration.ns.max``
100-
* - Median request duration
101-
- ``<component>`` with a ``median`` function
102-
- ``<component>.duration.ns.median``
103-
* - Percentile request duration
104-
- ``<component>`` with a ``percentile`` function and a percentile ``value``
105-
- ``<component>.duration.ns.p90``
106-
* - Percentile request duration
107-
- ``<component>`` with a ``percentile`` function and a percentile ``value``
108-
- ``<component>.duration.ns.p99``
109-
110-
111-
Each MMS has a set of dimensions you can use to monitor and alert on service performance.
112-
113-
Deprecated non-histogram metrics
114-
---------------------------------
115-
Histograms provide more flexibility and accuracy for your application performance data. If you are using any non-histogram metrics, use the equivalent histogram MMS. In the future, only histogram MMS will be used for monitoring in Splunk APM, including in charts and dashboards. For more information about histograms, see :ref:`histograms`.
39+
MMS are available for the APM components listed in the following table. Each MMS also has a set of dimensions you can use to monitor and alert on service performance. In addition to the following default MMS, you can create custom MMS for more detailed analysis. See :ref:`cmms`.
11640

11741
.. _service-mms:
118-
119-
Service dimensions
120-
---------------------------------
121-
122-
* ``sf_environment``
123-
* ``deployment.environment`` - This dimension is only available for histogram MMS.
124-
* ``sf_service``
125-
* ``service.name`` - This dimension is only available for histogram MMS.
126-
* ``sf_error``
127-
12842
.. _inferred-service-mms-dimensions:
129-
130-
Inferred service dimensions
131-
------------------------------
132-
133-
* ``sf_service``
134-
* ``service.name`` - This dimension is only available for histogram MMS.
135-
* ``sf_environment``
136-
* ``deployment.environment`` - This dimension is only available for histogram MMS.
137-
* ``sf_error``
138-
* ``sf.kind``
139-
14043
.. _endpoint-mms:
14144

142-
Span dimensions
143-
----------------------------------------------
144-
145-
* ``sf_environment``
146-
* ``deployment.environment`` - This dimension is only available for histogram MMS.
147-
* ``sf_service``
148-
* ``service.name`` - This dimension is only available for histogram MMS.
149-
* ``sf_operation``
150-
* ``sf_kind``
151-
* ``sf_error``
152-
* ``sf_httpMethod``, where relevant
153-
154-
Trace dimensions
155-
---------------------------------
156-
157-
.. note:: Trace dimensions are not supported for custom MMS.
45+
.. list-table::
46+
:widths: 33 33 33
47+
:width: 100
48+
:header-rows: 1
15849

159-
* ``sf_environment``
160-
* ``deployment.environment`` - This dimension is only available for histogram MMS.
161-
* ``sf_service``
162-
* ``service.name`` - This dimension is only available for histogram MMS.
163-
* ``sf_operation``
164-
* ``sf_httpMethod``
165-
* ``sf_error``
50+
* - Metric name
51+
- Dimensions
52+
- Custom dimension available? (Yes/No)
53+
* - ``service.request`` - the requests to endpoints in a service
54+
- * ``sf_environment``
55+
* ``deployment.environment`` - This dimension is only available for histogram MMS.
56+
* ``sf_service``
57+
* ``service.name`` - This dimension is only available for histogram MMS.
58+
* ``sf_error``
59+
- Yes
60+
* - ``inferred.services`` -
61+
- * ``sf_service``
62+
* ``service.name`` - This dimension is only available for histogram MMS.
63+
* ``sf_environment``
64+
* ``deployment.environment`` - This dimension is only available for histogram MMS.
65+
* ``sf_error``
66+
* ``sf_kind``
67+
* ``sf_operation``
68+
* ``sf_httpMethod``
69+
- No
70+
* - ``spans`` - the count of spans (a single operation)
71+
- * ``sf_environment``
72+
* ``deployment.environment`` - This dimension is only available for histogram MMS.
73+
* ``sf_service``
74+
* ``service.name`` - This dimension is only available for histogram MMS.
75+
* ``sf_operation``
76+
* ``sf_kind``
77+
* ``sf_error``
78+
* ``sf_httpMethod``, where relevant
79+
- Yes
80+
* - ``traces`` - the count of traces (collection of spans that represents a transaction)
81+
- * ``sf_environment``
82+
* ``deployment.environment`` - This dimension is only available for histogram MMS.
83+
* ``sf_service``
84+
* ``service.name`` - This dimension is only available for histogram MMS.
85+
* ``sf_operation``
86+
* ``sf_httpMethod``
87+
* ``sf_error``
88+
- No
89+
* - ``workflows`` - created by default when you create a business workflow
90+
- * ``sf_environment``
91+
* ``deployment.environment`` - This dimension is only available for histogram MMS.
92+
* ``sf_workflow``
93+
* ``sf_error``
94+
- No
95+
96+
Monitoring MetricSets in APM are generated as histogram metrics. Histogram metrics represent a distribution of measurements, with complete percentile data available. Data is distributed into equally sized intervals, which lets you compute percentiles across multiple services and aggregate data points from multiple metric time series. Histogram metrics provide an advantage over other metric types when calculating percentiles, such as the p90 for a single MTS. To learn more, see :ref:`metric-types`. For histogram MMS, there is a single metric for each component.
97+
98+
Previously, MMS were classified as either a counter or gauge metric type. The previous MMS included 6 metrics for each component.
16699

167-
Workflow dimensions
168-
---------------------------------
100+
.. list-table::
101+
:widths: 33 33 33
102+
:width: 100
103+
:header-rows: 1
169104

170-
Workflow metrics and dimensions are created by default when you create a Business Workflow.
105+
* - Description
106+
- Histogram MMS
107+
- MMS (deprecated)
108+
* - Request count
109+
- ``<component>`` with a ``count`` function
110+
- ``<component>.count``
111+
* - Minimum request duration
112+
- ``<component>`` with a ``min`` function
113+
- ``<component>.duration.ns.min``
114+
* - Maximum request duration
115+
- ``<component>`` with a ``max`` function
116+
- ``<component>.duration.ns.max``
117+
* - Median request duration
118+
- ``<component>`` with a ``median`` function
119+
- ``<component>.duration.ns.median``
120+
* - 90th percentile request duration
121+
- ``<component>`` with a ``percentile`` function and a percentile ``value``
122+
- ``<component>.duration.ns.p90``
123+
* - 99th percentile request duration
124+
- ``<component>`` with a ``percentile`` function and a percentile ``value``
125+
- ``<component>.duration.ns.p99``
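The mapping in the preceding table can be sketched as a small Python helper that rewrites a deprecated metric name into the equivalent histogram expression. This is an illustrative sketch, not a Splunk tool: the ``to_histogram_expr`` name is hypothetical, and the method names (``count``, ``min``, ``max``, ``median``, ``percentile``) are taken from the table rather than verified against the SignalFlow reference.

```python
import re

# Deprecated metric suffix -> histogram function call, per the table above.
_SUFFIX_TO_CALL = {
    "count": "count()",
    "duration.ns.min": "min()",
    "duration.ns.max": "max()",
    "duration.ns.median": "median()",
}

# Components that have default MMS, as listed in this topic.
_COMPONENTS = ("service.request", "inferred.services", "spans", "traces", "workflows")


def to_histogram_expr(deprecated_name: str) -> str:
    """Translate a name such as 'spans.duration.ns.p90' into a histogram expression."""
    for component in _COMPONENTS:
        prefix = component + "."
        if deprecated_name.startswith(prefix):
            suffix = deprecated_name[len(prefix):]
            break
    else:
        raise ValueError(f"unknown component in {deprecated_name!r}")

    if suffix in _SUFFIX_TO_CALL:
        call = _SUFFIX_TO_CALL[suffix]
    else:
        # Percentile metrics end in p<value>, for example p90 or p99.
        match = re.fullmatch(r"duration\.ns\.p(\d+)", suffix)
        if match is None:
            raise ValueError(f"unknown metric suffix in {deprecated_name!r}")
        call = f"percentile(pct={match.group(1)})"

    return f"histogram('{component}').{call}"
```

For example, ``to_histogram_expr("service.request.duration.ns.p99")`` produces a ``percentile`` call with ``pct=99`` on the single ``service.request`` histogram metric.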
126+
127+
Example metrics in APM
128+
---------------------------------------------
129+
130+
A histogram MTS uses the following SignalFlow syntax:
131+
132+
.. code-block:: none
133+
134+
histogram(metric=<metric_name>[,filter=<filter_dict>][,resolution=<resolution>])
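As a rough illustration of how the optional arguments combine, the following Python sketch assembles a ``histogram()`` call as a string. The helper names ``filter_expr`` and ``histogram_call`` are hypothetical; the sketch only formats text in the shape of the syntax above and does not send anything to Splunk Observability Cloud. The resolution value is passed through as given, because its units are not stated here.

```python
from typing import Optional


def filter_expr(dimension: str, *values: str) -> str:
    """Format a SignalFlow filter on one dimension with one or more values."""
    quoted = ", ".join(f"'{item}'" for item in (dimension, *values))
    return f"filter({quoted})"


def histogram_call(
    metric: str,
    filter_: Optional[str] = None,
    resolution: Optional[int] = None,
) -> str:
    """Format a histogram() call, including only the arguments that are set."""
    args = [f"'{metric}'"]
    if filter_ is not None:
        args.append(f"filter={filter_}")
    if resolution is not None:
        args.append(f"resolution={resolution}")
    return f"histogram({', '.join(args)})"


# Builds the stream used for a combined percentile across two services.
stream = histogram_call(
    "service.request",
    filter_=filter_expr("sf_service", "service 2", "service 1"),
)
program = f"A = {stream}.percentile(pct=90).publish(label='A')"
```

Omitting both optional arguments yields the shortest form, ``histogram('spans')``.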
135+
136+
The following table displays example SignalFlow functions:
171137

172-
.. note:: Workflow dimensions are not supported for custom MMS.
138+
.. list-table::
139+
:widths: 33 33 33
140+
:width: 100
141+
:header-rows: 1
173142

174-
* ``sf_environment``
175-
* ``deployment.environment`` - This dimension is only available for histogram MMS.
176-
* ``sf_workflow``
177-
* ``sf_error``
143+
* - Description
144+
- Histogram MMS
145+
- Previous MMS (deprecated)
146+
* - Aggregate count of all MTS
147+
- ``A = histogram('spans').count().publish(label='A')``
148+
- ``A = data('spans.count').sum().publish(label='A')``
149+
* - p90 for a single MTS
150+
- ``filter_ = filter('sf_environment', 'environment1') and filter('sf_service', 'service 1') and filter('sf_operation', 'operation1') and filter('sf_httpMethod', 'POST') and filter('sf_error', 'false') A = histogram('spans', filter=filter_).percentile(pct=90).publish(label='A')``
151+
- ``filter_ = filter('sf_environment', 'us1') and filter('sf_service', 'service1') and filter('sf_operation', 'POST /api/autosuggest/tagvalues') and filter('sf_httpMethod', 'POST') and filter('sf_error', 'false') A = data('spans.duration.ns.p90', filter=filter_, rollup='sum').publish(label='A')``
152+
* - Combined p90 for multiple services
153+
- ``A = histogram('service.request', filter=filter('sf_service', 'service 2', 'service 1')).percentile(pct=90).publish(label='A')``
154+
- ``A = data('service.request.duration.ns.p90', filter=filter('sf_service', 'service 2', 'service 1'), rollup='average').mean().publish(label='A')``
155+
156+
.. note:: Because ``histogram()`` applies an aggregation, you must group by each dimension to display all of the MetricSets separately.
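The grouping described in the note might look like the following Python sketch, which formats an aggregation with a list of group-by dimensions. The ``group_by`` helper is hypothetical, and the ``by=`` keyword is an assumption based on how SignalFlow aggregation methods generally accept grouping; confirm it against the SignalFlow reference before relying on it for histogram streams.

```python
from typing import Optional, Sequence


def group_by(
    stream: str,
    function: str,
    dimensions: Sequence[str],
    pct: Optional[int] = None,
) -> str:
    """Append an aggregation to a stream expression, grouped by dimensions.

    The by= keyword is assumed, not confirmed for histogram streams.
    """
    args = []
    if pct is not None:
        args.append(f"pct={pct}")
    by_list = ", ".join(f"'{dimension}'" for dimension in dimensions)
    args.append(f"by=[{by_list}]")
    return f"{stream}.{function}({', '.join(args)})"


# One count per service and operation instead of a single aggregate count.
per_operation = group_by("histogram('spans')", "count", ["sf_service", "sf_operation"])
# One p90 per service.
per_service_p90 = group_by("histogram('service.request')", "percentile", ["sf_service"], pct=90)
```

Listing every dimension of the metric in ``dimensions`` would keep each MetricSet as its own series.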
178157

179158
Use MMS within Splunk APM
180159
----------------------------------------
@@ -196,10 +175,41 @@ Use MMS for alerting and real-time monitoring in Splunk APM. You can create char
196175
* - Monitor services in APM dashboards
197176
- :ref:`Track service performance using dashboards in Splunk APM<apm-dashboards>`
198177

199-
MMS retention period
178+
.. _troubleshooting-metricsets:
179+
180+
Troubleshooting MetricSets
181+
==========================
182+
183+
Troubleshooting MetricSets (TMS) are metric time series (MTS) you can use for troubleshooting high-cardinality identities in APM. You can also use TMS to make historical comparisons across spans and workflows.
184+
185+
Splunk APM indexes and creates Troubleshooting MetricSets for several span tags by default. For more details about each of these tags, see :ref:`apm-default-span-tags`. You can't modify or stop APM from indexing these span tags.
186+
187+
You can also create custom TMS by indexing additional span tags and processes. To learn how to index span tags and processes to create new Troubleshooting MetricSets, see :ref:`apm-index-span-tags`.
188+
189+
Available TMS metrics
190+
-----------------------
191+
Every TMS creates the following metrics, known as request, error, and duration (RED) metrics. RED metrics appear when you select a service in the service map. See :ref:`service-map` to learn more about using RED metrics in the service map.
192+
193+
- Request rate
194+
- Error rate
195+
- Root cause error rate
196+
- p50, p90, and p99 latency
197+
198+
The measurement precision of Troubleshooting MetricSets is 10 seconds. Splunk APM reports quantiles from a distribution of metrics for each 10-second reporting window.
199+
200+
Use TMS within Splunk APM
201+
----------------------------------------
202+
203+
TMS appear on the service map and in Tag Spotlight. Use TMS to filter the service map and create breakdowns across the values of a given indexed span tag or process.
204+
205+
See :ref:`apm-service-map` and :ref:`apm-tag-spotlight`.
206+
207+
TMS retention period
200208
-----------------------------------
201209

202-
Splunk Observability Cloud stores MMS for 13 months by default.
210+
Splunk Observability Cloud retains TMS for the same amount of time as raw traces. By default, the retention period is 8 days.
211+
212+
For more details about Troubleshooting MetricSets, see :ref:`apm-index-tag-tips`.
203213

204214
Comparing Monitoring MetricSets and Troubleshooting MetricSets
205215
=================================================================
@@ -208,4 +218,4 @@ Because endpoint-level and service-level MMS include a subset of the TMS metrics
208218

209219
For example, values for ``checkout`` service metrics displayed in the host dashboard might be different from the metrics displayed in the service map because there are multiple span ``kind`` values associated with this service that the MMS that power the dashboard don't monitor.
210220

211-
To compare MMS and TMS directly, restrict your TMS to endpoint-only data by filtering to a specific endpoint. You can also break down the service map by endpoint.
221+
To compare MMS and TMS directly, restrict your TMS to endpoint-only data by filtering to a specific endpoint. You can also break down the service map by endpoint.
