Commit 8377c47

add request trigger documentation (#2998)
1 parent cbaaea0 commit 8377c47

1 file changed

+260
-0
lines changed

# Triggers

Profiles can be triggered based on resource usage (i.e., CPU and memory), or on
the duration of spans (e.g., a business transaction), data that is often collected automatically by
OpenTelemetry (otel). The resource triggers can be configured
from within the Application Insights UI under the performance->profiler->triggers menu.
Currently, the span triggers must be configured within the JSON configuration of your
process.

## Configuration

An example of a complete configuration is as follows:

```json
{
  "connectionString": ".....",
  "preview": {
    "profiler": {
      "enableRequestTriggering": true,
      "requestTriggerEndpoints": [
        {
          "name": "Users endpoint is responsive",
          "type": "LATENCY",
          "filter": {
            "type": "name-regex",
            "value": "/users/get/.*"
          },
          "aggregation": {
            "configuration": {
              "thresholdMillis": 7000
            },
            "type": "breach-ratio",
            "windowSizeMillis": 60000
          },
          "threshold": {
            "value": 0.75
          },
          "profileDuration": 30,
          "throttling": {
            "value": 60
          }
        }
      ]
    }
  },
  "role": {
    "instance": "my-app",
    "name": "my-service"
  }
}
```

This example configuration states that, over a rolling window of 60 seconds, if more than 75% of the
transactions going through `/users/get/.*` take longer than 7 seconds, then that is a breach of the SLA
and a diagnosis will be triggered. Specifically:

- Works on the latency/duration of spans.
- Filters spans that match the regex `/users/get/.*`.
- Calculates the ratio of requests that breach 7 seconds.
- If that ratio goes above 0.75 (i.e., 75%) a profile is triggered. This is calculated over a rolling 60
  second window (`windowSizeMillis`).
- On a breach a 30 second profile is gathered.
- After a profile is generated a 60 second cooldown is applied, during which a profile cannot be re-triggered.
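The rolling-window evaluation described above can be sketched in Python as follows. This is only an illustration of the idea, not the agent's implementation: the class `RollingBreachRatio`, its method names, and the exact window-boundary handling are assumptions made for this example.

```python
from collections import deque


class RollingBreachRatio:
    """Hypothetical helper: keeps span durations for a rolling time window."""

    def __init__(self, threshold_millis, window_size_millis):
        self.threshold_millis = threshold_millis
        self.window_size_millis = window_size_millis
        self._samples = deque()  # (timestamp_millis, duration_millis) pairs

    def record(self, timestamp_millis, duration_millis):
        self._samples.append((timestamp_millis, duration_millis))

    def ratio(self, now_millis):
        # Evict samples that have fallen out of the window (boundary handling is an assumption).
        cutoff = now_millis - self.window_size_millis
        while self._samples and self._samples[0][0] <= cutoff:
            self._samples.popleft()
        if not self._samples:
            return 0.0
        breaches = sum(1 for _, d in self._samples if d > self.threshold_millis)
        return breaches / len(self._samples)


# Values taken from the example configuration: 7000 ms threshold, 60000 ms window.
window = RollingBreachRatio(threshold_millis=7000, window_size_millis=60000)
for ts, duration in [(0, 8000), (10_000, 9000), (20_000, 7500), (30_000, 7200), (40_000, 500)]:
    window.record(ts, duration)

ratio_now = window.ratio(40_000)    # 4 of 5 spans exceed 7000 ms
triggered = ratio_now > 0.75        # ratio is above the 0.75 threshold
ratio_later = window.ratio(65_000)  # the span recorded at t=0 has left the window: 3 of 4
```

In this sketch the trigger condition holds at 40 seconds but no longer holds at 65 seconds, because the oldest slow span has dropped out of the 60 second window.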

The key aspects for triggering are the `enableRequestTriggering` and `requestTriggerEndpoints`
values.

- `enableRequestTriggering` - Enables the subsystem that monitors otel span data.
- `requestTriggerEndpoints` - A list of SLA definitions. If any of these are breached, a
  profile will be generated.

## SLA configuration

Each individual configuration is formed of:

- `aggregation` - An aggregation function that computes a metric over which the trigger
  will be evaluated, such as mean, max, min etc.
- `filter` - Filters the spans of interest for this SLA.
- `name` - Name of this SLA trigger. This will be displayed within the UI.
- `type` - The type of the metric that will be analysed (at time of writing latency is the only
  supported type).
- `profileDuration` - The duration in seconds of the profile to be collected when this SLA is
  breached.
- `threshold` - The threshold applied to the output of the aggregation. If this value is breached a
  profile will be triggered. For example, the `breach-ratio` aggregation outputs the ratio of requests
  that breach the SLA, so a threshold of 0.95 would trigger a profile if more than 95% of requests
  breach the SLA.
- `throttling` - Configures a cooldown to prevent excessive triggering.

### `aggregation`

Currently, we support:

- `breach-ratio` - Calculates the ratio of samples that breached the configured value.
  - Configuration parameters:
    - `thresholdMillis` - The threshold (in milliseconds) above which a span will be considered breached.
    - `minimumSamples` - The minimum number of samples that must be collected for the aggregation to
      produce data. This prevents triggering off of small sample sizes.

```json
{
  "aggregation": {
    "configuration": {
      "thresholdMillis": 7000
    },
    "type": "breach-ratio"
  }
}
```
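To illustrate how `minimumSamples` guards against small sample sizes, here is a minimal Python sketch. The function `breach_ratio` and the choice to return `None` when there is too little data are assumptions for illustration, not the agent's actual behaviour.

```python
from typing import Optional, Sequence


def breach_ratio(
    durations_millis: Sequence[float],
    threshold_millis: float,
    minimum_samples: int = 0,
) -> Optional[float]:
    """Return the fraction of durations above the threshold, or None if there is too little data."""
    count = len(durations_millis)
    if count == 0 or count < minimum_samples:
        # Too few samples: produce no data, so no trigger can fire on it.
        return None
    breached = sum(1 for d in durations_millis if d > threshold_millis)
    return breached / count


too_few = breach_ratio([9000], threshold_millis=7000, minimum_samples=10)
enough = breach_ratio([9000] * 8 + [100] * 2, threshold_millis=7000, minimum_samples=10)
```

Here a single slow span would give a ratio of 1.0 on its own, but with `minimum_samples=10` the sketch yields no value until ten spans have been seen.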

### `filter`

- `name-regex` - If the regular expression matches then the span is included.

```json
{
  "filter": {
    "type": "name-regex",
    "value": "/users/get/[A-Za-z]+"
  }
}
```
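As a rough illustration of a `name-regex` filter, the Python sketch below applies the pattern from the example to a few span names. Whether the agent requires the whole span name to match or only part of it is not stated here, so the use of a full match is an assumption, as is the helper name `make_name_regex_filter`.

```python
import re
from typing import Callable


def make_name_regex_filter(pattern: str) -> Callable[[str], bool]:
    """Build a predicate that tests span names against a regular expression."""
    compiled = re.compile(pattern)

    def matches(span_name: str) -> bool:
        # Assumption: the entire span name must match the pattern.
        return compiled.fullmatch(span_name) is not None

    return matches


users_filter = make_name_regex_filter("/users/get/[A-Za-z]+")
included = users_filter("/users/get/alice")
excluded_digits = users_filter("/users/get/42")
excluded_other = users_filter("/orders/get/alice")
```

If your span names carry a prefix (for example an HTTP method before the path), the pattern would need to allow for it under a full-match interpretation.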

### `threshold`

- `type` - One of: `greater-than`, `less-than`.
- `value` - The value that will be compared against the output of the aggregation.

```json
{
  "threshold": {
    "type": "greater-than",
    "value": 0.75
  }
}
```
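The comparison a threshold performs can be sketched in Python as below. The function `threshold_breached` is hypothetical, and falling back to `greater-than` when `type` is omitted is an assumption made for this example.

```python
import operator

# Maps the documented threshold types to comparison functions.
COMPARATORS = {
    "greater-than": operator.gt,
    "less-than": operator.lt,
}


def threshold_breached(aggregate: float, threshold: dict) -> bool:
    """Compare the aggregation output with the configured threshold value."""
    # Assumption: default to greater-than when no type is given.
    compare = COMPARATORS[threshold.get("type", "greater-than")]
    return compare(aggregate, threshold["value"])


above = threshold_breached(0.8, {"type": "greater-than", "value": 0.75})
equal = threshold_breached(0.75, {"type": "greater-than", "value": 0.75})
below = threshold_breached(0.1, {"type": "less-than", "value": 0.75})
```

Note that in this sketch a ratio exactly equal to the threshold does not count as a breach.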

### `throttling`

- `type` - Currently supports `fixed-duration-cooldown`.
- `value` - Time in seconds during which a profile will not be triggered.

```json
{
  "throttling": {
    "type": "fixed-duration-cooldown",
    "value": 30
  }
}
```
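A fixed-duration cooldown can be sketched in Python as follows. The class `FixedDurationCooldown` is hypothetical, and measuring the cooldown from the moment of the previous trigger (rather than from the end of the profile) is an assumption.

```python
class FixedDurationCooldown:
    """Hypothetical throttle: allows a trigger only if the cooldown has elapsed."""

    def __init__(self, cooldown_seconds: float) -> None:
        self.cooldown_seconds = cooldown_seconds
        self._last_trigger = None  # time of the last accepted trigger, if any

    def try_trigger(self, now_seconds: float) -> bool:
        if (
            self._last_trigger is not None
            and now_seconds - self._last_trigger < self.cooldown_seconds
        ):
            return False  # still cooling down, suppress this trigger
        self._last_trigger = now_seconds
        return True


cooldown = FixedDurationCooldown(cooldown_seconds=30)
results = [cooldown.try_trigger(t) for t in (0, 10, 29, 30, 45, 60)]
# results == [True, False, False, True, False, True]
```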

## Examples

### Monitor all requests

- 30 second profile
- Filters paths that match `/.*`
- Triggers when more than 75% of requests take longer than 7000 milliseconds
- Prevents re-triggering for 60 seconds

```json
{
  "preview": {
    "profiler": {
      "enableRequestTriggering": true,
      "requestTriggerEndpoints": [
        {
          "name": "All",
          "type": "latency",
          "profileDuration": 30,
          "filter": {
            "type": "name-regex",
            "value": "/.*"
          },
          "aggregation": {
            "configuration": {
              "thresholdMillis": 7000
            },
            "type": "breach-ratio"
          },
          "threshold": {
            "value": 0.75
          },
          "throttling": {
            "value": 60
          }
        }
      ]
    }
  }
}
```

### Monitor 2 endpoints

- `/users/.*` endpoints
  - Filters paths that match `/users/.*`
  - 30 second profile
  - Triggers when more than 75% of requests take longer than 7000 milliseconds
  - Prevents re-triggering for 60 seconds
- `/index.html` endpoint
  - Filters paths that match `/index.html`
  - 60 second profile
  - Triggers when more than 50% of requests take longer than 100 milliseconds
  - Prevents re-triggering for 60 seconds

```json
{
  "preview": {
    "profiler": {
      "enabled": true,
      "enableRequestTriggering": true,
      "requestTriggerEndpoints": [
        {
          "name": "Users",
          "type": "latency",
          "profileDuration": 30,
          "filter": {
            "type": "name-regex",
            "value": "/users/.*"
          },
          "aggregation": {
            "configuration": {
              "thresholdMillis": 7000
            },
            "type": "breach-ratio"
          },
          "threshold": {
            "value": 0.75
          },
          "throttling": {
            "value": 60
          }
        },
        {
          "name": "Index.html",
          "type": "latency",
          "profileDuration": 60,
          "filter": {
            "type": "name-regex",
            "value": "/index\\.html"
          },
          "aggregation": {
            "configuration": {
              "thresholdMillis": 100
            },
            "type": "breach-ratio"
          },
          "threshold": {
            "value": 0.5
          },
          "throttling": {
            "value": 60
          }
        }
      ]
    }
  }
}
```
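When several triggers are defined, it can help to print a plain-language summary of each one to check the configuration says what you intend. The Python sketch below does this for a trimmed copy of the second trigger above; the `describe` helper and the summary wording are purely illustrative and are not part of any tooling.

```python
import json

# Raw string so the JSON escape "\\." is passed through to the JSON parser unchanged.
CONFIG = r"""
{
  "preview": {
    "profiler": {
      "enableRequestTriggering": true,
      "requestTriggerEndpoints": [
        {
          "name": "Index.html",
          "type": "latency",
          "profileDuration": 60,
          "filter": {"type": "name-regex", "value": "/index\\.html"},
          "aggregation": {
            "configuration": {"thresholdMillis": 100},
            "type": "breach-ratio"
          },
          "threshold": {"value": 0.5},
          "throttling": {"value": 60}
        }
      ]
    }
  }
}
"""


def describe(trigger: dict) -> str:
    """Render one trigger definition as a single human-readable line."""
    return (
        f'{trigger["name"]}: {trigger["profileDuration"]}s profile when '
        f'{trigger["threshold"]["value"]:.0%} of spans matching '
        f'{trigger["filter"]["value"]} exceed '
        f'{trigger["aggregation"]["configuration"]["thresholdMillis"]} ms '
        f'(cooldown {trigger["throttling"]["value"]}s)'
    )


profiler = json.loads(CONFIG)["preview"]["profiler"]
summaries = [describe(t) for t in profiler["requestTriggerEndpoints"]]
for line in summaries:
    print(line)
```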
