Commit 2cef44f

[Host metrics] Add field calculation to serverless docs (#4187)
* Add field calculation
* Add field calculation to serverless
* Integrate reviewer's feedback
* Update docs/en/serverless/infra-monitoring/host-metrics.mdx

Co-authored-by: DeDe Morton <[email protected]>

---------

Co-authored-by: DeDe Morton <[email protected]>
1 parent 38b0630 commit 2cef44f

1 file changed: +209 -39 lines changed

docs/en/serverless/infra-monitoring/host-metrics.mdx

Lines changed: 209 additions & 39 deletions
@@ -22,9 +22,25 @@ Learn about key host metrics displayed in the Infrastructure UI:
2222

2323
## Hosts metrics
2424

25-
| Metric | Description |
26-
|---|---|
27-
| **Hosts** | Number of hosts returned by your search criteria. |
25+
<DocTable columns={[
26+
{
27+
"title": "Metric",
28+
"width": "30%"
29+
},
30+
{
31+
"title": "Description",
32+
"width": "70%"
33+
}
34+
]}>
35+
<DocRow>
36+
<DocCell>**Hosts** </DocCell>
37+
<DocCell>
38+
Number of hosts returned by your search criteria.
39+
40+
**Field Calculation**: `count(system.cpu.cores)`
41+
</DocCell>
42+
</DocRow>
43+
</DocTable>
2844

2945
<div id="key-metrics-cpu"></div>
3046

@@ -33,11 +49,11 @@ Learn about key host metrics displayed in the Infrastructure UI:
3349
<DocTable columns={[
3450
{
3551
"title": "Metric",
36-
"width": "50%"
52+
"width": "30%"
3753
},
3854
{
3955
"title": "Description",
40-
"width": "50%"
56+
"width": "70%"
4157
}
4258
]}>
4359
<DocRow>
@@ -46,42 +62,74 @@ Learn about key host metrics displayed in the Infrastructure UI:
4662
Percentage of CPU time spent in states other than Idle and IOWait, normalized by the number of CPU cores. This includes both time spent on user space and kernel space.
4763

4864
100% means all CPUs of the host are busy.
65+
66+
**Field Calculation**: `(average(system.cpu.user.pct) + average(system.cpu.system.pct)) / max(system.cpu.cores)`
4967
</DocCell>
5068
</DocRow>
5169
<DocRow>
5270
<DocCell>**CPU Usage - iowait (%)**</DocCell>
53-
<DocCell>The percentage of CPU time spent in wait (on disk).</DocCell>
71+
<DocCell>
72+
The percentage of CPU time spent in wait (on disk).
73+
74+
**Field Calculation**: `average(system.cpu.iowait.pct) / max(system.cpu.cores)`
75+
</DocCell>
5476
</DocRow>
5577
<DocRow>
5678
<DocCell>**CPU Usage - irq (%)** </DocCell>
57-
<DocCell>The percentage of CPU time spent servicing and handling hardware interrupts.</DocCell>
79+
<DocCell>
80+
The percentage of CPU time spent servicing and handling hardware interrupts.
81+
82+
**Field Calculation**: `average(system.cpu.irq.pct) / max(system.cpu.cores)`
83+
</DocCell>
5884
</DocRow>
5985
<DocRow>
6086
<DocCell>**CPU Usage - nice (%)** </DocCell>
61-
<DocCell>The percentage of CPU time spent on low-priority processes.</DocCell>
87+
<DocCell>
88+
The percentage of CPU time spent on low-priority processes.
89+
90+
**Field Calculation**: `average(system.cpu.nice.pct) / max(system.cpu.cores)`
91+
</DocCell>
6292
</DocRow>
6393
<DocRow>
6494
<DocCell>**CPU Usage - softirq (%)**</DocCell>
65-
<DocCell>The percentage of CPU time spent servicing and handling software interrupts.</DocCell>
95+
<DocCell>
96+
The percentage of CPU time spent servicing and handling software interrupts.
97+
98+
**Field Calculation**: `average(system.cpu.softirq.pct) / max(system.cpu.cores)`
99+
</DocCell>
66100
</DocRow>
67101
<DocRow>
68102
<DocCell>**CPU Usage - steal (%)** </DocCell>
69-
<DocCell>The percentage of CPU time spent in involuntary wait by the virtual CPU while the hypervisor was servicing another processor. Available only on Unix.</DocCell>
103+
<DocCell>
104+
The percentage of CPU time spent in involuntary wait by the virtual CPU while the hypervisor was servicing another processor. Available only on Unix.
105+
106+
**Field Calculation**: `average(system.cpu.steal.pct) / max(system.cpu.cores)`
107+
</DocCell>
70108
</DocRow>
71109
<DocRow>
72110
<DocCell>**CPU Usage - system (%)** </DocCell>
73-
<DocCell>The percentage of CPU time spent in kernel space.</DocCell>
111+
<DocCell>
112+
The percentage of CPU time spent in kernel space.
113+
114+
**Field Calculation**: `average(system.cpu.system.pct) / max(system.cpu.cores)`
115+
</DocCell>
74116
</DocRow>
75117
<DocRow>
76118
<DocCell>**CPU Usage - user (%)** </DocCell>
77-
<DocCell>The percentage of CPU time spent in user space. On multi-core systems, you can have percentages that are greater than 100%. For example, if 3 cores are at 60% use, then the system.cpu.user.pct will be 180%.</DocCell>
119+
<DocCell>
120+
The percentage of CPU time spent in user space. On multi-core systems, percentages can exceed 100%. For example, if 3 cores are at 60% use, then `system.cpu.user.pct` is 180%.
121+
122+
**Field Calculation**: `average(system.cpu.user.pct) / max(system.cpu.cores)`
123+
</DocCell>
78124
</DocRow>
79125
<DocRow>
80126
<DocCell>**Load (1m)** </DocCell>
81127
<DocCell>
82128
1 minute load average.
83129

84130
Load average gives an indication of the number of threads that are runnable (either busy running on CPU, waiting to run, or waiting for a blocking IO operation to complete).
131+
132+
**Field Calculation**: `average(system.load.1)`
85133
</DocCell>
86134
</DocRow>
87135
<DocRow>
@@ -90,6 +138,8 @@ Learn about key host metrics displayed in the Infrastructure UI:
90138
5 minute load average.
91139

92140
Load average gives an indication of the number of threads that are runnable (either busy running on CPU, waiting to run, or waiting for a blocking IO operation to complete).
141+
142+
**Field Calculation**: `average(system.load.5)`
93143
</DocCell>
94144
</DocRow>
95145
<DocRow>
@@ -98,6 +148,8 @@ Learn about key host metrics displayed in the Infrastructure UI:
98148
15 minute load average.
99149

100150
Load average gives an indication of the number of threads that are runnable (either busy running on CPU, waiting to run, or waiting for a blocking IO operation to complete).
151+
152+
**Field Calculation**: `average(system.load.15)`
101153
</DocCell>
102154
</DocRow>
103155
<DocRow>
@@ -110,6 +162,8 @@ Learn about key host metrics displayed in the Infrastructure UI:
110162
100% means the 1 minute load average is equal to the number of CPU cores of the host.
111163

112164
For example, on a host with 32 CPU cores, a 1 minute load average of 32 is reported here as 100%, and a 1 minute load average of 48 is reported as 150%.
165+
166+
**Field Calculation**: `average(system.load.1) / max(system.load.cores)`
113167
</DocCell>
114168
</DocRow>
115169
</DocTable>
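
The normalization used in the CPU table can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the field calculations above, not how the Infrastructure UI evaluates them. The function names and sample values are hypothetical, and each list stands in for the field values collected over the selected time range.

```python
from statistics import mean


def normalized_cpu_usage(user_pct, system_pct, cores):
    """Mirror (average(user.pct) + average(system.pct)) / max(cores)."""
    return (mean(user_pct) + mean(system_pct)) / max(cores)


def normalized_load(load_1m, cores):
    """Mirror average(system.load.1) / max(system.load.cores)."""
    return mean(load_1m) / max(cores)


# Hypothetical samples for a 3-core host: the raw pct fields are summed
# across cores, so 3 cores at 60% user time give 1.8 (180%).
user = [1.8, 1.8]
system = [0.6, 0.6]
cores = [3, 3]

cpu = normalized_cpu_usage(user, system, cores)
load = normalized_load([48.0], [32])

print(f"CPU usage: {cpu:.0%}")
print(f"Normalized load: {load:.0%}")
```

Dividing by the core count is what keeps the reported CPU usage in the 0 to 100% range even when the raw per-host fields exceed 100%.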
@@ -121,29 +175,45 @@ Learn about key host metrics displayed in the Infrastructure UI:
121175
<DocTable columns={[
122176
{
123177
"title": "Metric",
124-
"width": "50%"
178+
"width": "30%"
125179
},
126180
{
127181
"title": "Description",
128-
"width": "50%"
182+
"width": "70%"
129183
}
130184
]}>
131185
<DocRow>
132186
<DocCell>**Memory Cache** </DocCell>
133-
<DocCell>Memory (page) cache.</DocCell>
187+
<DocCell>
188+
Memory (page) cache.
189+
190+
**Field Calculation**: `average(system.memory.used.bytes) - average(system.memory.actual.used.bytes)`
191+
</DocCell>
134192
</DocRow>
135193
<DocRow>
136194
<DocCell>**Memory Free** </DocCell>
137-
<DocCell>Total available memory.</DocCell>
195+
<DocCell>
196+
Total available memory.
197+
198+
**Field Calculation**: `max(system.memory.total) - average(system.memory.actual.used.bytes)`
199+
</DocCell>
138200
</DocRow>
139201
<DocRow>
140202
<DocCell>**Memory Free (excluding cache)**</DocCell>
141-
<DocCell>Total available memory excluding the page cache.</DocCell>
142-
</DocRow>
203+
<DocCell>
204+
Total available memory excluding the page cache.
205+
206+
**Field Calculation**: `system.memory.free`
207+
</DocCell>
208+
</DocRow>
143209
<DocRow>
144210
<DocCell>**Memory Total** </DocCell>
145-
<DocCell>Total memory capacity.</DocCell>
146-
</DocRow>
211+
<DocCell>
212+
Total memory capacity.
213+
214+
**Field Calculation**: `average(system.memory.total)`
215+
</DocCell>
216+
</DocRow>
147217
<DocRow>
148218
<DocCell>**Memory Usage (%)** </DocCell>
149219
<DocCell>
@@ -152,42 +222,142 @@ Learn about key host metrics displayed in the Infrastructure UI:
152222
This includes resident memory for all processes plus memory used by the kernel structures and code apart from the page cache.
153223

154224
A high level indicates a situation of memory saturation for the host. For example, 100% means the main memory is entirely filled with memory that can't be reclaimed, except by swapping out.
225+
226+
**Field Calculation**: `average(system.memory.actual.used.pct)`
155227
</DocCell>
156-
</DocRow>
228+
</DocRow>
157229
<DocRow>
158230
<DocCell>**Memory Used** </DocCell>
159-
<DocCell>Main memory usage excluding page cache.</DocCell>
231+
<DocCell>
232+
Main memory usage excluding page cache.
233+
234+
**Field Calculation**: `average(system.memory.actual.used.bytes)`
235+
</DocCell>
160236
</DocRow>
161237
</DocTable>
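
The relationships between the memory metrics can be sketched as follows. This is a minimal illustration of the arithmetic in the field calculations above, with a hypothetical function name and made-up sample values in bytes. It is not the UI's implementation.

```python
from statistics import mean

GIB = 1024 ** 3


def memory_metrics(total, used, actual_used):
    """Derive cache, free, and used memory from sampled field values.

    total       -> system.memory.total
    used        -> system.memory.used.bytes (includes page cache)
    actual_used -> system.memory.actual.used.bytes (excludes page cache)
    """
    avg_actual_used = mean(actual_used)
    return {
        # Memory Cache: used bytes minus the bytes that cannot be reclaimed
        "cache": mean(used) - avg_actual_used,
        # Memory Free: total capacity minus non-reclaimable usage
        "free": max(total) - avg_actual_used,
        # Memory Used: main memory usage excluding page cache
        "used": avg_actual_used,
    }


# Hypothetical samples for a 16 GiB host
m = memory_metrics(
    total=[16 * GIB, 16 * GIB],
    used=[12 * GIB, 14 * GIB],
    actual_used=[6 * GIB, 8 * GIB],
)
for name, value in m.items():
    print(f"{name}: {value / GIB} GiB")
```

In this sample, the page cache counts toward "free" memory because it can be reclaimed, which is why Memory Free is larger than the total minus `system.memory.used.bytes`.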
162238

163239
<div id="key-metrics-log"></div>
164240

165241
## Log metrics
166242

167-
| Metric | Description |
168-
|---|---|
169-
| **Log Rate** | Derivative of the cumulative sum of the document count scaled to a 1 second rate. This metric relies on the same indices as the logs. |
243+
<DocTable columns={[
244+
{
245+
"title": "Metric",
246+
"width": "30%"
247+
},
248+
{
249+
"title": "Description",
250+
"width": "70%"
251+
}
252+
]}>
253+
<DocRow>
254+
<DocCell>**Log Rate** </DocCell>
255+
<DocCell>
256+
Derivative of the cumulative sum of the document count scaled to a 1 second rate. This metric relies on the same indices as the logs.
257+
258+
**Field Calculation**: `cumulative_sum(doc_count)`
259+
</DocCell>
260+
</DocRow>
261+
</DocTable>
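
The description of Log Rate (a derivative of a cumulative sum, scaled to a 1 second rate) can be illustrated with a short sketch. The function name, the bucket size, and the document counts below are assumptions for illustration only. The actual aggregation pipeline may differ.

```python
from itertools import accumulate


def log_rate(doc_counts, bucket_seconds):
    """Derivative of the cumulative document count, scaled to per-second."""
    cumulative = list(accumulate(doc_counts))
    # Difference between consecutive cumulative values, per bucket
    derivative = [cur - prev for prev, cur in zip(cumulative, cumulative[1:])]
    return [d / bucket_seconds for d in derivative]


# Hypothetical document counts for three consecutive 60-second buckets
rates = log_rate([120, 60, 180], bucket_seconds=60)
print(rates)
```

The first bucket has no predecessor, so the sketch yields one rate fewer than the number of buckets.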
170262

171263
<div id="key-metrics-network"></div>
172264

173265
## Network metrics
174266

175-
| Metric | Description |
176-
|---|---|
177-
| **Network Inbound (RX)** | Number of bytes that have been received per second on the public interfaces of the hosts. |
178-
| **Network Inbound (TX)** | Number of bytes that have been sent per second on the public interfaces of the hosts. |
267+
<DocTable columns={[
268+
{
269+
"title": "Metric",
270+
"width": "30%"
271+
},
272+
{
273+
"title": "Description",
274+
"width": "70%"
275+
}
276+
]}>
277+
<DocRow>
278+
<DocCell>**Network Inbound (RX)** </DocCell>
279+
<DocCell>
280+
Number of bytes that have been received per second on the public interfaces of the hosts.
281+
282+
**Field Calculation**: `average(host.network.ingress.bytes) * 8 / (max(metricset.period, kql='host.network.ingress.bytes: *') / 1000)`
283+
</DocCell>
284+
</DocRow>
285+
<DocRow>
286+
<DocCell>**Network Outbound (TX)** </DocCell>
287+
<DocCell>
288+
Number of bytes that have been sent per second on the public interfaces of the hosts.
179289

180-
<div id="key-metrics-disk"></div>
290+
**Field Calculation**: `average(host.network.egress.bytes) * 8 / (max(metricset.period, kql='host.network.egress.bytes: *') / 1000)`
291+
</DocCell>
292+
</DocRow>
293+
</DocTable>
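
The network field calculations can be worked through with a small sketch. It assumes that `metricset.period` is expressed in milliseconds, which the division by 1000 suggests, and it ignores the `kql` filter that restricts the period to documents containing the network field. The function name and sample values are hypothetical.

```python
from statistics import mean


def network_rate(bytes_per_period, period_ms):
    """Mirror average(bytes) * 8 / (max(metricset.period) / 1000)."""
    period_seconds = max(period_ms) / 1000
    return mean(bytes_per_period) * 8 / period_seconds


# Hypothetical samples: 1,250,000 bytes received in each 10-second period
rate = network_rate([1_250_000, 1_250_000], [10_000, 10_000])
print(rate)
```

Note that the factor of 8 converts bytes to bits, so this sketch returns a value in bits per second for the sample data.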
181294

182295
## Disk metrics
183296

184-
| Metric | Description |
185-
|---|---|
186-
| **Disk Latency** | Time spent to service disk requests. |
187-
| **Disk Read IOPS** | Average count of read operations from the device per second. |
188-
| **Disk Read Throughput** | Average number of bytes read from the device per second. |
189-
| **Disk Usage - Available (%)** | Percentage of disk space available. |
190-
| **Disk Usage - Max (%)** | Percentage of disk space used. A high percentage indicates that a partition on a disk is running out of space. |
191-
| **Disk Write IOPS** | Average count of write operations from the device per second. |
192-
| **Disk Write Throughput** | Average number of bytes written from the device per second. |
297+
<DocTable columns={[
298+
{
299+
"title": "Metric",
300+
"width": "30%"
301+
},
302+
{
303+
"title": "Description",
304+
"width": "70%"
305+
}
306+
]}>
307+
<DocRow>
308+
<DocCell>**Disk Latency** </DocCell>
309+
<DocCell>
310+
Time spent to service disk requests.
311+
312+
**Field Calculation**: `average(system.diskio.read.time + system.diskio.write.time) / (system.diskio.read.count + system.diskio.write.count)`
313+
</DocCell>
314+
</DocRow>
315+
<DocRow>
316+
<DocCell>**Disk Read IOPS** </DocCell>
317+
<DocCell>
318+
Average count of read operations from the device per second.
319+
320+
**Field Calculation**: `counter_rate(max(system.diskio.read.count), kql='system.diskio.read.count: *')`
321+
</DocCell>
322+
</DocRow>
323+
<DocRow>
324+
<DocCell>**Disk Read Throughput** </DocCell>
325+
<DocCell>
326+
Average number of bytes read from the device per second.
327+
328+
**Field Calculation**: `counter_rate(max(system.diskio.read.bytes), kql='system.diskio.read.bytes: *')`
329+
</DocCell>
330+
</DocRow>
331+
<DocRow>
332+
<DocCell>**Disk Usage - Available (%)** </DocCell>
333+
<DocCell>
334+
Percentage of disk space available.
335+
336+
**Field Calculation**: `1 - average(system.filesystem.used.pct)`
337+
</DocCell>
338+
</DocRow>
339+
<DocRow>
340+
<DocCell>**Disk Usage - Max (%)** </DocCell>
341+
<DocCell>
342+
Percentage of disk space used. A high percentage indicates that a partition on a disk is running out of space.
343+
344+
**Field Calculation**: `max(system.filesystem.used.pct)`
345+
</DocCell>
346+
</DocRow>
347+
<DocRow>
348+
<DocCell>**Disk Write IOPS** </DocCell>
349+
<DocCell>
350+
Average count of write operations on the device per second.
193351

352+
**Field Calculation**: `counter_rate(max(system.diskio.write.count), kql='system.diskio.write.count: *')`
353+
</DocCell>
354+
</DocRow>
355+
<DocRow>
356+
<DocCell>**Disk Write Throughput** </DocCell>
357+
<DocCell>
358+
Average number of bytes written to the device per second.
359+
360+
**Field Calculation**: `counter_rate(max(system.diskio.write.bytes), kql='system.diskio.write.bytes: *')`
361+
</DocCell>
362+
</DocRow>
363+
</DocTable>
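
The IOPS and throughput calculations rely on `counter_rate`, which turns an ever-increasing counter into a per-second rate. The sketch below shows the general idea under stated assumptions: it takes the maximum counter value per time bucket, treats a decrease as a counter reset, and divides by the bucket length. The function names, the reset handling, and the sample values are illustrative and are not taken from the Lens implementation.

```python
def counter_rate(bucket_max, interval_seconds):
    """Per-second rate of a monotonically increasing counter.

    bucket_max holds the maximum counter value seen in each time bucket.
    """
    rates = []
    for prev, cur in zip(bucket_max, bucket_max[1:]):
        delta = cur - prev
        if delta < 0:
            # Assumed reset handling: the counter restarted from zero,
            # so the current value is the growth since the reset.
            delta = cur
        rates.append(delta / interval_seconds)
    return rates


def disk_latency(read_time, write_time, read_count, write_count):
    """Time spent per disk request: total time divided by total operations."""
    return (read_time + write_time) / (read_count + write_count)


# Hypothetical system.diskio.read.count maxima for 60-second buckets;
# the drop to 300 simulates a counter reset.
read_iops = counter_rate([1000, 1600, 2200, 300], interval_seconds=60)
latency = disk_latency(read_time=300, write_time=200, read_count=40, write_count=60)

print(read_iops)
print(latency)
```

The same `counter_rate` sketch applies to the byte counters used for read and write throughput.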
