---
date: 2025-11-27T10:00:00.000Z
title: A Brief Overview Of The check_vsphere Plugin
tags:
- omd
- vsphere
---

## What is it?

[check\_vsphere](https://github.com/consol-monitoring/check_vsphere)
is a plugin for Naemon, Icinga, and other Nagios-compatible systems.
It checks various aspects of ~~VMware~~ Broadcom vCenter servers and ESX hosts.

For a long time, this was done using `check_vmware_esx.pl`
or `check_esx.pl`. However, Broadcom (formerly VMware)
has decided to deprecate the Perl SDK for vCenter.
We therefore decided to rewrite the parts our customers use
in Python, based on the [pyVmomi](https://github.com/vmware/pyvmomi/)
library.

In this article, I will give an overview of what the plugin
can do and take a closer look at some of its features.

Development happens on
[GitHub](https://github.com/consol-monitoring/check_vsphere).
Feel free to open issues or pull requests.
## Authentication

Currently, only username/password authentication is supported. The
common options needed to establish a connection are:

* `-u USERNAME` – the user to log in as
* `-p PASSWORD` – the password; can be omitted in favor of the
  `VSPHERE_PASS` environment variable
* `-s ADDR` – hostname of the vCenter or ESX host
* `-nossl` – skip TLS certificate verification

So a command line has at least this basic structure:

```
check_vsphere subcommand -u user -p pass -s addr [subcommand options]
```

In this document, `[AUTH]` is shorthand for `-u user -p pass -s addr`.
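The password fallback described above can be sketched with Python's `argparse`. This is a minimal illustration, not the plugin's actual option parser: the `parse_auth` helper is hypothetical, and only the option names and the `VSPHERE_PASS` variable come from the list above.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def parse_auth(argv: Sequence[str], environ: Optional[Mapping[str, str]] = None) -> argparse.Namespace:
    """Hypothetical helper: parse the common connection options.

    The password falls back to the VSPHERE_PASS environment variable
    when -p is not given.
    """
    env = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(prog="check_vsphere")
    parser.add_argument("-u", dest="user", required=True, help="user name")
    parser.add_argument("-p", dest="password", help="password (or set VSPHERE_PASS)")
    parser.add_argument("-s", dest="host", required=True, help="vCenter or ESX host")
    # single-dash long flag, as in the option list above
    parser.add_argument("-nossl", dest="nossl", action="store_true",
                        help="skip TLS certificate verification")
    args = parser.parse_args(argv)
    if args.password is None:
        args.password = env.get("VSPHERE_PASS")
    if args.password is None:
        # parser.error() prints the message and exits with status 2
        parser.error("no password given: use -p or set VSPHERE_PASS")
    return args
```

Passing `environ` explicitly keeps the helper easy to test. A common reason to prefer the environment variable is that command-line arguments can be visible to other local users, for example via `ps`.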
## Checks

Here is a brief overview of some features. For the full list, see
[the documentation](https://omd.consol.de/docs/plugins/check_vsphere/cmd/).

### vSAN

The [vsan](/docs/plugins/check_vsphere/cmd/vsan/) command
offers two modes:

* `healthtest` – shows exactly what you see under
  **Cluster → Monitor → vSAN → Skyline Health** in vCenter.
* `objecthealth` – performs a detailed check of vSAN object health.

Please give them a try: they are not widely used yet and may need some fine-tuning.
### Host checks

There are several host checks in `check_vsphere`:

* **[host-runtime](/docs/plugins/check_vsphere/cmd/host-runtime/)**
  offers a few modes:
  * **status** – vCenter calculates an overall host status. This mode
    just maps the colors to exit codes (green → OK, yellow → WARNING,
    red → CRITICAL).
  * **con** – checks whether the host can still talk to the vCenter.
  * **health** – runs various health checks exposed by the API for the
    host (memory, voltage, fans, …) and reports any problems.
  * **temp** – walks through the temperature sensors and reports issues.
    The state is determined by the vCenter/ESX host itself.
* **[host-nic](/docs/plugins/check_vsphere/cmd/host-nic/)** –
  verifies that all network interfaces are connected.
* **[host-service](/docs/plugins/check_vsphere/cmd/host-service/)** –
  verifies that various services, such as NTP, DCUI, or vpxa, are running on a host.
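The **status** mode boils down to a lookup from the overall status color to a plugin exit code. The sketch below assumes the standard Nagios plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The helper name is hypothetical. Mapping `gray` (vSphere's "status unknown" color) and any unexpected value to UNKNOWN is also an assumption, since the text above only covers green, yellow, and red.

```python
from enum import IntEnum


class State(IntEnum):
    """Standard Nagios plugin exit codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


# green/yellow/red as described above; the gray entry is an assumption
COLOR_TO_STATE = {
    "green": State.OK,
    "yellow": State.WARNING,
    "red": State.CRITICAL,
    "gray": State.UNKNOWN,
}


def status_to_state(overall_status: str) -> State:
    """Hypothetical helper: map a vCenter overall status color to an exit code."""
    # anything unexpected is treated as UNKNOWN rather than silently OK
    return COLOR_TO_STATE.get(overall_status.lower(), State.UNKNOWN)
```

Because `State` is an `IntEnum`, the result can be passed straight to `sys.exit()`.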
### VM checks

* **[media](/docs/plugins/check_vsphere/cmd/media/)**
  spots VMs that still have a CD-ROM attached.
* **[vm-tools](/docs/plugins/check_vsphere/cmd/vmtools/)**
  flags VMs without guest tools installed.
* **[vm-net-dev](/docs/plugins/check_vsphere/cmd/vmnetdev/)**
  finds VMs that contain unused network devices.
* **[snapshots](/docs/plugins/check_vsphere/cmd/snapshots/)**
  reports VMs with an unexpected number of snapshots or snapshots
  that are too old.
* **[vm-guestfs](/docs/plugins/check_vsphere/cmd/vmguestfs/)**
  monitors filesystem usage of VM volumes via vCenter.
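To make the snapshots check more concrete, here is a minimal sketch of the kind of evaluation it describes: it counts a VM's snapshots and looks at the age of the oldest one. The helper and its thresholds (`max_count`, `max_age`) are hypothetical and do not correspond to the plugin's actual options. Snapshot creation times are assumed to be timezone-aware `datetime` values.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


def check_snapshots(created: List[datetime], max_count: int, max_age: timedelta,
                    now: Optional[datetime] = None) -> Tuple[bool, List[str]]:
    """Hypothetical helper: evaluate one VM's snapshots.

    `created` holds timezone-aware creation times; returns (ok, problems).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    problems: List[str] = []
    if len(created) > max_count:
        problems.append(f"{len(created)} snapshots (max {max_count})")
    if created:
        age = now - min(created)  # the oldest snapshot decides
        if age > max_age:
            problems.append(f"oldest snapshot is {age.days} days old (max {max_age.days})")
    return not problems, problems
```

The optional `now` argument keeps the check deterministic in tests.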
### PerfCounters

#### Overview

vCenter provides a variety of [performance
counters](https://dp-downloads.broadcom.com/api-content/apis/API_VWSA_001/8.0U3/html/ReferenceGuides/vim.PerformanceManager.html).
These counters can belong to various managed object types, including
VirtualMachine, HostSystem, Datacenter, and ClusterComputeResource.

`check_vmware_esx` had many hard-coded options for specific
performance counters. We decided to generalize this so that any
performance counter can be checked with `check_vsphere`.

To list the performance counters available on a vCenter, use the
`list-metrics` command:

```
check_vsphere list-metrics [AUTH]
```

If you're coming from `check_vmware_esx`,
[the documentation](/docs/plugins/check_vsphere/cmd/perf/#rosetta) has a list of
all the performance counters that were supported by `check_vmware_esx` and their
counterparts in `check_vsphere`. However, as mentioned earlier, you can check
any performance counter. For example, to monitor the power consumption of an ESX
host:

```
check_vsphere perf [AUTH] --perfcounter power:power:average \
  --vimtype HostSystem --vimname esx-hostname \
  --critical 400
```
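The counter names passed to `--perfcounter` follow the `group:name:rollup` pattern visible in `power:power:average` and `disk:totalLatency:average`. Here is a minimal sketch of splitting such a name into its parts. The `CounterId` type and `parse_counter` helper are hypothetical. The three fields mirror the group, name, and rollup type that vSphere's `PerfCounterInfo` exposes.

```python
from typing import NamedTuple


class CounterId(NamedTuple):
    """Hypothetical container for the three parts of a counter name."""
    group: str
    name: str
    rollup: str

    def __str__(self) -> str:
        return f"{self.group}:{self.name}:{self.rollup}"


def parse_counter(spec: str) -> CounterId:
    """Split 'group:name:rollup', e.g. 'disk:totalLatency:average'."""
    parts = spec.split(":")
    # exactly three non-empty parts are required
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected group:name:rollup, got {spec!r}")
    return CounterId(*parts)
```

Rejecting malformed names early produces a clearer error than a failed lookup against the vCenter's counter list.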
#### Instances

`check_vmware_esx` and its related tools have a significant bug.
Performance counters can have instances. For example, disk I/O counters
are available for each disk, where each disk represents an instance of
the counter. When you monitor such a counter with `check_vmware_esx`, you only
monitor a random disk and ignore all the others. Yes, we have been
monitoring random disks for years.

With `check_vsphere`, you can now check specific disks using the
`--perfinstance` flag. The default instance is the empty string, a
special value that selects the aggregate (average) across all instances.
vSphere only provides this aggregate where it makes sense: CPU usage,
for example, can be aggregated over all cores, but an average across
several different disks is generally not meaningful, so vSphere does
not provide one for disk counters.

You can also check every instance with `--perfinstance '*'`. In this
case, the threshold is applied to each instance, and the most severe
state is returned.
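The instance selection and the "most severe state wins" rule can be modeled in a few lines of Python. This is a simplified model, not the plugin's code. The helper names are hypothetical, and the thresholds are treated as plain upper limits (value > threshold), whereas real Nagios threshold ranges are more expressive. The sample values are the disk latency readings from the example session below.

```python
from enum import IntEnum
from typing import Dict, Optional


class State(IntEnum):
    """Standard Nagios plugin exit codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def select_instances(values: Dict[str, float], perfinstance: str = "") -> Dict[str, float]:
    """Pick the instance values to evaluate (hypothetical model of --perfinstance).

    '' is the aggregate and is only present if vSphere provides one,
    '*' selects every individual instance, anything else one named instance.
    """
    if perfinstance == "*":
        return {name: v for name, v in values.items() if name != ""}
    if perfinstance in values:
        return {perfinstance: values[perfinstance]}
    return {}


def evaluate(values: Dict[str, float], warning: Optional[float], critical: Optional[float]) -> State:
    """Apply the thresholds to every instance and return the most severe state."""
    if not values:
        # nothing matched, e.g. no aggregate exists for this counter
        return State.UNKNOWN

    def state_of(value: float) -> State:
        # simplified: thresholds are plain upper limits
        if critical is not None and value > critical:
            return State.CRITICAL
        if warning is not None and value > warning:
            return State.WARNING
        return State.OK

    return max(state_of(v) for v in values.values())


# disk:totalLatency:average per disk in ms, from the example session (no '' aggregate)
latency = {
    "naa.6000eb3810d426400000000000000277": 0,
    "naa.600605b00ba8cb0022564867b8c8cc32": 2,
    "naa.6000eb3810d4264000000000000000b2": 0,
    "naa.600605b00ba8cb001fd947850523e56d": 0,
    "naa.600605b00ba8cb0029700b163217244e": 6,
    "naa.6000eb3810d4264000000000000002b3": 1,
}
```

With a warning threshold of 5 ms and a critical threshold of 10 ms, `evaluate(select_instances(latency, "*"), 5, 10)` yields `State.WARNING` because one disk reports 6 ms. With the default instance, nothing is selected and the result is `State.UNKNOWN`, which matches the error in the session.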
```
# check disk latency
# the default perfinstance is '' which is the aggregate and not available
# for this counter
$ check_vsphere perf -s vcenter.example.com -u naemon@vsphere.local -nossl \
  --vimname esx1.int.example.com --vimtype HostSystem \
  --perfcounter disk:totalLatency:average
UNKNOWN: Cannot find disk:totalLatency:average for the queried resources

# On that error you may want to try --perfinstance '*'
# now you see all instances for this counter

$ check_vsphere perf -s vcenter.example.com -u naemon@vsphere.local -nossl \
  --vimname esx1.int.example.com --vimtype HostSystem \
  --perfcounter disk:totalLatency:average --perfinstance '*'
OK: disk:totalLatency:average_naa.6000eb3810d426400000000000000277 has value 0 Millisecond
disk:totalLatency:average_naa.600605b00ba8cb0022564867b8c8cc32 has value 2 Millisecond
disk:totalLatency:average_naa.6000eb3810d4264000000000000000b2 has value 0 Millisecond
disk:totalLatency:average_naa.600605b00ba8cb001fd947850523e56d has value 0 Millisecond
disk:totalLatency:average_naa.600605b00ba8cb0029700b163217244e has value 6 Millisecond
disk:totalLatency:average_naa.6000eb3810d4264000000000000002b3 has value 1 Millisecond
| 'disk:totalLatency:average_naa.6000eb3810d426400000000000000277'=0.0ms;;;;
'disk:totalLatency:average_naa.600605b00ba8cb0022564867b8c8cc32'=2.0ms;;;;
...

# you can also check a single instance specifically
$ check_vsphere perf -s vcenter.example.com -u naemon@vsphere.local -nossl \
  --vimname esx1.int.example.com --vimtype HostSystem \
  --perfcounter disk:totalLatency:average --perfinstance naa.600605b00ba8cb0022564867b8c8cc32
OK: disk:totalLatency:average_naa.600605b00ba8cb0022564867b8c8cc32 has value 2 Millisecond
| 'disk:totalLatency:average_naa.600605b00ba8cb0022564867b8c8cc32'=2.0ms;;;;
```
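The lines after the `|` are performance data in the usual Nagios plugin format, `'label'=value[UOM];[warn];[crit];[min];[max]`, with unset fields left empty. As the output above shows, the label is the counter name joined to the instance with an underscore. Here is a minimal sketch that reproduces such an entry. The helper name is hypothetical. The fixed single decimal place and the suffix-free label for the aggregate instance are assumptions.

```python
from typing import Optional


def perfdata(counter: str, instance: str, value: float, uom: str = "",
             warn: Optional[float] = None, crit: Optional[float] = None,
             minimum: Optional[float] = None, maximum: Optional[float] = None) -> str:
    """Hypothetical helper: build one perfdata entry like the output above."""
    # '<counter>_<instance>' for named instances; no suffix for the '' aggregate (assumption)
    label = f"{counter}_{instance}" if instance else counter
    # unset warn/crit/min/max fields stay empty, giving the trailing ';;;;'
    tail = ";".join("" if f is None else str(f) for f in (warn, crit, minimum, maximum))
    return f"'{label}'={value:.1f}{uom};{tail}"
```

Several such entries, separated by spaces or newlines, make up the part of the plugin output that graphing tools consume.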
