Commit b7c0a6a

doc: add node-proxy documentation

This commit adds documentation about the 'hardware inventory / monitoring' feature (node-proxy agent).

Signed-off-by: Guillaume Abrioux <[email protected]>

3 files changed: 187 additions, 0 deletions
doc/hardware-monitoring/index.rst (183 additions, 0 deletions)
.. _hardware-monitoring:


Hardware monitoring
===================

`node-proxy` is the internal name of the agent that inventories a machine's hardware, reports the various hardware statuses, and enables the operator to perform some actions.
It gathers details from the RedFish API, then processes the data and pushes it to the agent endpoint in the Ceph manager daemon.

.. graphviz::

   digraph G {
       node [shape=record];
       mgr [label="{<mgr> ceph manager}"];
       dashboard [label="<dashboard> ceph dashboard"];
       agent [label="<agent> agent"];
       redfish [label="<redfish> redfish"];

       agent -> redfish [label=" 1." color=green];
       agent -> mgr [label=" 2." color=orange];
       dashboard:dashboard -> mgr [label=" 3." color=lightgreen];
       node [shape=plaintext];
       legend [label=<<table border="0" cellborder="1" cellspacing="0">
           <tr><td bgcolor="lightgrey">Legend</td></tr>
           <tr><td align="center">1. Collects data from redfish API</td></tr>
           <tr><td align="left">2. Pushes data to ceph mgr</td></tr>
           <tr><td align="left">3. Queries ceph mgr</td></tr>
       </table>>];
   }

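The collection and push steps in the diagram (arrows 1 and 2) can be illustrated with a short sketch. The payload shape follows the generic Redfish ``Status`` convention (``{"Health": ..., "State": ...}``); the helper and the sample data below are illustrative only and are not part of the node-proxy implementation:

```python
def summarize_health(members):
    """Map each component name to 'ok' or 'error' based on its
    Redfish-style Status object ({"Health": ..., "State": ...})."""
    summary = {}
    for member in members:
        status = member.get("Status", {})
        healthy = status.get("Health") == "OK" and status.get("State") == "Enabled"
        summary[member["Name"]] = "ok" if healthy else "error"
    return summary

# Made-up sample payload, mimicking two power supply entries
members = [
    {"Name": "PS1 Status", "Status": {"Health": "OK", "State": "Enabled"}},
    {"Name": "PS2 Status", "Status": {"Health": "Critical", "State": "UnavailableOffline"}},
]
print(summarize_health(members))  # {'PS1 Status': 'ok', 'PS2 Status': 'error'}
```

The real agent performs this kind of per-category condensation before pushing the data to the manager daemon.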
Limitations
-----------

For the time being, the `node-proxy` agent relies on the RedFish API.
This means that both the `node-proxy` agent and the `ceph-mgr` daemon must be able to access the Out-Of-Band network in order to work.

Deploying the agent
-------------------

The first step is to provide the credentials of the out-of-band management tool.
This can be done when adding the host, with a service spec file:

.. code-block:: bash

   # cat host.yml
   ---
   service_type: host
   hostname: node-10
   addr: 10.10.10.10
   oob:
     addr: 20.20.20.10
     username: admin
     password: p@ssword
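Before applying such a spec, the required ``oob`` fields can be checked programmatically. ``validate_oob`` below is a hypothetical client-side helper, not a cephadm API:

```python
# Hypothetical sanity check for a host spec carrying oob credentials.
# Not part of cephadm; mirrors the spec fields shown above.
REQUIRED_OOB_KEYS = {"addr", "username", "password"}

def validate_oob(spec):
    """Return the sorted list of missing oob credential keys."""
    if spec.get("service_type") != "host":
        raise ValueError("expected a host service spec")
    oob = spec.get("oob", {})
    return sorted(REQUIRED_OOB_KEYS - oob.keys())

spec = {
    "service_type": "host",
    "hostname": "node-10",
    "addr": "10.10.10.10",
    "oob": {"addr": "20.20.20.10", "username": "admin", "password": "p@ssword"},
}
print(validate_oob(spec))  # [] -> nothing missing
```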

Apply the spec:

.. code-block:: bash

   # ceph orch apply -i host.yml
   Added host 'node-10' with addr '10.10.10.10'

Deploy the agent:

.. code-block:: bash

   # ceph config set mgr mgr/cephadm/hw_monitoring true

CLI
---

**orch** **hardware** **status** [hostname] [--category CATEGORY] [--format plain | json]

Supported categories are:

* summary (default)
* memory
* storage
* processors
* network
* power
* fans
* firmwares
* criticals

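A client consuming this command would typically validate the requested category against the same list; ``pick_category`` below is a hypothetical helper mirroring the command's behavior (default to ``summary``, reject unknown names), not actual orchestrator code:

```python
# Hypothetical category validation mirroring the supported list above.
SUPPORTED_CATEGORIES = {
    "summary", "memory", "storage", "processors",
    "network", "power", "fans", "firmwares", "criticals",
}

def pick_category(category=None):
    """Default to 'summary'; raise on an unsupported category."""
    category = category or "summary"
    if category not in SUPPORTED_CATEGORIES:
        raise ValueError(f"unsupported category: {category}")
    return category

print(pick_category())           # summary
print(pick_category("storage"))  # storage
```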
Examples
********


hardware health statuses summary
++++++++++++++++++++++++++++++++

.. code-block:: bash

   # ceph orch hardware status
   +------------+---------+-----+-----+--------+-------+------+
   | HOST       | STORAGE | CPU | NET | MEMORY | POWER | FANS |
   +------------+---------+-----+-----+--------+-------+------+
   | node-10    | ok      | ok  | ok  | ok     | ok    | ok   |
   +------------+---------+-----+-----+--------+-------+------+

storage devices report
++++++++++++++++++++++

.. code-block:: bash

   # ceph orch hardware status node-10 --category storage
   +------------+--------------------------------------------------------+------------------+----------------+----------+----------------+--------+---------+
   | HOST       | NAME                                                   | MODEL            | SIZE           | PROTOCOL | SN             | STATUS | STATE   |
   +------------+--------------------------------------------------------+------------------+----------------+----------+----------------+--------+---------+
   | node-10    | Disk 8 in Backplane 1 of Storage Controller in Slot 2  | ST20000NM008D-3D | 20000588955136 | SATA     | ZVT99QLL       | OK     | Enabled |
   | node-10    | Disk 10 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA     | ZVT98ZYX       | OK     | Enabled |
   | node-10    | Disk 11 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA     | ZVT98ZWB       | OK     | Enabled |
   | node-10    | Disk 9 in Backplane 1 of Storage Controller in Slot 2  | ST20000NM008D-3D | 20000588955136 | SATA     | ZVT98ZC9       | OK     | Enabled |
   | node-10    | Disk 3 in Backplane 1 of Storage Controller in Slot 2  | ST20000NM008D-3D | 20000588955136 | SATA     | ZVT9903Y       | OK     | Enabled |
   | node-10    | Disk 1 in Backplane 1 of Storage Controller in Slot 2  | ST20000NM008D-3D | 20000588955136 | SATA     | ZVT9901E       | OK     | Enabled |
   | node-10    | Disk 7 in Backplane 1 of Storage Controller in Slot 2  | ST20000NM008D-3D | 20000588955136 | SATA     | ZVT98ZQJ       | OK     | Enabled |
   | node-10    | Disk 2 in Backplane 1 of Storage Controller in Slot 2  | ST20000NM008D-3D | 20000588955136 | SATA     | ZVT99PA2       | OK     | Enabled |
   | node-10    | Disk 4 in Backplane 1 of Storage Controller in Slot 2  | ST20000NM008D-3D | 20000588955136 | SATA     | ZVT99PFG       | OK     | Enabled |
   | node-10    | Disk 0 in Backplane 0 of Storage Controller in Slot 2  | MZ7L33T8HBNAAD3  | 3840755981824  | SATA     | S6M5NE0T800539 | OK     | Enabled |
   | node-10    | Disk 1 in Backplane 0 of Storage Controller in Slot 2  | MZ7L33T8HBNAAD3  | 3840755981824  | SATA     | S6M5NE0T800554 | OK     | Enabled |
   | node-10    | Disk 6 in Backplane 1 of Storage Controller in Slot 2  | ST20000NM008D-3D | 20000588955136 | SATA     | ZVT98ZER       | OK     | Enabled |
   | node-10    | Disk 0 in Backplane 1 of Storage Controller in Slot 2  | ST20000NM008D-3D | 20000588955136 | SATA     | ZVT98ZEJ       | OK     | Enabled |
   | node-10    | Disk 5 in Backplane 1 of Storage Controller in Slot 2  | ST20000NM008D-3D | 20000588955136 | SATA     | ZVT99QMH       | OK     | Enabled |
   | node-10    | Disk 0 on AHCI Controller in SL 6                      | MTFDDAV240TDU    | 240057409536   | SATA     | 22373BB1E0F8   | OK     | Enabled |
   | node-10    | Disk 1 on AHCI Controller in SL 6                      | MTFDDAV240TDU    | 240057409536   | SATA     | 22373BB1E0D5   | OK     | Enabled |
   +------------+--------------------------------------------------------+------------------+----------------+----------+----------------+--------+---------+

firmwares details
+++++++++++++++++

.. code-block:: bash

   # ceph orch hardware status node-10 --category firmwares
   +------------+----------------------------------------------------------------------------+--------------------------------------------------------------+----------------------+-------------+--------+
   | HOST       | COMPONENT                                                                  | NAME                                                         | DATE                 | VERSION     | STATUS |
   +------------+----------------------------------------------------------------------------+--------------------------------------------------------------+----------------------+-------------+--------+
   | node-10    | current-107649-7.03__raid.backplane.firmware.0                             | Backplane 0                                                  | 2022-12-05T00:00:00Z | 7.03        | OK     |

   ... omitted output ...

   | node-10    | previous-25227-6.10.30.20__idrac.embedded.1-1                              | Integrated Remote Access Controller                          | 00:00:00Z            | 6.10.30.20  | OK     |
   +------------+----------------------------------------------------------------------------+--------------------------------------------------------------+----------------------+-------------+--------+

hardware critical warnings report
+++++++++++++++++++++++++++++++++

.. code-block:: bash

   # ceph orch hardware status --category criticals
   +------------+-----------+------------+----------+-----------------+
   | HOST       | COMPONENT | NAME       | STATUS   | STATE           |
   +------------+-----------+------------+----------+-----------------+
   | node-10    | power     | PS2 Status | critical | unplugged       |
   +------------+-----------+------------+----------+-----------------+

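Conceptually, the ``criticals`` view filters the full report down to the entries flagged critical. The sketch below shows the idea; the report shape is illustrative, not the actual node-proxy schema:

```python
def filter_criticals(report):
    """Return (component, name, status, state) rows whose status is critical."""
    return [
        (component, entry["name"], entry["status"], entry["state"])
        for component, entries in report.items()
        for entry in entries
        if entry["status"].lower() == "critical"
    ]

# Made-up report mimicking the power/fans categories
report = {
    "power": [
        {"name": "PS1 Status", "status": "ok", "state": "plugged"},
        {"name": "PS2 Status", "status": "critical", "state": "unplugged"},
    ],
    "fans": [{"name": "Fan 1", "status": "ok", "state": "enabled"}],
}
print(filter_criticals(report))  # [('power', 'PS2 Status', 'critical', 'unplugged')]
```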
Developers
----------

.. py:currentmodule:: cephadm.agent
.. autoclass:: NodeProxyEndpoint
.. automethod:: NodeProxyEndpoint.__init__
.. automethod:: NodeProxyEndpoint.oob
.. automethod:: NodeProxyEndpoint.data
.. automethod:: NodeProxyEndpoint.fullreport
.. automethod:: NodeProxyEndpoint.summary
.. automethod:: NodeProxyEndpoint.criticals
.. automethod:: NodeProxyEndpoint.memory
.. automethod:: NodeProxyEndpoint.storage
.. automethod:: NodeProxyEndpoint.network
.. automethod:: NodeProxyEndpoint.power
.. automethod:: NodeProxyEndpoint.processors
.. automethod:: NodeProxyEndpoint.fans
.. automethod:: NodeProxyEndpoint.firmwares
.. automethod:: NodeProxyEndpoint.led

doc/index.rst (1 addition, 0 deletions)

The new page is added to the toctree::

       releases/general
       releases/index
       security/index
       hardware-monitoring/index
       Glossary <glossary>
       Tracing <jaegertracing/index>

doc/monitoring/index.rst (3 additions, 0 deletions)

A cross-reference is added after the "Useful queries" section::

    Hardware monitoring
    ===================

    See :ref:`hardware-monitoring`
