
Commit 9bebe32

Merge pull request ceph#65536 from bluikko/doc-services-mon-improvements-cephadm
doc/cephadm: Fix errors and improvements in services/monitoring.rst
2 parents 621f30c + 1471674 commit 9bebe32


doc/cephadm/services/monitoring.rst

Lines changed: 66 additions & 61 deletions
@@ -11,7 +11,7 @@ metrics on cluster utilization and performance. Ceph users have three options:
    when bootstrapping a new cluster unless the ``--skip-monitoring-stack``
    option is used.
 #. Deploy and configure these services manually. This is recommended for users
-   with existing prometheus services in their environment (and in cases where
+   with existing Prometheus services in their environment (and in cases where
    Ceph is running in Kubernetes with Rook).
 #. Skip the monitoring stack completely. Some Ceph dashboard graphs will
    not be available.
@@ -35,10 +35,10 @@ Manager <https://prometheus.io/docs/alerting/alertmanager/>`_ and `Grafana
 impact of denial of service attacks.
 
 Please see `Prometheus' Security model
-<https://prometheus.io/docs/operating/security/>` for more detailed
+<https://prometheus.io/docs/operating/security/>`_ for more detailed
 information.
 
-Deploying monitoring with cephadm
+Deploying Monitoring with Cephadm
 ---------------------------------
 
 The default behavior of ``cephadm`` is to deploy a basic monitoring stack. It
@@ -58,7 +58,7 @@ steps below:
 
       ceph orch apply node-exporter
 
-#. Deploy alertmanager:
+#. Deploy Alertmanager:
 
    .. prompt:: bash #
 
@@ -77,22 +77,22 @@ steps below:
 
       ceph orch apply prometheus --placement 'count:2'
 
-#. Deploy grafana:
+#. Deploy Grafana:
 
    .. prompt:: bash #
 
      ceph orch apply grafana
 
-Enabling security for the monitoring stack
-----------------------------------------------
+Enabling Security for the Monitoring Stack
+------------------------------------------
 
 By default, in a cephadm-managed cluster, the monitoring components are set up and configured without enabling security measures.
 While this suffices for certain deployments, others with strict security needs may find it necessary to protect the
 monitoring stack against unauthorized access. In such cases, cephadm relies on a specific configuration parameter,
-`mgr/cephadm/secure_monitoring_stack`, which toggles the security settings for all monitoring components. To activate security
+``mgr/cephadm/secure_monitoring_stack``, which toggles the security settings for all monitoring components. To activate security
 measures, set this option to ``true`` with a command of the following form:
 
-  .. prompt:: bash #
+.. prompt:: bash #
 
    ceph config set mgr mgr/cephadm/secure_monitoring_stack true
 
@@ -111,7 +111,7 @@ value with the commands ``ceph orch prometheus set-credentials`` and ``ceph
 orch alertmanager set-credentials`` respectively. These commands offer the
 flexibility to input the username/password either as parameters or via a JSON
 file, which enhances security. Additionally, Cephadm provides the commands
-`orch prometheus get-credentials` and `orch alertmanager get-credentials` to
+``orch prometheus get-credentials`` and ``orch alertmanager get-credentials`` to
 retrieve the current credentials.
 
 .. _cephadm-monitoring-centralized-logs:
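
For orientation, the security toggle and the credential helpers named above can be combined as follows (a minimal sketch; the ``set-credentials`` commands additionally take a username/password or a JSON file, and their exact argument syntax is release-dependent, so only the parameter-free calls are shown):

.. prompt:: bash #

   ceph config set mgr mgr/cephadm/secure_monitoring_stack true
   ceph orch prometheus get-credentials
   ceph orch alertmanager get-credentials
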
@@ -128,7 +128,7 @@ Some of the advantages are:
 #. **Flexible retention policies**: with per-daemon logs, log rotation is usually set to a short interval (1-2 weeks) to save disk usage.
 #. **Increased security & backup**: logs can contain sensitive information and expose usage patterns. Additionally, centralized logging allows for HA, etc.
 
-Centralized Logging in Ceph is implemented using two services: ``loki`` and ``alloy``.
+Centralized logging in Ceph is implemented using two services: ``loki`` and ``alloy``.
 
 * Loki is a log aggregation system and is used to query logs. It can be configured as a ``datasource`` in Grafana.
 * Alloy acts as an agent that gathers logs from each node and forwards them to Loki.
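
Because ``loki`` and ``alloy`` are regular cephadm services, enabling centralized logging presumably follows the same ``ceph orch apply`` pattern used for the rest of the monitoring stack (an assumption drawn from the surrounding text; confirm the service names accepted by ``ceph orch apply`` in the release you run):

.. prompt:: bash #

   ceph orch apply loki
   ceph orch apply alloy
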
@@ -140,7 +140,7 @@ These two services are not deployed by default in a Ceph cluster. To enable cent
 Networks and Ports
 ~~~~~~~~~~~~~~~~~~
 
-All monitoring services can have the network and port they bind to configured with a yaml service specification. By default
+All monitoring services can have the network and port they bind to configured with a YAML service specification. By default
 cephadm will use ``https`` protocol when configuring Grafana daemons unless the user explicitly sets the protocol to ``http``.
 
 example spec file:
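
The spec file itself sits outside this hunk; as a rough sketch, a Grafana spec that pins the network and port might look like the following (the subnet, port, and protocol values are illustrative placeholders, not defaults):

.. code-block:: yaml

    service_type: grafana
    service_name: grafana
    placement:
      count: 1
    networks:
    - 192.169.142.0/24
    spec:
      port: 4200
      protocol: http
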
@@ -161,12 +161,12 @@ example spec file:
 
 .. _cephadm_default_images:
 
-Default images
+Default Images
 ~~~~~~~~~~~~~~
 
 *The information in this section was developed by Eugen Block in a thread on
 the [ceph-users] mailing list in April of 2024. The thread can be viewed here:
-``https://lists.ceph.io/hyperkitty/list/[email protected]/thread/QGC66QIFBKRTPZAQMQEYFXOGZJ7RLWBN/``.*
+https://lists.ceph.io/hyperkitty/list/[email protected]/thread/QGC66QIFBKRTPZAQMQEYFXOGZJ7RLWBN/*
 
 ``cephadm`` stores a local copy of the ``cephadm`` binary in
 ``var/lib/ceph/{FSID}/cephadm.{DIGEST}``, where ``{DIGEST}`` is an alphanumeric
@@ -189,7 +189,7 @@ Default monitoring images are specified in
    :exclude-members: desc, image_ref, key
 
 
-Using custom images
+Using Custom Images
 ~~~~~~~~~~~~~~~~~~~
 
 It is possible to install or upgrade monitoring components based on other
@@ -262,7 +262,7 @@ See also :ref:`cephadm-airgap`.
 
 .. _cephadm-overwrite-jinja2-templates:
 
-Using custom configuration files
+Using Custom Configuration Files
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 By overriding cephadm templates, it is possible to completely customize the
@@ -271,18 +271,18 @@ configuration files for monitoring services.
 Internally, cephadm already uses `Jinja2
 <https://jinja.palletsprojects.com/en/2.11.x/>`_ templates to generate the
 configuration files for all monitoring components. Starting from version 17.2.3,
-cephadm supports Prometheus http service discovery, and uses this endpoint for the
+cephadm supports Prometheus HTTP service discovery, and uses this endpoint for the
 definition and management of the embedded Prometheus service. The endpoint listens on
 ``https://<mgr-ip>:8765/sd/`` (the port is
 configurable through the variable ``service_discovery_port``) and returns scrape target
 information in `http_sd_config format
-<https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config>`_
+<https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config>`_.
 
 Customers with external monitoring stack can use `ceph-mgr` service discovery endpoint
 to get scraping configuration. Root certificate of the server can be obtained by the
 following command:
 
-  .. prompt:: bash #
+.. prompt:: bash #
 
    ceph orch sd dump cert
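
Once the root certificate has been dumped, an external Prometheus host could use it to query the service discovery endpoint, for example (an illustrative sketch; the certificate path is arbitrary, and the ``ceph-exporter`` service name and URL format are the ones shown later in this file):

.. prompt:: bash #

   ceph orch sd dump cert > /tmp/sd-root-cert.pem
   curl --cacert /tmp/sd-root-cert.pem \
       'https://<mgr-ip>:8765/sd/prometheus/sd-config?service=ceph-exporter'
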
@@ -297,7 +297,7 @@ and automatically applied on future deployments of these services.
 configuration of cephadm changes. If the updated configuration is to be used,
 the custom template needs to be migrated *manually* after each upgrade of Ceph.
 
-Option names
+Option Names
 """"""""""""
 
 The following templates for files that will be generated by cephadm can be
@@ -349,13 +349,13 @@ Usage
 
 The following command applies a single line value:
 
-.. code-block:: bash
+.. prompt:: bash #
 
    ceph config-key set mgr/cephadm/<option_name> <value>
 
 To set contents of files as template use the ``-i`` argument:
 
-.. code-block:: bash
+.. prompt:: bash #
 
    ceph config-key set mgr/cephadm/<option_name> -i $PWD/<filename>
 
@@ -366,7 +366,7 @@ To set contents of files as template use the ``-i`` argument:
 
 
 Then the configuration file for the service needs to be recreated.
-This is done using `reconfig`. For more details see the following example.
+This is done using ``reconfig``. For more details see the following example.
 
 Example
 """""""
@@ -377,7 +377,7 @@ Example
    ceph config-key set mgr/cephadm/services/prometheus/prometheus.yml \
      -i $PWD/prometheus.yml.j2
 
-   # reconfig the prometheus service
+   # reconfig the Prometheus service
    ceph orch reconfig prometheus
 
 .. code-block:: bash
@@ -389,74 +389,74 @@ Example
    # Note that custom alerting rules are not parsed by Jinja and hence escaping
    # will not be an issue.
 
-Deploying monitoring without cephadm
+Deploying Monitoring without Cephadm
 ------------------------------------
 
-If you have an existing prometheus monitoring infrastructure, or would like
+If you have an existing Prometheus monitoring infrastructure, or would like
 to manage it yourself, you need to configure it to integrate with your Ceph
 cluster.
 
-* Enable the prometheus module in the ceph-mgr daemon
+* Enable the ``prometheus`` module in the ceph-mgr daemon
 
-  .. code-block:: bash
+  .. prompt:: bash #
 
      ceph mgr module enable prometheus
 
-  By default, ceph-mgr presents prometheus metrics on port 9283 on each host
-  running a ceph-mgr daemon. Configure prometheus to scrape these.
+  By default, ceph-mgr presents Prometheus metrics on port 9283 on each host
+  running a ceph-mgr daemon. Configure Prometheus to scrape these.
 
 To make this integration easier, cephadm provides a service discovery endpoint at
 ``https://<mgr-ip>:8765/sd/``. This endpoint can be used by an external
 Prometheus server to retrieve target information for a specific service. Information returned
 by this endpoint uses the format specified by the Prometheus `http_sd_config option
-<https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config/>`_
+<https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config/>`_.
 
-Here's an example prometheus job definition that uses the cephadm service discovery endpoint
+Here's an example Prometheus job definition that uses the cephadm service discovery endpoint:
 
-.. code-block:: bash
+.. code-block:: yaml
 
    - job_name: 'ceph-exporter'
     http_sd_configs:
     - url: http://<mgr-ip>:8765/sd/prometheus/sd-config?service=ceph-exporter
 
 
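If the service discovery endpoint is not used, the mgr module's metrics on port 9283 can also be scraped with a plain static job (a sketch assuming a single, known mgr host; ``<mgr-host>`` is a placeholder):

.. code-block:: yaml

    scrape_configs:
      - job_name: 'ceph-mgr'
        honor_labels: true
        static_configs:
          - targets: ['<mgr-host>:9283']
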
-* To enable the dashboard's prometheus-based alerting, see :ref:`dashboard-alerting`.
+* To enable the dashboard's Prometheus-based alerting, see :ref:`dashboard-alerting`.
 
 * To enable dashboard integration with Grafana, see :ref:`dashboard-grafana`.
 
-Disabling monitoring
+Disabling Monitoring
 --------------------
 
 To disable monitoring and remove the software that supports it, run the following commands:
 
-.. code-block:: console
+.. prompt:: bash #
 
-    $ ceph orch rm grafana
-    $ ceph orch rm prometheus --force # this will delete metrics data collected so far
-    $ ceph orch rm node-exporter
-    $ ceph orch rm alertmanager
-    $ ceph mgr module disable prometheus
+   ceph orch rm grafana
+   ceph orch rm prometheus --force # this will delete metrics data collected so far
+   ceph orch rm node-exporter
+   ceph orch rm alertmanager
+   ceph mgr module disable prometheus
 
 See also :ref:`orch-rm`.
 
-Setting up RBD-Image monitoring
+Setting up RBD-Image Monitoring
 -------------------------------
 
 Due to performance reasons, monitoring of RBD images is disabled by default. For more information please see
 :ref:`prometheus-rbd-io-statistics`. If disabled, the overview and details dashboards will stay empty in Grafana
 and the metrics will not be visible in Prometheus.
 
 Setting up Prometheus
------------------------
+---------------------
 
 Setting Prometheus Retention Size and Time
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Cephadm can configure Prometheus TSDB retention by specifying ``retention_time``
 and ``retention_size`` values in the Prometheus service spec.
-The retention time value defaults to 15 days (15d). Users can set a different value/unit where
+The retention time value defaults to 15 days (``15d``). Users can set a different value/unit where
 supported units are: 'y', 'w', 'd', 'h', 'm' and 's'. The retention size value defaults
-to 0 (disabled). Supported units in this case are: 'B', 'KB', 'MB', 'GB', 'TB', 'PB' and 'EB'.
+to ``0`` (disabled). Supported units in this case are: 'B', 'KB', 'MB', 'GB', 'TB', 'PB' and 'EB'.
 
 In the following example spec we set the retention time to 1 year and the size to 1GB.
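
The example spec referred to in that sentence falls outside this hunk; it would take roughly this shape (a sketch built from the ``retention_time`` and ``retention_size`` options named above; the placement count is an arbitrary placeholder):

.. code-block:: yaml

    service_type: prometheus
    placement:
      count: 1
    spec:
      retention_time: "1y"
      retention_size: "1GB"
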
@@ -479,7 +479,7 @@ In the following example spec we set the retention time to 1 year and the size to 1GB.
 Setting up Grafana
 ------------------
 
-Manually setting the Grafana URL
+Manually Setting the Grafana URL
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Cephadm automatically configures Prometheus, Grafana, and Alertmanager in
@@ -494,16 +494,19 @@ to set the URL that the user's browser will use to access Grafana. This
 value will never be altered by cephadm. To set this configuration option,
 issue the following command:
 
-.. prompt:: bash $
+.. prompt:: bash #
 
    ceph dashboard set-grafana-frontend-api-url <grafana-server-api>
 
 It might take a minute or two for services to be deployed. After the
 services have been deployed, you should see something like this when you issue the command ``ceph orch ls``:
 
+.. prompt:: bash #
+
+   ceph orch ls
+
 .. code-block:: console
 
-    $ ceph orch ls
     NAME          RUNNING  REFRESHED  IMAGE NAME                                      IMAGE ID      SPEC
     alertmanager  1/1      6s ago     docker.io/prom/alertmanager:latest              0881eb8f169f  present
     crash         2/2      6s ago     docker.io/ceph/daemon-base:latest-master-devel  mix           present
@@ -514,6 +517,8 @@ services have been deployed, you should see something like this when you issue the command ``ceph orch ls``:
 Configuring SSL/TLS for Grafana
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
+.. versionadded:: Tentacle
+
 ``cephadm`` deploys Grafana using a certificate managed by the cephadm
 Certificate Manager (certmgr). Certificates for Grafana are **per host**:
 
@@ -552,7 +557,7 @@ The ``reconfig`` command also ensures that the Ceph Dashboard URL
 is updated to use the correct certificate. The ``reconfig`` command
 also sets the proper URL for the Ceph Dashboard.
 
-Setting the initial admin password
+Setting the Initial admin Password
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 By default, Grafana will not create an initial
@@ -575,13 +580,13 @@ Then apply this specification:
 Grafana will now create an admin user called ``admin`` with the
 given password.
 
-Turning off anonymous access
+Turning off Anonymous Access
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 By default, cephadm allows anonymous users (users who have not provided any
-login information) limited, viewer only access to the grafana dashboard. In
-order to set up grafana to only allow viewing from logged in users, you can
-set ``anonymous_access: False`` in your grafana spec.
+login information) limited, viewer only access to the Grafana dashboard. In
+order to set up Grafana to only allow viewing from logged in users, you can
+set ``anonymous_access: False`` in your Grafana spec.
 
 .. code-block:: yaml
 
@@ -593,19 +598,19 @@ set ``anonymous_access: False`` in your grafana spec.
      anonymous_access: False
      initial_admin_password: "mypassword"
 
-Since deploying grafana with anonymous access set to false without an initial
+Since deploying Grafana with anonymous access set to false without an initial
 admin password set would make the dashboard inaccessible, cephadm requires
 setting the ``initial_admin_password`` when ``anonymous_access`` is set to false.
 
 
 Setting up Alertmanager
 -----------------------
 
-Adding Alertmanager webhooks
+Adding Alertmanager Webhooks
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 To add new webhooks to the Alertmanager configuration, add additional
-webhook urls like so:
+webhook URLs like so:
 
 .. code-block:: yaml
 
@@ -628,18 +633,18 @@ Run ``reconfig`` on the service to update its configuration:
 Turn on Certificate Validation
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-If you are using certificates for alertmanager and want to make sure
-these certs are verified, you should set the "secure" option to
-true in your alertmanager spec (this defaults to false).
+If you are using certificates for Alertmanager and want to make sure
+these certificates are verified, you should set the ``secure`` option to
+true in your Alertmanager spec (this defaults to false).
 
 .. code-block:: yaml
 
    service_type: alertmanager
    spec:
      secure: true
 
-If you already had alertmanager daemons running before applying the spec
-you must reconfigure them to update their configuration
+If you already had Alertmanager daemons running before applying the spec
+you must reconfigure them to update their configuration:
 
 .. prompt:: bash #
 