Commit 6a37fc8

mgr/prometheus: fix metrics service not coming up
ceph#61468 unintentionally broke the HTTP metrics service when it removed the code that starts it. This commit adds that code back, along with a basic test to catch this kind of regression.

Regression from ceph@64f590c#diff-031e09c4297d84a407cf55f8981d38764efc3c37e9827e12e638521f69284e1f

Fixes: https://tracker.ceph.com/issues/72012
Signed-off-by: Nizamudeen A <[email protected]>
1 parent 54197bf commit 6a37fc8

2 files changed: 7 additions, 0 deletions

qa/suites/orch/cephadm/workunits/task/test_monitoring_stack_basic.yaml

Lines changed: 4 additions & 0 deletions
@@ -64,3 +64,7 @@ tasks:
 curl -s http://${ALERTM_IP}:9093/api/v2/status
 curl -s http://${ALERTM_IP}:9093/api/v2/alerts
 curl -s http://${ALERTM_IP}:9093/api/v2/alerts | jq -e '.[] | select(.labels | .alertname == "CephMonDown") | .status | .state == "active"'
+# check prometheus metrics endpoint is not empty and make sure we can get metrics
+METRICS_URL=$(ceph mgr services | jq -r .prometheus)
+[ -n "$METRICS_URL" ] || exit 1
+curl -s "${METRICS_URL}metrics" | grep -q '^ceph_health_status'
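
The added check has three steps: read the prometheus URL that the mgr advertises, stop if it is empty, then confirm the scrape output contains a `ceph_health_status` sample. Below is a minimal Python sketch of the same logic, operating on already-captured strings rather than a live cluster. The helper names `metrics_url_from_services` and `has_health_metric` are hypothetical and not part of this commit, and the sample JSON and metrics text are made-up inputs.

```python
import json
import re
from typing import Optional


def metrics_url_from_services(services_json: str) -> Optional[str]:
    """Return the prometheus URL from `ceph mgr services` JSON output, or None."""
    services = json.loads(services_json)
    url = services.get("prometheus")
    # A missing or empty entry maps to None so the caller can fail early.
    return url or None


def has_health_metric(metrics_text: str) -> bool:
    """Mirror `grep -q '^ceph_health_status'`: some line must start with the name."""
    return re.search(r"^ceph_health_status", metrics_text, flags=re.MULTILINE) is not None


if __name__ == "__main__":
    # Illustrative inputs only; a real run would capture these from the cluster.
    sample_services = '{"prometheus": "http://192.0.2.10:9283/"}'
    sample_metrics = "# HELP ceph_health_status Cluster health status\nceph_health_status 0.0\n"
    url = metrics_url_from_services(sample_services)
    assert url is not None
    print(url + "metrics", has_health_metric(sample_metrics))
```

One difference from the shell version is worth noting: `jq -r` prints the literal string `null` for a missing key, so there the `-n` test would pass and a missing entry would only surface at the later `curl | grep` step. The sketch treats a missing key as an immediate failure instead.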

src/pybind/mgr/prometheus/module.py

Lines changed: 3 additions & 0 deletions
@@ -1773,6 +1773,9 @@ def configure(self, server_addr: str, server_port: int) -> None:
                 self.log.exception(f'Failed to setup cephadm based secure monitoring stack: {e}\n',
                                    'Falling back to default configuration')
 
+        # In any error fallback to plain http mode
+        self.setup_default_config(server_addr, server_port)
+
     def setup_default_config(self, server_addr: str, server_port: int) -> None:
         cherrypy.config.update({
             'server.socket_host': server_addr,
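
The control flow this fix restores appears to be: attempt the secure setup and return early on success, and on every other path (no config available, security disabled, or an exception) fall through to the plain HTTP setup. The sketch below is a stripped-down illustration of that pattern, not the module's actual code. `FallbackConfigurator`, `security_config_json`, `setup_secure`, and the `started` attribute are hypothetical stand-ins, the `security_enabled` key is assumed, and port 9283 in the usage note is only an example value.

```python
import json
from typing import Optional


class FallbackConfigurator:
    """Toy stand-in for a module that prefers a secure setup but must always start."""

    def __init__(self, security_config_json: Optional[str]) -> None:
        self.security_config_json = security_config_json
        self.started: Optional[str] = None  # records which mode was configured

    def configure(self, server_addr: str, server_port: int) -> None:
        if self.security_config_json is not None:
            try:
                config = json.loads(self.security_config_json)
                if config.get("security_enabled", False):
                    self.setup_secure(server_addr, server_port)
                    return  # secure setup succeeded, so skip the fallback
            except Exception as e:
                print(f"secure setup failed: {e}; falling back to default configuration")

        # Reached on every non-success path. Without this call nothing is started,
        # which is the failure mode the commit message describes.
        self.setup_default_config(server_addr, server_port)

    def setup_secure(self, server_addr: str, server_port: int) -> None:
        self.started = f"https://{server_addr}:{server_port}"

    def setup_default_config(self, server_addr: str, server_port: int) -> None:
        self.started = f"http://{server_addr}:{server_port}"
```

For example, `FallbackConfigurator('not json').configure("127.0.0.1", 9283)` hits the `except` branch and still ends with `started` set to the plain HTTP address, because the fallback call sits outside the `if` block rather than inside the exception handler.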
