[BE] Introduce Prometheus #183

Merged
shininghyunho merged 4 commits into dev from backend-182 on Feb 4, 2026
Conversation

@shininghyunho (Collaborator) commented Feb 4, 2026

The setup is similar to Loki's. Metrics are viewed in the existing Grafana instance.

Registered Prometheus as a data source in the existing Grafana.
(screenshot of the data-source registration)
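The data-source registration mentioned above can also be captured declaratively via Grafana's provisioning mechanism instead of clicking through the UI. A minimal sketch, assuming the provisioning file path is hypothetical and that Prometheus is reachable from Grafana under its container name on the monitoring network:

```yaml
# grafana/provisioning/datasources/prometheus.yml (hypothetical path)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                  # Grafana backend proxies queries to Prometheus
    url: http://prometheus:9090    # container name from monitoring/docker-compose.yml
    isDefault: false
```

With this file mounted into the Grafana container, the data source survives container recreation without manual re-registration.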

Summary by CodeRabbit

  • New features

    • A Prometheus service and scrape targets were added for system monitoring.
    • A monitoring agent was introduced for each service (backend, queue), enabling metric collection.
  • Improvements

    • The deployment workflow was updated to start the monitoring services, improving operational visibility.

- backend/docker-compose.yml: add cAdvisor service (port 8080)
- queue-backend/docker-compose.yml: add cAdvisor service (port 8080)
- monitoring/docker-compose.yml: add Prometheus service and volume configuration
- monitoring/prometheus/prometheus.yml: register each instance's cAdvisor target
- .github/workflows/cd-manual-o.yml: add cadvisor to the docker compose pull/up targets
- .github/workflows/cd-manual.yml: add cadvisor to the api, ticket, and queue deploy sections
@coderabbitai bot commented Feb 4, 2026

Warning

Rate limit exceeded

@shininghyunho has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 18 minutes and 32 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


📝 Walkthrough

Adds monitoring infrastructure (cAdvisor, Prometheus) and updates the deployment workflows (.github/workflows/*.yml) and each service's docker-compose file so that cAdvisor and Prometheus are pulled, started, and scraped.

Changes

Cohort / File(s) | Summary
Deployment workflows
.github/workflows/cd-manual-o.yml, .github/workflows/cd-manual.yml
Added cadvisor alongside promtail to the docker compose pull and docker compose up targets in the deploy step. Minor whitespace cleanup.
Backend/queue: add cadvisor
backend/docker-compose.yml, queue-backend/docker-compose.yml
Added a cadvisor service to each environment (v0.47.2, port 8080, read-only mounts of the host filesystem and Docker metadata, network attachment).
Monitoring stack: add Prometheus
monitoring/docker-compose.yml, monitoring/prometheus/prometheus.yml
Added a Prometheus service (prom/prometheus:v2.45.0), declared the prometheus_data volume, and added a Prometheus config file scraping the 192.168.0.7:8080 and 192.168.0.8:8080 targets.

Sequence Diagram(s)

sequenceDiagram
  participant GH as GH Actions (deploy)
  participant DC as Docker Compose (hosts)
  participant Cd as cadvisor (backend/queue)
  participant Pr as Prometheus
  GH->>DC: docker compose pull (includes cadvisor, promtail, prometheus)
  GH->>DC: docker compose up (start services)
  DC->>Cd: start cadvisor (expose :8080, mount host fs)
  DC->>Pr: start prometheus
  Pr->>Cd: HTTP /metrics scrape (192.168.0.7/192.168.0.8:8080)
(Note: the diagram is a flow summary; see the compose files and prometheus.yml for details)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Summary

Adds monitoring infrastructure: cAdvisor container-metric collection and Prometheus metric scraping. Includes deployment workflow updates, docker-compose extensions, and a new Prometheus config file.

Changes

Cohort / File(s) | Summary
Deployment workflows
.github/workflows/cd-manual-o.yml, .github/workflows/cd-manual.yml
Pull/start the cadvisor service alongside promtail in the deploy step. Removed the docker image prune command, changing the image-cleanup step.
Backend monitoring services
backend/docker-compose.yml, queue-backend/docker-compose.yml
Added cadvisor v0.47.2: port 8080 exposed, read-only mounts of the host filesystem and Docker metadata, attached to each service's network.
Monitoring infrastructure
monitoring/docker-compose.yml, monitoring/prometheus/prometheus.yml
Added the Prometheus v2.45.0 service and a prometheus_data persistent volume. The Prometheus config scrapes 192.168.0.7:8080 (backend) and 192.168.0.8:8080 (queue).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name | Status | Explanation | Resolution
Description check | ⚠️ Warning | The PR description is missing most required sections, and the provided content is very incomplete. | Add the 🧭 Summary, 🛠 Features, 🧩 Key Decisions and Solutions, and 🔍 Review Points sections, and list the related issue number in the 🔗 Linked Issue section.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title clearly summarizes the PR's main change (introducing Prometheus) and is directly related to the changes.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


@coderabbitai bot left a comment

Actionable comments posted: 6

🤖 Fix all issues with AI agents
In @.github/workflows/cd-manual-o.yml:
- Around line 131-137: The disk-clean command is currently commented out because
it's on the same line as a comment; separate the comment from the actual command
so the prune runs. Locate the block around the exported vars and the docker
compose up (references: IMAGE_TAG, NCP_REGISTRY_URL, SERVICE_NAME, LOKI_URL,
docker compose up) and move or split the trailing comment so the actual command
docker image prune -af --filter "until=24h" is on its own line and not preceded
by a comment marker, ensuring it executes after the compose up step.

In @.github/workflows/cd-manual.yml:
- Around line 222-227: The line combining comment and command ("# 이전 이미지 정리     
docker image prune -f") prevents docker image prune from running; split them so
the comment is on its own line and the command docker image prune -f is on the
next line (placed after docker compose up -d --no-build queue-backend promtail
cadvisor) so the prune actually executes.

In `@backend/docker-compose.yml`:
- Around line 88-101: The cadvisor service currently exposes 8080 to all
interfaces via the ports mapping "8080:8080", which is insecure; update the
cadvisor service (service name: cadvisor, container_name: cadvisor) to bind the
host-side port to a private IP by replacing the ports entry with a
host-IP-prefixed mapping that uses an env var (e.g.,
${CADVISOR_BIND_IP:-127.0.0.1}:8080:8080) so you can set CADVISOR_BIND_IP to a
private address in production, or remove the ports mapping entirely and restrict
access via host firewall rules or network configuration instead.

In `@monitoring/docker-compose.yml`:
- Around line 45-60: Update the prometheus service image in the
docker-compose.yml by changing the image reference in the prometheus service
(the "prometheus" service block and its image field currently set to
prom/prometheus:v2.45.0) to a supported release (e.g., prom/prometheus:v3.8.1 or
the v3.5 LTS tag) and redeploy; ensure you also review any Prometheus
config/flags for breaking changes between v2.x and v3.x and run the prometheus
container locally to validate that the existing volumes, command flags (e.g.,
--config.file, --storage.tsdb.path, --web.console.*) and mounted prometheus.yml
remain compatible after the version bump.

In `@monitoring/prometheus/prometheus.yml`:
- Around line 1-11: The backend-metrics and queue-metrics jobs in prometheus.yml currently use fixed IPs (192.168.0.7/0.8) in static_configs.targets, so scraping stops whenever a host IP changes. Either replace the static_configs.targets with DNS hostnames (e.g., ticket.example.local, queue.example.local), or replace the static_configs block with file_sd_configs referencing a separate targets file (e.g., prometheus_targets/*.json or .yml) and maintain the per-service endpoints there; if you choose file_sd_configs, also set up reload so that updating the targets file takes effect without restarting the Prometheus server.

In `@queue-backend/docker-compose.yml`:
- Around line 59-72: The cadvisor service exposes port 8080 publicly; restrict
binding to localhost or a private interface and/or remove public port mapping:
update the cadvisor service (the cadvisor block and its ports entry) to bind to
127.0.0.1 (e.g., change the ports mapping from "8080:8080" to
"127.0.0.1:8080:8080") or remove the ports section entirely and instead set the
CADVISOR_BIND_IP environment variable to a private IP to limit exposure; ensure
the cadvisor container remains on the internal queue-network only if remote
access is required via a secured bastion.
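The workflow fixes above amount to putting the comment and the prune command on separate lines so the prune actually runs. A sketch of the corrected run block (step name and service list assumed from the surrounding context):

```yaml
# .github/workflows/cd-manual.yml (sketch)
- name: Deploy queue-backend
  run: |
    docker compose pull queue-backend promtail cadvisor
    docker compose up -d --no-build queue-backend promtail cadvisor
    # 이전 이미지 정리 (clean up old images) -- comment on its own line
    docker image prune -f
```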

Comment on lines +88 to +101
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.47.2
    container_name: cadvisor
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    networks:
      - backend-network

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the docker-compose.yml file to understand the full context
find . -type f -name "docker-compose.yml" | head -20

Repository: boostcampwm2025/web10-beastcamp

Length of output: 221


🏁 Script executed:

# Once we find the file, let's examine it
cat -n backend/docker-compose.yml | head -120

Repository: boostcampwm2025/web10-beastcamp

Length of output: 3166


🌐 Web query:

cAdvisor metrics endpoint security best practices authentication

💡 Result:

cAdvisor /metrics endpoint security & authentication best practices

  1. Do not expose cAdvisor endpoints publicly (Internet-facing). cAdvisor commonly runs without authentication by default, which can leak sensitive host/container details. Enforce network-level restriction as the primary control (firewalls / security groups / private networks). [1]

  2. Don’t rely on cAdvisor “Web UI auth” to protect /metrics. Google explicitly notes that Web UI authentication protects only /containers, not other HTTP endpoints like /metrics and /api/...; therefore, treat /metrics as unauthenticated unless you add separate controls. [2]

  3. Put /metrics behind an authenticating reverse proxy (most common approach). Terminate TLS and enforce auth (mTLS, OIDC, Basic Auth, etc.) at a proxy (Nginx/Envoy/Traefik/oauth2-proxy), and only allow Prometheus/scrapers to reach it. This compensates for the limitation in (2). [2]

  4. If running cAdvisor directly, bind and expose it minimally.

    • Bind only to a management network / localhost where possible (avoid 0.0.0.0).
    • Change the default port and path if it reduces accidental exposure and helps your governance. [2]
  5. Use certificate-based access controls where supported in your build. cAdvisor includes flags for certificate-based authentication via --collector_cert / --collector_key (availability/behavior depends on build/version), but many teams still front it with a proxy for consistent authn/z. [3]

  6. Reduce sensitive data leakage in metrics.

    • Disable exporting container labels & environment variables as Prometheus labels unless you explicitly need them (--store_container_labels=false, then allowlist only what you need). This mitigates accidental leakage of secrets via env/labels. [4][5]
  7. Prefer Kubernetes-native secure scraping (recommended in clusters): scrape kubelet over the authenticated port.

    • Kubelet exposes cAdvisor metrics at /metrics/cadvisor. [6]
    • Use kubelet authentication & authorization (token webhook + webhook authz / RBAC) rather than any unauthenticated endpoint. [7]
    • Disable kubelet read-only port 10255 (unauthenticated) and migrate to the secure, authenticated kubelet port 10250. [8]
  8. Least privilege for scrapers. If using RBAC, grant only what Prometheus needs to read metrics endpoints (non-resource URL access such as GET /metrics as appropriate for your setup). [6]

Sources:
[1] Tenable – cAdvisor Unauthenticated Access (tenable.com)
[2] Trend Micro – Observability Exposed: Exploring Risks in Cloud-Native Metrics (cAdvisor Web UI auth does not protect /metrics) (trendmicro.com)
[3] Debian Manpages – cadvisor flags including -collector_cert / -collector_key (hexmos.com)
[4] Trend Micro – cAdvisor metrics can expose container labels; risk discussion (trendmicro.com)
[5] Debian Manpages – -store_container_labels / allowlisting labels behavior (manpages.debian.org)
[6] Kubernetes docs – kubelet metrics endpoints including /metrics/cadvisor and RBAC note (v1-34.docs.kubernetes.io)
[7] Kubernetes docs – kubelet authentication/authorization (token webhook, authz modes) (kubernetes.io)
[8] Google Cloud – disable kubelet read-only port 10255; use secure 10250 (cloud.google.com)

Citations:


🏁 Script executed:

# Check if there are any other monitoring/security configs related to cAdvisor
rg -i "cadvisor|8080" --type yaml --type yml -B2 -A2

Repository: boostcampwm2025/web10-beastcamp

Length of output: 102


🏁 Script executed:

# Look for any environment variable definitions or security policies
find . -type f \( -name ".env*" -o -name "*security*" -o -name "*policy*" \) | head -20

Repository: boostcampwm2025/web10-beastcamp

Length of output: 111


🏁 Script executed:

# Check for cAdvisor and port references with correct file type
rg -i "cadvisor|8080" --type yaml -B2 -A2

Repository: boostcampwm2025/web10-beastcamp

Length of output: 1587


🏁 Script executed:

# Check if there's any reverse proxy or auth layer configuration
rg -i "nginx|traefik|proxy|auth" --type yaml -B1 -A1

Repository: boostcampwm2025/web10-beastcamp

Length of output: 3342


cAdvisor port 8080 exposed externally: security risk

The metrics endpoint has no authentication, so exposing it on the network is a security risk. Bind it to a private interface or restrict access with firewall rules.

🔒 Recommended configuration: IP binding
    ports:
-     - "8080:8080"
+     - "${CADVISOR_BIND_IP:-127.0.0.1}:8080:8080"

In production, set CADVISOR_BIND_IP to a private IP (e.g., 192.168.0.7).
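Compose interpolates `${CADVISOR_BIND_IP:-127.0.0.1}` at config-render time using the same `${VAR:-default}` syntax as POSIX shell, so the fallback behavior of the mapping proposed above can be checked directly in a shell:

```shell
# With the variable unset, the mapping falls back to loopback (safe default)
unset CADVISOR_BIND_IP
echo "${CADVISOR_BIND_IP:-127.0.0.1}:8080:8080"   # -> 127.0.0.1:8080:8080

# With the variable set, the private IP takes precedence
export CADVISOR_BIND_IP=192.168.0.7
echo "${CADVISOR_BIND_IP:-127.0.0.1}:8080:8080"   # -> 192.168.0.7:8080:8080
```

Running `docker compose config` after setting the variable shows the resolved port mapping without starting anything.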

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Before:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.47.2
    container_name: cadvisor
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    networks:
      - backend-network

After:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.47.2
    container_name: cadvisor
    restart: unless-stopped
    ports:
      - "${CADVISOR_BIND_IP:-127.0.0.1}:8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    networks:
      - backend-network
🤖 Prompt for AI Agents
In `@backend/docker-compose.yml` around lines 88 - 101, The cadvisor service
currently exposes 8080 to all interfaces via the ports mapping "8080:8080",
which is insecure; update the cadvisor service (service name: cadvisor,
container_name: cadvisor) to bind the host-side port to a private IP by
replacing the ports entry with a host-IP-prefixed mapping that uses an env var
(e.g., ${CADVISOR_BIND_IP:-127.0.0.1}:8080:8080) so you can set CADVISOR_BIND_IP
to a private address in production, or remove the ports mapping entirely and
restrict access via host firewall rules or network configuration instead.

Comment on lines +45 to +60
  prometheus:
    image: prom/prometheus:v2.45.0
    container_name: prometheus
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
    ports:
      - "9090:9090"
    networks:
      - monitoring-network
    restart: unless-stopped

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Prometheus latest stable version 2024 2025

💡 Result:

  • Latest stable Prometheus release in 2025: v3.8.1 (released 2025-12-16) — shown as the latest (non–pre-release) on the official GitHub releases page. [1]
  • Latest stable Prometheus release in 2024: v3.0.0 (released 2024-11-14). [2]
  • If you specifically want an LTS (long-term support) line: Prometheus 3.5 is the 2025 LTS, supported until 2026-07-31. [3]

Sources: [1] [2] [3]


🌐 Web query:

Prometheus v2.45.0 CVE security vulnerabilities

💡 Result:

Prometheus v2.45.0 — known CVE status

1) CVEs in Prometheus server itself (upstream advisories)

The Prometheus upstream repo currently lists two Prometheus-server security advisories with CVEs, but neither affects v2.45.0:

  • CVE-2022-46146 (basic-auth bypass via exporter-toolkit): affected >=2.24.1 <2.37.4 and >=2.28.0 <2.40.4; 2.45.0 not affected. [1]
  • CVE-2021-29622 (open redirect under /new): affected >2.23.0 <2.26.1 and 2.27.0; 2.45.0 not affected. [2]

2) CVEs commonly flagged in Prometheus 2.45.0 builds/images (dependency / packaging findings)

Even if Prometheus 2.45.0 isn’t directly vulnerable per upstream advisories, security scanners often report CVEs coming from bundled Go modules and/or container base images. For example, a Trivy scan report for Prometheus v2.45.0 shows CVE-2023-2253 (via github.com/docker/distribution). [3]

Additionally, some downstream “prometheus-2.45” packages/images track multiple CVEs and advisories (package-specific), with fixed versions shown as newer package revisions (example: fixes in 2.45.3-r2 / 2.45.3-r3). [4]

3) Related (but separate) component CVE you may be seeing

If your environment also runs Alertmanager, note CVE-2023-40577 (XSS in Alertmanager; fixed in Alertmanager 0.25.1 per NVD). This is not a Prometheus-server CVE, but it often appears in monitoring stack scans. [5]

What to do

  • If you must stay on the 2.45 LTS line, prefer upgrading to a newer 2.45.x build/package that includes dependency refreshes (your distro/image vendor may publish patched rebuilds). [4]
  • If scan output can be shared (image digest, Go version, and the exact CVE list), I can map each CVE to upstream Prometheus vs dependency vs base image and tell you the cleanest upgrade/rebuild path.

Sources: [1] GitHub advisory GHSA-4v48-4q5m-8vx4 (CVE-2022-46146) • [2] GitHub advisory GHSA-vx57-7f4q-fpc7 (CVE-2021-29622) • [3] prometheus/prometheus issue #13204 (scanner findings on v2.45.0) • [4] Snyk “prometheus-2.45 vulnerabilities” (package-specific) • [5] NVD entry for CVE-2023-40577


Upgrade Prometheus to v3.8.1 (the latest stable version).

v2.45.0 is a 2023 release, two major lines behind the current v3.8.1 (2025-12-16), and the 2.x line no longer receives active maintenance. Upstream Prometheus reports no CVE directly affecting v2.45.0, but vulnerabilities exist in the container image's bundled dependencies (e.g., CVE-2023-2253 via docker/distribution).

Upgrade to the latest stable v3.8.1, or to v3.5 LTS (supported until 2026-07-31) if long-term support is needed.
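A sketch of the bumped service block under that recommendation. The exact tags and flag compatibility should be verified against the v3 release notes before deploying; the --web.console.* flags are omitted here on the assumption that console templates are no longer needed in 3.x, so restore them if your target version still supports and requires them:

```yaml
# monitoring/docker-compose.yml (sketch)
prometheus:
  image: prom/prometheus:v3.8.1   # or a v3.5 LTS tag
  container_name: prometheus
  volumes:
    - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    - prometheus_data:/prometheus
  command:
    - '--config.file=/etc/prometheus/prometheus.yml'
    - '--storage.tsdb.path=/prometheus'
  ports:
    - "9090:9090"
  networks:
    - monitoring-network
  restart: unless-stopped
```

Run the container locally against the existing prometheus.yml and prometheus_data volume once before rolling it out, to confirm the TSDB and config survive the major-version bump.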

🤖 Prompt for AI Agents
In `@monitoring/docker-compose.yml` around lines 45 - 60, Update the prometheus
service image in the docker-compose.yml by changing the image reference in the
prometheus service (the "prometheus" service block and its image field currently
set to prom/prometheus:v2.45.0) to a supported release (e.g.,
prom/prometheus:v3.8.1 or the v3.5 LTS tag) and redeploy; ensure you also review
any Prometheus config/flags for breaking changes between v2.x and v3.x and run
the prometheus container locally to validate that the existing volumes, command
flags (e.g., --config.file, --storage.tsdb.path, --web.console.*) and mounted
prometheus.yml remain compatible after the version bump.

Comment on lines +1 to +11
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'backend-metrics'
    static_configs:
      - targets: ['192.168.0.7:8080'] # ticket-server, api-server instance

  - job_name: 'queue-metrics'
    static_configs:
      - targets: ['192.168.0.8:8080'] # queue-server instance

🧹 Nitpick | 🔵 Trivial

Fixed-IP targets are an operational risk.

Scraping breaks immediately when a host IP changes. For operational flexibility, consider DNS-based targets or a separate targets file via file_sd_configs.

🤖 Prompt for AI Agents
In `@monitoring/prometheus/prometheus.yml` around lines 1 - 11, the backend-metrics and queue-metrics jobs currently use fixed IPs (192.168.0.7/0.8) in static_configs.targets, so scraping stops whenever a host IP changes; either replace the static_configs.targets with DNS hostnames (e.g., ticket.example.local, queue.example.local), or replace the static_configs block with file_sd_configs referencing a separate targets file (e.g., prometheus_targets/*.json or .yml) and maintain the per-service endpoints there; if you choose file_sd_configs, also set up reload so that target-file updates take effect without restarting the Prometheus server.
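A minimal sketch of the file_sd_configs variant suggested above. File paths and label names are illustrative; Prometheus watches file_sd target files, so edits to them apply without a server restart:

```yaml
# monitoring/prometheus/prometheus.yml (sketch)
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'container-metrics'
    file_sd_configs:
      - files:
          - /etc/prometheus/targets/*.yml
        refresh_interval: 30s

# /etc/prometheus/targets/cadvisor.yml (separate file, mounted into the container):
# - targets: ['192.168.0.7:8080']
#   labels:
#     service: 'backend'
# - targets: ['192.168.0.8:8080']
#   labels:
#     service: 'queue'
```

When a host IP changes, only the targets file needs editing; the compose stack and the main Prometheus config stay untouched.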

Comment on lines +59 to +72
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.47.2
    container_name: cadvisor-queue
    restart: unless-stopped
    ports:
      - '8080:8080'
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    networks:
      - queue-network

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

cAdvisor metrics endpoint authentication security best practices

💡 Result:

Key point: cAdvisor’s built-in Web UI auth does not secure /metrics

cAdvisor’s HTTP basic/digest “Web UI authentication” only protects the UI endpoint(s) (notably /containers) and does not protect other endpoints such as /metrics and /api/...; these endpoints can expose sensitive information, so they should not be publicly exposed. [1]

Best-practice ways to secure the cAdvisor metrics endpoint

  1. Do not expose cAdvisor to the public internet

    • Keep it on a private network/VPC, bind to a non-public interface, and restrict inbound access with security groups/firewalls / Kubernetes NetworkPolicies. This is the primary recommendation implied by cAdvisor’s own warning about sensitive exposure. [1]
  2. Put authentication + TLS in front of cAdvisor (reverse proxy / ingress / service mesh)

    • Because cAdvisor’s built-in UI auth doesn’t cover /metrics, protect /metrics at the edge (e.g., NGINX/Envoy/Traefik, an Ingress controller, or a mesh sidecar) using one of:
      • mTLS (client certificates) (strongest for machine-to-machine)
      • TLS + Basic Auth (acceptable in smaller setups)
      • IP allowlisting (as an additional control, not a replacement)
    • Prometheus and exporters commonly rely on TLS (including optional client cert auth) as the security mechanism rather than “app-level login.” [2]
  3. Use Prometheus scrape-side TLS settings (when your proxy/mesh enforces TLS/mTLS)

    • Prometheus supports configuring tls_config (CA, client cert/key, min TLS version, etc.) per scrape job/target. [3]
    • Prometheus also documents TLS enablement patterns for its web endpoints and (separately) notes TLS support for scrapes. [4]
  4. Reduce sensitive data exposed in the metrics payload

    • Limit or disable label/metadata expansion that can leak information:
      • Avoid turning container labels and environment variables into Prometheus labels unless you truly need them (--store_container_labels, --whitelisted_container_labels, --env_metadata_whitelist). [5]
    • Disable metric groups you don’t need (--disable_metrics / --enable_metrics) to reduce both exposure and attack surface. [5]
  5. Avoid “wide exposure” Kubernetes patterns

    • Don’t publish cAdvisor via NodePort/LoadBalancer unless you also enforce strong authn/z (mTLS) and strict source restrictions.
    • Prefer scraping through cluster-internal routing with tight policy controls.

Practical “default secure” pattern

Prometheus → (mTLS) → reverse-proxy/service-mesh → cAdvisor (cluster-internal only), plus minimum necessary metrics/labels enabled. This aligns with cAdvisor’s warning not to expose /metrics publicly and with Prometheus’s exporter security model leaning on TLS/mTLS. [1][2]

Sources: [1] [2] [3] [4] [5]


🌐 Web query:

cAdvisor port 8080 security exposure risks unauthenticated

💡 Result:

Exposing cAdvisor on TCP/8080 to untrusted networks is risky because most of what it serves is unauthenticated by default, and it can leak high-value host/container metadata that enables recon and follow-on attacks.

What’s exposed when 8080 is reachable without auth

  • Prometheus metrics (/metrics) and REST APIs (/api/...) can be accessed without the Web UI auth controls. cAdvisor’s own docs note that Web UI auth only protects the /containers UI endpoint, not /api/... or /metrics, and warns these endpoints may expose sensitive information and shouldn’t be public. [2]
  • Sensitive data in metrics can include:
    • container names, images, node/host details, resource usage patterns;
    • container labels by default, which in real deployments have contained security-relevant configuration (example: Traefik routing rules / middleware config) and can be leveraged to bypass intended access controls. [3]
    • potentially environment variables, if enabled/whitelisted for collection, which can include credentials/tokens if misused. (This is controlled by flags such as --env_metadata_whitelist and label-related options.) [1]

Why unauthenticated exposure matters (practical impact)

  • Recon & targeting: attackers can inventory your workloads (what runs where, versions/images, naming conventions, topology hints) and identify vulnerable components faster. [2][4]
  • Secret/config leakage via labels/metrics: labels (and sometimes env-derived metadata, if enabled) can unintentionally disclose credentials, hashes, routing rules, internal hostnames, and other operational secrets. [1][3]
  • Host risk increases because cAdvisor is often deployed with broad privileges/host mounts. The official quick-start runs the container privileged and mounts host paths (including / as /rootfs) while publishing 8080. If an attacker later finds a cAdvisor or adjacent-component exploit, the blast radius is worse in that posture. [5]

Mitigations (recommended)

  • Do not expose cAdvisor publicly; restrict by network policy / firewall / security groups / private VPC routing. [2]
  • If you must expose it, put it behind an authenticating reverse proxy (because cAdvisor’s Web UI auth does not cover /metrics and /api). [2]
  • Reduce what can leak:
    • disable unneeded metrics via --disable_metrics; change endpoints/port (--prometheus_endpoint, --port). [2][1]
    • prevent label/env leakage: use --store_container_labels=false, --whitelisted_container_labels, and tightly control --env_metadata_whitelist. [1][3]
  • Prefer binding to limited interfaces where possible (--listen_ip; defaults to all IPs). [1]

Bottom line: an unauthenticated cAdvisor 8080 exposure is typically an information-disclosure foothold that materially improves an attacker’s ability to compromise the environment, and it should be treated as a security finding, not just “metrics leakage.” [2][4]


🌐 Web query:

cAdvisor v0.47.2 security guidelines docker deployment

💡 Result:

Baseline (what upstream shows for Docker)

cAdvisor’s upstream “Quick Start” Docker deployment mounts large parts of the host filesystem (rootfs, /sys, Docker state) and runs the container privileged, plus access to /dev/kmsg. (github.com)

That combination is powerful (it’s why cAdvisor can see everything), but it also greatly increases host risk—something the community has explicitly raised (host / bind-mount + privileged + root, exposed web socket). (github.com)


Security guidelines for a Docker deployment (cAdvisor v0.47.2)

1) Do not expose cAdvisor to untrusted networks

  • Bind only to localhost (or a management VLAN) and firewall it.
  • cAdvisor’s metrics/UI endpoint has had real-world risks (for example, leaking sensitive container labels/config via Prometheus metrics), and it generally should not be Internet-reachable. (trendmicro.com)

Practical step: publish only on loopback:

-p 127.0.0.1:8080:8080

cAdvisor also supports binding to a specific IP via --listen_ip. (sources.debian.org)


2) Put authentication + TLS in front (reverse proxy), or use cAdvisor’s HTTP auth flags

cAdvisor can be deployed behind a reverse proxy that enforces:

  • TLS
  • authentication/authorization (SSO, mTLS, basic auth, etc.)
  • IP allowlisting

There are also cAdvisor flags for HTTP auth (for example --http_auth_file, --http_digest_file). (sources.debian.org)


3) Treat --privileged as a last resort (but know upstream uses it)

Privileged containers override/disable multiple kernel security controls (seccomp/AppArmor/SELinux confinement, etc.). (kubernetes.io)

Upstream’s Docker example uses --privileged because it makes host introspection “just work,” but from a hardening standpoint:

  • try running without --privileged first (expect you may lose some visibility)
  • if you must use --privileged, run cAdvisor only on trusted nodes, tightly restrict network access, and monitor the container like a host agent

4) Keep host mounts read-only and minimize them where possible

Upstream uses read-only mounts for most paths; keep them :ro wherever possible. (github.com)
Be especially cautious with mounting host / into the container (--volume=/:/rootfs:ro)—it’s a core concern raised by users reviewing the official guidance. (github.com)


5) Use an immutable container filesystem when feasible

Where compatible, run with a read-only root filesystem (--read-only) and add tmpfs mounts only where needed. This is a common container hardening control (CIS guidance). (docs.datadoghq.com)


A “safer-by-default” example for v0.47.2

This keeps the upstream-required host visibility, but reduces exposure by binding to localhost and adding basic hardening knobs:

docker run -d --name=cadvisor \
  -p 127.0.0.1:8080:8080 \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:ro \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --volume=/dev/disk/:/dev/disk:ro \
  --privileged \
  --device=/dev/kmsg \
  gcr.io/cadvisor/cadvisor:v0.47.2 \
  --listen_ip=127.0.0.1

(Those volumes/privileged mode match upstream’s documented Docker approach; the --listen_ip flag exists in cAdvisor’s CLI.) (github.com)

Citations:


The cAdvisor metrics endpoint (port 8080) is exposed without authentication.

cAdvisor's built-in Web UI authentication does not protect the /metrics and /api endpoints, so exposing them to a public network is a security risk. Sensitive information such as container labels, image details, and host metadata can leak and be abused for attacker reconnaissance.

Restrict the binding to localhost or a private interface only.

Recommended fix
  ports:
-   - '8080:8080'
+   - '127.0.0.1:8080:8080'

Or, to manage it via an environment variable:

  ports:
-   - '8080:8080'
+   - "${CADVISOR_BIND_IP:-127.0.0.1}:8080:8080"

If other hosts need access in production, set CADVISOR_BIND_IP to a private network IP (e.g., 192.168.0.8).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Before:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.47.2
    container_name: cadvisor-queue
    restart: unless-stopped
    ports:
      - '8080:8080'
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    networks:
      - queue-network

After:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.47.2
    container_name: cadvisor-queue
    restart: unless-stopped
    ports:
      - '127.0.0.1:8080:8080'
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    networks:
      - queue-network
🤖 Prompt for AI Agents
In `@queue-backend/docker-compose.yml` around lines 59 - 72, The cadvisor service
exposes port 8080 publicly; restrict binding to localhost or a private interface
and/or remove public port mapping: update the cadvisor service (the cadvisor
block and its ports entry) to bind to 127.0.0.1 (e.g., change the ports mapping
from "8080:8080" to "127.0.0.1:8080:8080") or remove the ports section entirely
and instead set the CADVISOR_BIND_IP environment variable to a private IP to
limit exposure; ensure the cadvisor container remains on the internal
queue-network only if remote access is required via a secured bastion.

@shininghyunho shininghyunho merged commit 8d398aa into dev Feb 4, 2026
5 checks passed
@viixix viixix deleted the backend-182 branch February 5, 2026 05:17
