
Commit a62f8d9
Merge pull request #28 from Shayan-Ghani/fastapi: "Migrate From Flask To FastAPI"
2 parents: cfe5076 + 1bdc15e

File tree

10 files changed: +222 -131 lines

CHANGELOG.md

Lines changed: 35 additions & 9 deletions
@@ -1,5 +1,40 @@
 # ChangeLog for CXP

+---
+## [1.2.0] - 2025-05-30
+
+### Changed
+- **Internal framework refactor:** Migrated from **Flask** to **FastAPI** for improved asynchronous handling and performance.
+- Updated operational dependencies:
+  - Added `fastapi`, `uvicorn`
+  - Removed `flask`, `gunicorn`
+
+### Notes
+- **No changes** to Prometheus metrics endpoints, names, labels, or scrape behavior.
+- Existing Prometheus scrapers, dashboards, and alerting rules will continue to work as-is.
+- The internal implementation is now fully asynchronous with FastAPI, potentially improving concurrent scrape handling under heavy load.
+- Logging and startup messages will differ due to the new framework and ASGI server (`uvicorn`).
+- Adjust the following settings for `uvicorn` as environment variables:
+  - HOST
+  - PORT
+  - WORKERS (default: 3)
+  - LOG_LEVEL (default: warning)
+
+⚠️ **Breaking operational change:** if your deployment or runtime environment depends specifically on Flask or Gunicorn, you'll need to adjust service definitions accordingly.
+
+---
+
+## [1.1.2-1.1.4] - 2025-05-05
+
+### Key points
+- Added GitHub Actions deployment option.
+- This version makes the code more resilient to dependency-vulnerability PRs.
+
+**Check out README.md, "Deploy with GitHub Actions", to make use of the new changes.**
+
+---
+
 Version : 1.1.1

 ## Key points
@@ -21,12 +56,3 @@ Version : 1.1.1
 - `cxp_network_rx_bytes_total`: Total number of bytes received over the network
 - `cxp_network_tx_bytes_total`: Total number of bytes transmitted over the network

-# ChangeLog for CXP
-
-Version : 1.1.2-1.1.4
-
-## Key points
-- added Github actions deployment option
-- this version makes the code more flexible against vulnerability dependency risks PRs.
-
-**check out README.MD, Deploy with Github Actions to make use of the new changes.**
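The uvicorn settings listed in the changelog are plain environment variables with defaults. As a minimal sketch of the resolution logic (`resolve_uvicorn_settings` is a hypothetical helper for illustration, not part of the repo):

```python
import os

# Hypothetical helper (not in the repo) mirroring how start.sh resolves its
# settings: each value is read from the environment, falling back to the
# documented default.
def resolve_uvicorn_settings(env=None):
    env = os.environ if env is None else env
    return {
        "host": env.get("HOST", "0.0.0.0"),
        "port": int(env.get("PORT", "8000")),
        "workers": int(env.get("WORKERS", "3")),
        "log_level": env.get("LOG_LEVEL", "warning"),
    }

# Only PORT is overridden; everything else keeps its default.
print(resolve_uvicorn_settings({"PORT": "9100"}))
```

Anything not set in the environment keeps the defaults documented above.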

Dockerfile

Lines changed: 4 additions & 1 deletion
@@ -2,7 +2,10 @@ FROM python:3.10-slim-buster

 LABEL maintainer="Shayan Ghani <[email protected]>"

-ENV CONTAINER_EXPORTER_ENV=production CONTAINER_EXPORTER_DEBUG=0 CONTAINER_EXPORTER_PORT=8000
+ENV CONTAINER_EXPORTER_ENV=production \
+    CONTAINER_EXPORTER_DEBUG=0 \
+    CONTAINER_EXPORTER_PORT=8000 \
+    PYTHONUNBUFFERED=1

 EXPOSE 8000

README.md

Lines changed: 10 additions & 4 deletions
@@ -99,13 +99,19 @@ kill -9 <PID>
 ```
 Replace `<PID>` with the pid of the ./start.sh script.

-#### 🚢 Run With A Custom Port:
+#### 🚢 Run With Custom Parameters:
+
+Adjust the following settings for `uvicorn` as environment variables:
+- HOST (default: 0.0.0.0)
+- PORT (default: 8000)
+- WORKERS (default: 3)
+- LOG_LEVEL (default: warning)
+
+Example:
 ```bash
-./start.sh <your custome port> &
+PORT="8000" ./start.sh &
 ```

-Change `<your custom port>` with a port of your choice.
-
 ### 🔥 Add CXP to Prometheus
 - Edit your `prometheus.yml` file and add the address of container-exporter in scrape_configs:

configs/__init__.py

Lines changed: 0 additions & 1 deletion
This file was deleted.

configs/config.py

Lines changed: 0 additions & 6 deletions
This file was deleted.

container_exporter.py

Lines changed: 89 additions & 84 deletions
@@ -1,95 +1,100 @@
-from asyncio import gather, new_event_loop, wait
-from aiodocker import Docker
-from docker import from_env as docker_env
+from asyncio import gather
+from aiodocker import Docker
+from aiodocker.containers import DockerContainer
 from stats import get_docker_stats as stat
-from prometheus_client import Gauge, Counter
+from prometheus_client import Gauge, Counter, CONTENT_TYPE_LATEST
 from prometheus_client.exposition import generate_latest
-from flask import Flask, Response, request
-from configs import config
+from fastapi import FastAPI
+from fastapi.responses import PlainTextResponse
+from contextlib import asynccontextmanager
+from utils.metrics import PromMetric, prune_stale_metrics, flush_metric_labels
+from logging import basicConfig, error, ERROR
+
+docker_client: Docker
+
+@asynccontextmanager
+async def lifespan(app: FastAPI):
+    global docker_client
+    docker_client = Docker()
+
+    yield

-app = Flask(__name__)
+    await docker_client.close()

+app = FastAPI(lifespan=lifespan)

-# Create Prometheus gauge metrics
-container_status = Gauge('cxp_container_status', 'Docker container status (1 = running, 0 = not running)', ['container_name'])
-container_cpu_percentage = Gauge('cxp_cpu_percentage', 'Docker container cpu usage', ['container_name'])
-container_memory_percentage = Gauge('cxp_memory_percentage', 'Docker container memory usage in percent', ['container_name'])
-container_memory_bytes_total = Gauge('cxp_memory_bytes_total', 'Docker container memory usage in bytes', ['container_name'])
+gauge_container_status = Gauge('cxp_container_status', 'Docker container status (1 = running, 0 = not running)', ['container_name'])
+gauge_cpu_percentage = Gauge('cxp_cpu_percentage', 'Docker container CPU usage', ['container_name'])
+gauge_memory_percentage = Gauge('cxp_memory_percentage', 'Docker container memory usage in percent', ['container_name'])
+gauge_memory_bytes = Gauge('cxp_memory_bytes_total', 'Docker container memory usage in bytes', ['container_name'])

-disk_io_read_counter = Counter("cxp_disk_io_read_bytes_total", "Total number of bytes read from disk", ['container_name'])
-disk_io_write_counter = Counter("cxp_disk_io_write_bytes_total", "Total number of bytes written to disk", ['container_name'])
+counter_disk_read = Counter("cxp_disk_io_read_bytes_total", "Total bytes read from disk", ['container_name'])
+counter_disk_write = Counter("cxp_disk_io_write_bytes_total", "Total bytes written to disk", ['container_name'])
+counter_net_rx = Counter("cxp_network_rx_bytes_total", "Total bytes received over network", ['container_name'])
+counter_net_tx = Counter("cxp_network_tx_bytes_total", "Total bytes sent over network", ['container_name'])

-network_rx_counter = Counter("cxp_network_rx_bytes_total", "Total number of bytes received over the network", ['container_name'])
-network_tx_counter = Counter("cxp_network_tx_bytes_total", "Total number of bytes transmitted over the network", ['container_name'])
-

-# get the data for running or not running(unhealthy) containers
-def get_containers(all=False):
-    client = docker_env()
-    return client.containers.list(all)
-
-init_containers_names = [c.name for c in get_containers()]
-
-# update container status whether they are running.
-def update_container_status(containers):
-    for container in containers:
-        if container.name in init_containers_names:
-            container_status.labels(container_name=container.name).set(1 if container.status == "running" else 0)
-        elif container.status == "running":
-            container_status.labels(container_name=container.name).set(1)
-            init_containers_names.append(container.name)
-
-    for container_name in init_containers_names:
-        if container_name not in [c.name for c in containers]:
-            container_status.labels(container_name=container_name).set(0)
-
-
-async def container_stats():
-    docker = Docker()
-    try:
-        containers = await docker.containers.list()
-        tasks = [stat.get_container_stats(container) for container in containers]
-        all_stats = await gather(*tasks)
-        for stats in all_stats:
-            container_cpu_percentage.labels(container_name=stats[0]['name'][1:]).set(stat.calculate_cpu_percentage(stats[0]))
-            container_memory_percentage.labels(container_name=stats[0]['name'][1:]).set(stat.calculate_memory_percentage(stats[0]))
-            container_memory_bytes_total.labels(container_name=stats[0]['name'][1:]).set(stat.calculate_memory_bytes(stats[0]))
-            disk_io_read_counter.labels(container_name=stats[0]['name'][1:]).inc(stat.calculate_disk_io(stats[0])[0])
-            disk_io_write_counter.labels(container_name=stats[0]['name'][1:]).inc(stat.calculate_disk_io(stats[0])[1])
-            network_rx_counter.labels(container_name=stats[0]['name'][1:]).inc(stat.calculate_network_io(stats[0])[0])
-            network_tx_counter.labels(container_name=stats[0]['name'][1:]).inc(stat.calculate_network_io(stats[0])[1])
-    finally:
-        await docker.close()
-
-metrics_names = [container_cpu_percentage, container_memory_percentage, container_memory_bytes_total, disk_io_read_counter, disk_io_write_counter, network_rx_counter, network_tx_counter]
-
-def flush_metric_labels(c):
-    for container in c:
-        if container.status != "running":
-            for m in metrics_names:
-                m.clear()
-
-@app.route('/')
-def index():
-    return "Welcome To CXP, Contianer Exporter For Prometheus."
-
-@app.route('/metrics')
-def metrics():
-    try:
-        all_containers = get_containers(all=True)
-        update_container_status(all_containers)
-        flush_metric_labels(all_containers)
-        loop = new_event_loop()
-        t = [loop.create_task(container_stats())]
-        loop.run_until_complete(wait(t))
-    except Exception as e:
-        return f"Error running script: {str(e)}"
+metrics_to_clear: list[PromMetric] = [gauge_cpu_percentage, gauge_memory_percentage, gauge_memory_bytes, counter_disk_read, counter_disk_write, counter_net_rx, counter_net_tx]

-    return Response(generate_latest(), mimetype='text/plain')

-def create_app():
-    app.config.from_object(config.Config)
-    return app

-if __name__ == "__main__":
-    app.run('0.0.0.0', 8000)
+async def get_containers(all=False) -> list[DockerContainer]:
+    return await docker_client.containers.list(all=all)
+
+def update_container_status(running_containers: list[DockerContainer]):
+    current_names = [c._container.get("Names")[0][1:] for c in running_containers]
+    for name in current_names:
+        gauge_container_status.labels(container_name=name).set(1)
+
+# Async metrics gathering
+async def container_stats(running_containers: list[DockerContainer]):
+    tasks = [stat.get_container_stats(container) for container in running_containers]
+    all_stats = await gather(*tasks)
+
+    for stats in all_stats:
+        name = stats[0]['name'][1:]
+        gauge_cpu_percentage.labels(container_name=name).set(stat.calculate_cpu_percentage(stats[0]))
+        gauge_memory_percentage.labels(container_name=name).set(stat.calculate_memory_percentage(stats[0]))
+        gauge_memory_bytes.labels(container_name=name).set(stat.calculate_memory_bytes(stats[0]))
+        disk_read, disk_write = stat.calculate_disk_io(stats[0])
+        net_rx, net_tx = stat.calculate_network_io(stats[0])
+
+        counter_disk_read.labels(container_name=name).inc(disk_read)
+        counter_disk_write.labels(container_name=name).inc(disk_write)
+        counter_net_rx.labels(container_name=name).inc(net_rx)
+        counter_net_tx.labels(container_name=name).inc(net_tx)
+
+# List of metrics we want to prune (performance counters)
+prunable_metrics: list[PromMetric] = [
+    gauge_cpu_percentage, gauge_memory_percentage, gauge_memory_bytes,
+    counter_disk_read, counter_disk_write, counter_net_rx, counter_net_tx
+]
+
+# Metrics we want to always keep, and set to 0 instead
+persistent_metrics: list[PromMetric] = [gauge_container_status]
+
+
+@app.get("/")
+def root():
+    return {"message": "Welcome to CXP, Container Exporter for Prometheus."}
+
+@app.get("/metrics")
+async def metrics():
+    try:
+        running_containers = await get_containers()
+        update_container_status(running_containers)
+        prune_stale_metrics([c._container.get("Names")[0][1:] for c in running_containers], prunable_metrics, persistent_metrics)
+        await container_stats(running_containers)
+        return PlainTextResponse(
+            content=generate_latest(),
+            media_type=CONTENT_TYPE_LATEST
+        )
+    except Exception as e:
+        basicConfig(
+            level=ERROR,
+            format='%(asctime)s ERROR %(message)s',
+            datefmt='%Y-%m-%d %H:%M:%S'
+        )
+        error(str(e))
+        return PlainTextResponse(f"Error running metrics collection: {str(e)}", status_code=500)
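The `lifespan` context manager in the new code replaces Flask's module-level setup: the shared `aiodocker.Docker` client is created once before the app starts serving and closed after it stops. A self-contained sketch of the pattern, using a stand-in `FakeClient` so it runs without a Docker daemon:

```python
import asyncio
from contextlib import asynccontextmanager

# Stand-in for aiodocker.Docker, so the sketch runs without Docker.
class FakeClient:
    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True

events = []

@asynccontextmanager
async def lifespan(app):
    # Startup phase: create the shared client before requests are served.
    client = FakeClient()
    events.append("startup")
    try:
        yield client
    finally:
        # Shutdown phase: release the client after the server stops.
        await client.close()
        events.append("shutdown")

async def main():
    async with lifespan(app=None) as client:
        events.append("serving")  # requests would be handled here
    return client

client = asyncio.run(main())
print(events)  # ['startup', 'serving', 'shutdown']
```

FastAPI drives the same context manager itself when constructed as `FastAPI(lifespan=lifespan)`: the code before `yield` runs at startup, the code after it at shutdown.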

requirements.txt

Lines changed: 21 additions & 22 deletions
@@ -1,29 +1,28 @@
 aiodocker==0.21.0
-aiohttp>=3.9.0
-aiosignal==1.3.1
+aiohappyeyeballs==2.6.1
+aiohttp==3.12.4
+aiosignal==1.3.2
+annotated-types==0.7.0
+anyio==4.9.0
 async-timeout==4.0.3
 attrs==23.1.0
-blinker==1.6.2
 certifi==2024.7.4
 charset-normalizer==3.2.0
-click==8.1.7
-docker==6.1.3
-Flask==2.3.3
-frozenlist==1.4.0
-gunicorn==23.0.0
+click==8.2.1
+exceptiongroup==1.3.0
+fastapi==0.115.12
+frozenlist==1.6.0
+h11==0.16.0
 idna==3.7
-importlib-metadata==6.8.0
-itsdangerous==2.1.2
-jinja2>=3.1.3
-MarkupSafe==2.1.3
-multidict==6.0.4
-packaging==23.1
+multidict==6.4.4
 prometheus-client==0.17.1
-requests==2.31.0
-typing_extensions==4.8.0
-urllib3>=2.0.7
-websocket-client==1.6.2
-werkzeug>=2.3.8
-yarl==1.9.2
-zipp==3.19.1
-pip==23.3
+propcache==0.3.1
+pydantic==2.11.5
+pydantic_core==2.33.2
+sniffio==1.3.1
+starlette==0.46.2
+typing-inspection==0.4.1
+typing_extensions==4.13.2
+urllib3==2.4.0
+uvicorn==0.30.0
+yarl==1.20.0

start.sh

Lines changed: 26 additions & 2 deletions
@@ -1,6 +1,33 @@
 #!/bin/sh

-port="${1:-8000}"
+# Configurable variables
+HOST=${HOST:-0.0.0.0}
+PORT=${PORT:-8000}
+WORKERS=${WORKERS:-3}
+LOG_LEVEL=${LOG_LEVEL:-warning}

-gunicorn -b 0.0.0.0:$port -w 3 --access-logfile - --error-logfile - --reload "container_exporter:create_app()"
+echo "Starting Container Exporter..."
+echo "Host: $HOST, Port: $PORT, Workers: $WORKERS, Log Level: $LOG_LEVEL"

+# Trap signals to shut down gracefully. Note: POSIX sh takes signal names
+# without the SIG prefix, and $child is set once uvicorn is started in the
+# background below.
+term_handler() {
+  echo "SIGTERM received, shutting down..."
+  kill -TERM "$child" 2>/dev/null
+  wait "$child"
+  exit 0
+}
+trap term_handler TERM
+
+while true; do
+  uvicorn "container_exporter:app" \
+    --host "$HOST" \
+    --port "$PORT" \
+    --workers "$WORKERS" \
+    --log-level "$LOG_LEVEL" &
+  child=$!
+  wait "$child"
+
+  echo "Uvicorn exited with code $?. Restarting in 3 seconds..."
+  sleep 3
+done

stats/get_docker_stats.py

Lines changed: 6 additions & 2 deletions
@@ -1,8 +1,12 @@
-def calculate_cpu_percentage(stats) -> float:
+from aiodocker.docker import DockerContainer
+
+def calculate_cpu_percentage(stats: dict) -> float:
     cpu_percent = 0

     cpu_delta = stats['cpu_stats']['cpu_usage']['total_usage'] - stats['precpu_stats']['cpu_usage']['total_usage']
+
     system_delta = stats['cpu_stats']['system_cpu_usage'] - stats['precpu_stats']['system_cpu_usage']
+
     number_cpus = stats['cpu_stats']['online_cpus']
     if cpu_delta is not None and system_delta is not None and number_cpus is not None:
         cpu_percent = (cpu_delta / system_delta) * number_cpus * 100.0
@@ -56,6 +60,6 @@ def calculate_network_io(stats) -> bytes:

     return network_rx_bytes, network_tx_bytes

-async def get_container_stats(container):
+async def get_container_stats(container: DockerContainer):
     stats = await container.stats(stream=False)
     return stats
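The CPU formula in `calculate_cpu_percentage` divides the container's CPU-time delta by the host's, then scales by core count. A worked example with a made-up stats payload shaped like Docker's stats API response (the numbers are illustrative, not real readings):

```python
# Same formula as calculate_cpu_percentage in stats/get_docker_stats.py.
def calculate_cpu_percentage(stats: dict) -> float:
    cpu_percent = 0
    # Container CPU time consumed between the two samples.
    cpu_delta = stats['cpu_stats']['cpu_usage']['total_usage'] - stats['precpu_stats']['cpu_usage']['total_usage']
    # Host CPU time elapsed between the same two samples.
    system_delta = stats['cpu_stats']['system_cpu_usage'] - stats['precpu_stats']['system_cpu_usage']
    number_cpus = stats['cpu_stats']['online_cpus']
    if cpu_delta is not None and system_delta is not None and number_cpus is not None:
        cpu_percent = (cpu_delta / system_delta) * number_cpus * 100.0
    return cpu_percent

# Illustrative payload: 200 units of container CPU time out of 2000 units
# of host CPU time, on 2 cores.
sample = {
    'cpu_stats': {'cpu_usage': {'total_usage': 400}, 'system_cpu_usage': 10000, 'online_cpus': 2},
    'precpu_stats': {'cpu_usage': {'total_usage': 200}, 'system_cpu_usage': 8000},
}
# (400-200)/(10000-8000) * 2 * 100 = 20.0
print(calculate_cpu_percentage(sample))  # 20.0
```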
