
Commit b652e5b

committed
Rewrite README with full architecture, deployment and limitations
1 parent 6cca776 commit b652e5b

File tree

1 file changed: +224 −117 lines

README.md

Lines changed: 224 additions & 117 deletions
# ML Inference API: Production-Style ML Inference Service

## Overview

This project demonstrates how to take a trained machine learning model beyond notebook experimentation and operate it as a service.

The system trains a model offline, serializes the artifact, exposes it through a FastAPI inference API, validates requests, emits Prometheus metrics, runs automated tests, packages the application with Docker, supports observability with Prometheus and Grafana, performs load testing with Locust, publishes container images to GitHub Container Registry (GHCR), and deploys the service through Kubernetes, ingress, and a public cloud web service.

This is a production-style project focused on engineering practice rather than model training alone.
## Live Deployment

Public service: https://ml-inference-api-tagq.onrender.com

Swagger UI: https://ml-inference-api-tagq.onrender.com/docs

Health endpoints: `GET /health/live`, `GET /health/ready`

Metrics: `GET /metrics`

Prediction endpoint: `POST /predict`
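The two health endpoints listed above signal different things: liveness only says the process is running, while readiness should succeed only once the model artifact has loaded. A minimal pure-Python sketch of that contract (the `HealthState` class and the 503-before-ready behaviour are illustrative assumptions, not the project's actual code):

```python
class HealthState:
    """Tracks the two signals a Kubernetes-style probe checks:
    liveness (process is up) vs readiness (can serve traffic)."""

    def __init__(self):
        self.model = None  # artifact not loaded yet

    def load_model(self, model):
        # In the real service this would be a joblib.load() at startup.
        self.model = model

    def live(self):
        # If this code runs at all, the process is alive.
        return {"status": "alive"}

    def ready(self):
        # Readiness is gated on the artifact: 503 before the model
        # loads, 200 with {"status": "ready"} afterwards.
        if self.model is None:
            return 503, {"status": "not ready"}
        return 200, {"status": "ready"}


state = HealthState()
print(state.live())      # {'status': 'alive'} even before the model loads
print(state.ready()[0])  # 503: not ready yet
state.load_model(object())
print(state.ready())     # (200, {'status': 'ready'})
```

Separating the two probes lets Kubernetes restart a dead pod (liveness) without routing traffic to a pod that is still loading its model (readiness).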
## Project Objective

Most ML tutorials stop after training a model. Real systems require additional engineering layers, including:

- repeatable model packaging
- API design and validation
- automated testing
- observability
- containerization
- deployment workflows
- infrastructure exposure

This project demonstrates the path from trained model → API service → container → monitored deployment.
## Core Features

Machine Learning

- offline model training with scikit-learn
- serialized model artifact using joblib
- reproducible artifact generation during the Docker build

API

- FastAPI inference service
- request and response validation with Pydantic
- health endpoints for liveness and readiness
- automatic Swagger documentation

Testing

- automated API tests using pytest
- validation of prediction, health checks, and invalid payloads

Observability

- Prometheus metrics exposure
- Prometheus target validation
- Grafana dashboard visualization
- load testing with Locust

Containerization

- Docker image build
- container runtime validation
- Docker Compose observability stack

Delivery and Registry

- GitHub Actions CI pipeline
- Docker image publishing to GHCR
- remote container pull verification

Orchestration

- Kubernetes Deployment and Service
- resource requests and limits
- rolling deployment strategy
- Horizontal Pod Autoscaler (HPA)
- ingress-nginx controller and Ingress routing

Cloud Deployment

- public Docker deployment on Render
- successful application startup in a managed environment
- public API access
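The train-offline, serialize, load-at-startup pattern behind the Machine Learning features can be illustrated with a stdlib stand-in. The real project serializes a scikit-learn estimator with joblib during the Docker build; the sketch below uses pickle and a trivial nearest-centroid "model" so it runs anywhere, and every name in it is illustrative rather than taken from the repo:

```python
import math
import pickle
import tempfile

def train(samples, labels):
    # Toy offline training step: compute one centroid per class.
    # The real project fits a scikit-learn estimator here.
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return {
        label: [sum(col) / len(col) for col in zip(*rows)]
        for label, rows in groups.items()
    }

def predict(model, features):
    # Classify by nearest centroid (Euclidean distance).
    return min(model, key=lambda label: math.dist(model[label], features))

# Offline training + artifact serialization (the joblib.dump step).
model = train([[5.1, 3.5], [6.7, 3.0], [5.0, 3.4]], [0, 1, 0])
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as f:
    pickle.dump(model, f)
    path = f.name

# What the API does at startup: load the artifact once, then serve.
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(predict(loaded, [5.0, 3.5]))  # → 0 (nearest the class-0 centroid)
```

Doing the dump during the image build is what makes the artifact reproducible: every container starts from the same serialized model rather than retraining at runtime.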
## Technology Stack

Programming and ML: Python, scikit-learn, NumPy, joblib

API: FastAPI, Pydantic, Uvicorn

Testing: pytest

Containerization: Docker, Docker Compose

Observability: Prometheus, Grafana, Locust, prometheus-fastapi-instrumentator

CI/CD and Registry: GitHub Actions, GitHub Container Registry (GHCR)

Infrastructure: Kubernetes, ingress-nginx, Render
## Architecture

Local and container flow: Client → FastAPI API → Model Artifact → Metrics → Prometheus → Grafana

Delivery pipeline: GitHub Push → GitHub Actions → Run Tests → Build Docker Image → Push to GHCR → Deployment platform pulls image

Kubernetes flow: Client → Ingress → Kubernetes Service → FastAPI Pods → Model Artifact

Public deployment flow: GitHub Repository → Docker Build → Model Artifact Generated → Public Web Service
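The Metrics → Prometheus hop in the flow above works by the service exposing counters in the Prometheus text exposition format, which Prometheus scrapes and Grafana plots. In the project this is handled by prometheus-fastapi-instrumentator; the stdlib sketch below shows the idea, with the metric name and labels as illustrative assumptions:

```python
from collections import Counter

# Request counts by (method, path, status) — the kind of time series
# the instrumentator maintains for the real service.
requests = Counter()

def record(method, path, status):
    requests[(method, path, status)] += 1

def render_metrics():
    """Render counters in the Prometheus text exposition format,
    i.e. the payload GET /metrics returns to the scraper."""
    lines = [
        "# HELP http_requests_total Total HTTP requests.",
        "# TYPE http_requests_total counter",
    ]
    for (method, path, status), count in sorted(requests.items()):
        lines.append(
            f'http_requests_total{{method="{method}",path="{path}",'
            f'status="{status}"}} {count}'
        )
    return "\n".join(lines)

record("POST", "/predict", 200)
record("POST", "/predict", 200)
record("GET", "/health/live", 200)
print(render_metrics())
```

Prometheus scrapes this endpoint on an interval and stores the counter samples, so Grafana can graph rates like requests per second per status code.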
## API Endpoints

### `GET /health/live`

Returns liveness status.

Example response:

```json
{"status": "alive"}
```

### `GET /health/ready`

Returns readiness status after the model has loaded.

Example response:

```json
{"status": "ready"}
```

### `POST /predict`

Runs inference using the trained model.

Example request:

```json
{"features": [5.1, 3.5, 1.4, 0.2]}
```

Example response:

```json
{"prediction": 0}
```

### `GET /metrics`

Prometheus metrics endpoint.

### `GET /docs`

Swagger UI.
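The request schema above is enforced by Pydantic in the real service: malformed payloads are rejected before inference ever runs. A stdlib sketch of the checks such a request model performs (the function name, feature count, and error strings are illustrative assumptions, not the project's schema):

```python
N_FEATURES = 4  # iris-style input, matching the example request above

def validate_predict_payload(payload):
    """Mimic the validation a Pydantic request model performs.

    Returns (True, features) for a valid payload, or
    (False, message) for one the API would reject with HTTP 422.
    """
    if not isinstance(payload, dict) or "features" not in payload:
        return False, "body must be an object with a 'features' field"
    features = payload["features"]
    if not isinstance(features, list) or len(features) != N_FEATURES:
        return False, f"'features' must be a list of {N_FEATURES} numbers"
    if not all(isinstance(v, (int, float)) and not isinstance(v, bool)
               for v in features):
        return False, "'features' entries must be numeric"
    return True, [float(v) for v in features]

print(validate_predict_payload({"features": [5.1, 3.5, 1.4, 0.2]}))
# → (True, [5.1, 3.5, 1.4, 0.2])
print(validate_predict_payload({"features": [5.1, "oops", 1.4, 0.2]})[0])
# → False
```

Rejecting bad input at the schema layer keeps the model code free of defensive checks and gives clients a structured 422 error instead of an opaque inference failure.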

## Project Structure

ml_inference_api/
├── .github/
├── app/
├── model/
├── tests/
├── monitoring/
├── k8s/
├── load_tests/
├── scripts/
├── docs/
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── requirements-dev.txt
├── pytest.ini
└── README.md

## Evidence

Verification screenshots are stored in docs/evidence/.

Evidence index: docs/evidence/evidence_index.md

Evidence includes:

- local API validation
- Docker build and container runtime
- Prometheus scraping targets
- Grafana dashboards
- Locust load testing
- GitHub Actions CI success
- GHCR container publishing
- Kubernetes deployment and pods
- Horizontal Pod Autoscaler behaviour
- ingress routing
- public Render deployment
## Deployment Summary

This project has been verified across multiple environments.

Local: FastAPI application start verified; Swagger and prediction tested.

Docker: image built successfully; container runtime verified.

Docker Compose observability stack: Prometheus scraping confirmed; Grafana dashboard operational.

CI/CD: GitHub Actions pipeline passed; Docker image pushed to GHCR.

Kubernetes: deployment applied successfully; pods healthy; HPA configured; ingress routing functional.

Cloud deployment: Render deployment successful; model artifact generated during build; public endpoints verified.
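The HPA mentioned above scales the FastAPI pods using Kubernetes' documented rule: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped to the configured bounds. A small sketch of that calculation, with the target utilization and replica bounds as illustrative values rather than this project's actual manifest:

```python
import math

def hpa_desired_replicas(current_replicas, current_cpu, target_cpu,
                         min_replicas=1, max_replicas=5):
    """Kubernetes HPA scaling rule:
    desired = ceil(current * currentMetric / targetMetric),
    clamped to [minReplicas, maxReplicas]."""
    desired = math.ceil(current_replicas * current_cpu / target_cpu)
    return max(min_replicas, min(desired, max_replicas))

# Load test pushes average CPU to 80% against a 50% target: scale out.
print(hpa_desired_replicas(2, 80, 50))  # ceil(2 * 80/50) = 4
# Load drops to 5%: scale in, but never below minReplicas.
print(hpa_desired_replicas(4, 5, 50))   # ceil(4 * 5/50) = 1
```

This is why the Locust load test is useful evidence: driving CPU above the target lets you watch the HPA raise the replica count, and idling afterwards shows the scale-down.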
## Limitations

This project demonstrates production-style engineering patterns but is not a hardened enterprise deployment.

Limitations include:

- no authentication or API key protection
- no rate limiting
- no formal model registry
- no secrets management workflow
- no distributed tracing
- no alerting rules configured
- no centralized log aggregation
- no managed Kubernetes cluster
- no canary or blue/green release strategy
- Render free-tier cold start behaviour
## Future Improvements

Possible improvements:

- API authentication
- request throttling
- model versioning and registry integration
- structured prediction logging
- alerting with Prometheus and Grafana
- infrastructure-as-code provisioning
- managed Kubernetes deployment
- progressive deployment strategies
## What This Project Demonstrates

This project demonstrates applied engineering capability in:

- ML artifact management
- inference API design
- automated testing
- observability integration
- containerization
- CI/CD workflows
- container registry publishing
- Kubernetes orchestration
- autoscaling
- ingress routing
- public cloud deployment

It represents an end-to-end machine learning inference service rather than a notebook-only model experiment.
