# ML Inference API: A Production-Style ML Inference Service

## Overview
This project demonstrates how to take a trained machine learning model beyond notebook experimentation and operate it as a service.

The system trains a model offline, serializes the artifact, exposes it through a FastAPI inference API, validates requests, emits Prometheus metrics, runs automated tests, packages the application with Docker, supports observability with Prometheus and Grafana, performs load testing with Locust, publishes container images to GitHub Container Registry (GHCR), and deploys the service through Kubernetes, ingress, and a public cloud web service.

This is a production-style project focused on engineering practice rather than only model training.
---
## Live Deployment

- Public service: https://ml-inference-api-tagq.onrender.com
- Swagger UI: https://ml-inference-api-tagq.onrender.com/docs
- Health endpoints: `GET /health/live`, `GET /health/ready`
- Metrics: `GET /metrics`
- Prediction: `POST /predict`

---
## Project Objective

Most ML tutorials stop after training a model. Real systems require additional engineering layers, including:

- repeatable model packaging
- API design and validation
- automated testing
- observability
- containerization
- deployment workflows
- infrastructure exposure

This project demonstrates the path from trained model → API service → container → monitored deployment.

---
## Core Features

**Machine Learning**

- offline model training with scikit-learn
- serialized model artifact using joblib
- reproducible artifact generation during the Docker build

**API**

- FastAPI inference service
- request and response validation with Pydantic
- health endpoints for liveness and readiness
- automatic Swagger documentation

**Testing**

- automated API tests using pytest
- validation of prediction, health checks, and invalid payloads

**Observability**

- Prometheus metrics exposure
- Prometheus target validation
- Grafana dashboard visualization
- load testing with Locust

**Containerization**

- Docker image build
- container runtime validation
- Docker Compose observability stack

**Delivery and Registry**

- GitHub Actions CI pipeline
- Docker image publishing to GHCR
- remote container pull verification

**Orchestration**

- Kubernetes Deployment and Service
- resource requests and limits
- rolling deployment strategy
- Horizontal Pod Autoscaler (HPA)
- ingress-nginx controller and ingress routing

**Cloud Deployment**

- public Docker deployment on Render
- verified application startup in a managed environment
- public API access

---
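The offline training and joblib serialization described above could look like the following sketch. This is illustrative only: the dataset, model choice, and `model/model.joblib` path are assumptions, not taken from the repository.

```python
# Illustrative offline training and artifact serialization (not the repo's script).
from pathlib import Path

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression


def train_and_save(artifact_path: str = "model/model.joblib") -> Path:
    """Train a small classifier and serialize it with joblib."""
    X, y = load_iris(return_X_y=True)  # 4 numeric features per sample
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    path = Path(artifact_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(clf, path)  # the API loads this artifact at startup
    return path
```

Running a script like this during the Docker build, rather than committing the artifact, is what makes the artifact generation reproducible.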
## Technology Stack

**Programming and ML**: Python, scikit-learn, NumPy, joblib

**API**: FastAPI, Pydantic, Uvicorn

**Testing**: pytest

**Containerization**: Docker, Docker Compose

**Observability**: Prometheus, Grafana, Locust, prometheus-fastapi-instrumentator

**CI/CD and Registry**: GitHub Actions, GitHub Container Registry (GHCR)

**Infrastructure**: Kubernetes, ingress-nginx, Render

---
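As a sketch of the Pydantic request/response validation in the stack above (the schema names are illustrative assumptions, not the repository's actual code):

```python
# Illustrative request/response schemas for the prediction endpoint.
from pydantic import BaseModel, ValidationError


class PredictRequest(BaseModel):
    """Payload schema for POST /predict."""
    features: list[float]


class PredictResponse(BaseModel):
    """Response schema for POST /predict."""
    prediction: int

# When used as FastAPI endpoint parameter and response_model annotations,
# these schemas are enforced automatically: an invalid payload is rejected
# with a 422 before the model is ever invoked.
```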
## Architecture

**Local and container flow**

Client → FastAPI API → Model Artifact → Metrics → Prometheus → Grafana

**Delivery pipeline**

GitHub Push → GitHub Actions → Run Tests → Build Docker Image → Push to GHCR → Deployment platform pulls image

**Kubernetes flow**

Client → Ingress → Kubernetes Service → FastAPI Pods → Model Artifact

**Public deployment flow**

GitHub Repository → Docker Build → Model Artifact Generated → Public Web Service

---
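The delivery pipeline above might be expressed as a GitHub Actions workflow along these lines. This is a sketch under assumptions: the workflow file name, job name, and tag are illustrative, not the repository's actual configuration.

```yaml
# .github/workflows/ci.yml (illustrative)
name: ci
on:
  push:
    branches: [main]
jobs:
  test-build-push:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt -r requirements-dev.txt
      - run: pytest
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          push: true
          tags: ghcr.io/${{ github.repository }}:latest
```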
## API Endpoints
### `GET /health/live`

Returns liveness status.

Example response:

```json
{"status": "alive"}
```

### `GET /health/ready`

Returns readiness status after the model has loaded.

Example response:

```json
{"status": "ready"}
```

### `POST /predict`

Runs inference using the trained model.

Example request:

```json
{"features": [5.1, 3.5, 1.4, 0.2]}
```

Example response:

```json
{"prediction": 0}
```

### `GET /metrics`

Prometheus metrics endpoint.

### `GET /docs`

Swagger UI.

---
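For illustration, the `/predict` contract above can be exercised from Python with only the standard library. This is a sketch: the base URL is an assumption, and the service must be running for the call to succeed.

```python
import json
from urllib.request import Request, urlopen


def predict(base_url: str, features: list) -> dict:
    """POST a feature vector to /predict and return the decoded JSON response."""
    req = Request(
        f"{base_url}/predict",
        data=json.dumps({"features": features}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage against a local instance (assumed port):
# predict("http://localhost:8000", [5.1, 3.5, 1.4, 0.2])
```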
## Project Structure
```
ml_inference_api/
├── .github/
├── app/
├── model/
├── tests/
├── monitoring/
├── k8s/
├── load_tests/
├── scripts/
├── docs/
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── requirements-dev.txt
├── pytest.ini
└── README.md
```

---
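A minimal Dockerfile consistent with the structure above might look like the following. It is illustrative: the base image, script path `scripts/train.py`, module path `app.main:app`, and port are assumptions, and the real file may differ.

```dockerfile
# Illustrative Dockerfile (not the repository's actual file)
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Generate the model artifact during the image build, as described above
RUN python scripts/train.py
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```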
## Evidence

Verification screenshots are stored in `docs/evidence/`, with an index at `docs/evidence/evidence_index.md`.

Evidence includes:

- local API validation
- Docker build and container runtime
- Prometheus scrape targets
- Grafana dashboards
- Locust load testing
- GitHub Actions CI success
- GHCR container publishing
- Kubernetes deployment and pods
- Horizontal Pod Autoscaler behaviour
- ingress routing
- the public Render deployment

---
## Deployment Summary

The project has been verified across multiple environments.

**Local**: FastAPI application starts; Swagger UI and the prediction endpoint tested.

**Docker**: image builds successfully; container runtime verified.

**Docker Compose observability stack**: Prometheus scraping confirmed; Grafana dashboard operational.

**CI/CD**: GitHub Actions pipeline passes; Docker image pushed to GHCR.

**Kubernetes**: deployment applied successfully; pods healthy; HPA configured; ingress routing functional.

**Cloud**: Render deployment successful; model artifact generated during the build; public endpoints verified.

---
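The HPA mentioned above might be declared with a manifest along these lines. This is an illustrative sketch: the resource name, replica bounds, and CPU threshold are assumptions, not the repository's actual `k8s/` configuration.

```yaml
# k8s/hpa.yaml (illustrative)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-inference-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-inference-api
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Scaling on CPU utilization requires the resource requests from the Deployment, since utilization is computed relative to the requested CPU.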
## Limitations

This project demonstrates production-style engineering patterns but is not a hardened enterprise deployment. Limitations include:

- no authentication or API key protection
- no rate limiting
- no formal model registry
- no secrets management workflow
- no distributed tracing
- no alerting rules configured
- no centralized log aggregation
- no managed Kubernetes cluster
- no canary or blue/green release strategy
- cold starts on the Render free tier

---
## Future Improvements

Possible improvements include:

- API authentication
- request throttling
- model versioning and registry integration
- structured prediction logging
- alerting with Prometheus and Grafana
- infrastructure-as-code provisioning
- managed Kubernetes deployment
- progressive deployment strategies

---
## What This Project Demonstrates

This project demonstrates applied engineering capability in:

- ML artifact management
- inference API design
- automated testing
- observability integration
- containerization
- CI/CD workflows
- container registry publishing
- Kubernetes orchestration
- autoscaling
- ingress routing
- public cloud deployment

It represents an end-to-end machine learning inference service rather than a notebook-only model experiment.