|
| 1 | + |
1 | 2 | # Architecture Notes — ML Inference API |
2 | 3 |
|
3 | 4 | ## Purpose |
4 | 5 |
|
5 | | -This document provides a simple architecture reference separate from the README. |
| 6 | +This document describes the system architecture used to run the **ML Inference API** as a production‑style machine learning service. |
| 7 | +The architecture demonstrates how a trained ML model is packaged, exposed through an API, monitored, containerized, and deployed to cloud infrastructure. |
| 8 | + |
| 9 | +The focus is **engineering architecture**, not model research. |
| 10 | + |
| 11 | +--- |
| 12 | + |
| 13 | +# 1. Core Application Architecture |
| 14 | + |
| 15 | +The core service is a FastAPI application that loads a trained model artifact and exposes prediction endpoints. |
6 | 16 |
|
7 | | -It is useful because the README explains the project broadly, while this file isolates the system structure in one place. |
| 17 | +Flow: |
| 18 | + |
| 19 | +Client |
| 20 | +→ FastAPI API |
| 21 | +→ Model Artifact |
| 22 | +→ Prediction Response |
| 23 | + |
| 24 | +Details: |
| 25 | + |
| 26 | +• The ML model is trained offline using **scikit‑learn**. |
| 27 | +• The trained model is serialized using **joblib**. |
| 28 | +• The artifact is loaded at API startup. |
| 29 | +• Inference is executed inside the FastAPI service. |
| 30 | + |
| 31 | +Key properties: |
| 32 | + |
| 33 | +• low‑latency inference |
| 34 | +• stateless API container |
| 35 | +• container‑friendly runtime |
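The load-once behaviour described above can be sketched in plain Python. This is a minimal illustration, not the project's code: `ModelService`, `MeanThresholdModel`, and `load_artifact` are hypothetical names, and the stub model stands in for the real scikit-learn estimator (in the actual service the loader would presumably wrap `joblib.load` and run in a FastAPI startup hook).

```python
from typing import Callable, Optional, Protocol, Sequence


class Predictor(Protocol):
    """Minimal interface in the style of scikit-learn estimators."""

    def predict(self, rows: Sequence[Sequence[float]]) -> Sequence[float]: ...


class ModelService:
    """Holds one model instance that is loaded once and reused per request."""

    def __init__(self, loader: Callable[[], Predictor]) -> None:
        self._loader = loader
        self._model: Optional[Predictor] = None

    def startup(self) -> None:
        # Called once when the API process starts, never per request.
        self._model = self._loader()

    @property
    def ready(self) -> bool:
        return self._model is not None

    def predict(self, features: Sequence[float]) -> float:
        if self._model is None:
            raise RuntimeError("model artifact has not been loaded")
        # Estimators expect a 2D batch, so wrap the single row in a list.
        return self._model.predict([list(features)])[0]


class MeanThresholdModel:
    """Hypothetical stand-in for a trained estimator."""

    def predict(self, rows):
        return [1.0 if sum(r) / len(r) > 0.5 else 0.0 for r in rows]


load_calls = 0


def load_artifact() -> Predictor:
    # Assumption: the real loader would call joblib.load on the artifact path.
    global load_calls
    load_calls += 1
    return MeanThresholdModel()


service = ModelService(load_artifact)
service.startup()
print(service.predict([0.9, 0.8]), service.predict([0.1, 0.2]), load_calls)
```

Because the only state is the read-only model, any number of identical containers can serve requests interchangeably, which is what makes the API stateless.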
8 | 36 |
|
9 | 37 | --- |
10 | 38 |
|
11 | | -## 1. Application Architecture |
| 39 | +# 2. API Layer |
| 40 | + |
| 41 | +The API layer is implemented using **FastAPI**. |
12 | 42 |
|
13 | | -```text |
14 | | -Client |
15 | | - │ |
16 | | - ▼ |
17 | | -FastAPI Inference API |
18 | | - │ |
19 | | - ▼ |
20 | | -Serialized Model Artifact |
21 | | -``` |
| 43 | +Responsibilities: |
22 | 44 |
|
23 | | -The FastAPI service loads the serialized model artifact during startup and uses it to serve predictions. |
| 45 | +• request validation using Pydantic |
| 46 | +• prediction execution |
| 47 | +• health checks for infrastructure monitoring |
| 48 | +• metrics exposure for Prometheus |
| 49 | + |
| 50 | +Endpoints: |
| 51 | + |
| 52 | +GET /health/live |
| 53 | +GET /health/ready |
| 54 | +POST /predict |
| 55 | +GET /metrics |
| 56 | +GET /docs |
| 57 | + |
| 58 | +The `/docs` endpoint exposes the **Swagger UI** for interactive testing. |
24 | 59 |
|
25 | 60 | --- |
26 | 61 |
|
27 | | -## 2. Observability Architecture |
| 62 | +# 3. Monitoring Architecture |
| 63 | + |
| 64 | +Monitoring is implemented using **Prometheus and Grafana**. |
| 65 | + |
| 66 | +Flow: |
| 67 | + |
| 68 | +FastAPI Application |
| 69 | +→ /metrics endpoint |
| 70 | +→ Prometheus Scraper |
| 71 | +→ Grafana Dashboard |
| 72 | + |
| 73 | +Responsibilities: |
28 | 74 |
|
29 | | -```text |
30 | | -FastAPI Application |
31 | | - │ |
32 | | - ▼ |
33 | | -/metrics endpoint |
34 | | - │ |
35 | | - ▼ |
36 | 75 | Prometheus |
37 | | - │ |
38 | | - ▼ |
| 76 | + |
| 77 | +• scrape metrics from the application |
| 78 | +• store time‑series metrics |
| 79 | + |
39 | 80 | Grafana |
40 | | -``` |
41 | 81 |
|
42 | | -The application exposes Prometheus-format metrics. Prometheus scrapes those metrics, and Grafana visualizes them. |
| 82 | +• visualize metrics |
| 83 | +• create monitoring dashboards |
| 84 | + |
| 85 | +Metrics collected include: |
| 86 | + |
| 87 | +• request count |
| 88 | +• request latency |
| 89 | +• application health |
| 90 | +• inference calls |
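To make the `/metrics` step concrete, the sketch below renders a counter in the Prometheus text exposition format using only the standard library. It is an illustration under assumptions: the metric name `api_requests_total` and the `endpoint` label are hypothetical, and a real service would more likely use a Prometheus client library than hand-rolled rendering.

```python
from typing import Dict, Tuple

Labels = Tuple[Tuple[str, str], ...]


class Counter:
    """A monotonically increasing metric, keyed by label values."""

    def __init__(self, name: str, help_text: str) -> None:
        self.name = name
        self.help_text = help_text
        self._values: Dict[Labels, float] = {}

    def inc(self, amount: float = 1.0, **labels: str) -> None:
        key = tuple(sorted(labels.items()))
        self._values[key] = self._values.get(key, 0.0) + amount

    def render(self) -> str:
        # Text format: HELP and TYPE comment lines, then one sample per line.
        # Label-value escaping is omitted to keep the sketch short.
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            label_text = ",".join(f'{k}="{v}"' for k, v in key)
            suffix = f"{{{label_text}}}" if label_text else ""
            lines.append(f"{self.name}{suffix} {value:g}")
        return "\n".join(lines) + "\n"


requests_total = Counter("api_requests_total", "Total HTTP requests handled.")
requests_total.inc(endpoint="/predict")
requests_total.inc(endpoint="/predict")
requests_total.inc(endpoint="/health/live")
print(requests_total.render())
```

Prometheus scrapes this text on each interval and stores every sample as a point in a time series, which Grafana then queries for dashboards.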
43 | 91 |
|
44 | 92 | --- |
45 | 93 |
|
46 | | -## 3. Delivery Architecture |
| 94 | +# 4. Container Architecture |
| 95 | + |
| 96 | +The service runs inside a **Docker container**. |
47 | 97 |
|
48 | | -```text |
49 | | -GitHub Push |
50 | | - │ |
51 | | - ▼ |
52 | | -GitHub Actions |
53 | | - │ |
54 | | - ├── Run Tests |
55 | | - ├── Build Docker Image |
56 | | - └── Push Image to GHCR |
57 | | -``` |
| 98 | +Container responsibilities: |
58 | 99 |
|
59 | | -This supports continuous validation and distribution of the container image. |
| 100 | +• install dependencies |
| 101 | +• copy application code |
| 102 | +• include trained model artifact |
| 103 | +• start FastAPI server with Uvicorn |
| 104 | + |
| 105 | +Startup:
| 106 | +
| 107 | +Uvicorn runs the FastAPI app and listens on port **8000**.
| 108 | + |
| 109 | +Container advantages: |
| 110 | + |
| 111 | +• reproducible environment |
| 112 | +• portable deployment |
| 113 | +• consistent runtime across systems |
60 | 114 |
|
61 | 115 | --- |
62 | 116 |
|
63 | | -## 4. Kubernetes Architecture |
| 117 | +# 5. CI/CD Architecture |
| 118 | + |
| 119 | +Continuous integration and image publishing are implemented using **GitHub Actions**.
| 120 | + |
| 121 | +Flow: |
| 122 | + |
| 123 | +GitHub Push |
| 124 | +→ GitHub Actions Pipeline |
| 125 | +→ Run Tests |
| 126 | +→ Build Docker Image |
| 127 | +→ Push Image to Container Registry |
| 128 | + |
| 129 | +Container registries used: |
| 130 | + |
| 131 | +• GitHub Container Registry (GHCR) |
| 132 | +• Amazon Elastic Container Registry (ECR) |
| 133 | + |
| 134 | +Pipeline responsibilities: |
| 135 | + |
| 136 | +• run automated tests |
| 137 | +• validate build |
| 138 | +• publish container image |
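The publish step targets two registries whose image references follow different naming schemes. The helper functions below are a small sketch of those schemes; the owner, repository, account ID, region, and tag values are placeholders, not values taken from this project.

```python
def ghcr_image(owner: str, image: str, tag: str) -> str:
    """Build a GitHub Container Registry reference.

    Repository paths must be lowercase, so the owner and image are normalized.
    """
    return f"ghcr.io/{owner.lower()}/{image.lower()}:{tag}"


def ecr_image(account_id: str, region: str, repository: str, tag: str) -> str:
    """Build an Amazon ECR reference for a private repository."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Placeholder values for illustration only.
tag = "3f2a1c9"  # e.g. a short commit SHA, so each build is traceable
print(ghcr_image("Example-Org", "ml-inference-api", tag))
print(ecr_image("123456789012", "us-east-1", "ml-inference-api", tag))
```

Tagging with a commit identifier rather than only `latest` is one common way to keep a deployed task traceable to the exact build that produced it.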
| 139 | + |
| 140 | +--- |
| 141 | + |
| 142 | +# 6. Kubernetes Architecture |
| 143 | + |
| 144 | +The application can run inside **Kubernetes**. |
| 145 | + |
| 146 | +Flow: |
| 147 | + |
| 148 | +Client |
| 149 | +→ Ingress Controller |
| 150 | +→ Kubernetes Service |
| 151 | +→ FastAPI Pods |
| 152 | + |
| 153 | +Components: |
| 154 | + |
| 155 | +Deployment |
| 156 | + |
| 157 | +• manages FastAPI pods |
| 158 | + |
| 159 | +Service |
| 160 | + |
| 161 | +• exposes pods internally |
| 162 | + |
| 163 | +Horizontal Pod Autoscaler (HPA) |
| 164 | + |
| 165 | +• scales pods based on CPU usage |
64 | 166 |
|
65 | | -```text |
66 | | -Client |
67 | | - │ |
68 | | - ▼ |
69 | 167 | Ingress |
70 | | - │ |
71 | | - ▼ |
72 | | -Kubernetes Service |
73 | | - │ |
74 | | - ▼ |
75 | | -FastAPI Pods |
76 | | - │ |
77 | | - ▼ |
78 | | -Model Artifact |
79 | | -``` |
80 | | - |
81 | | -The Kubernetes deployment adds service routing, autoscaling support, and ingress-based access. |
| 168 | + |
| 169 | +• routes external traffic to the service |
| 170 | + |
| 171 | +Benefits: |
| 172 | + |
| 173 | +• scalability |
| 174 | +• container orchestration |
| 175 | +• rolling deployments |
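CPU-based scaling by the HPA follows the replica formula given in the Kubernetes documentation: the desired count is the current count multiplied by the ratio of observed to target utilization, rounded up. The sketch below applies that formula; the target of 60% and the 1 to 10 replica bounds are assumed values, not settings from this project.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_cpu_percent: float,
    target_cpu_percent: float,
    min_replicas: int = 1,    # assumed bound
    max_replicas: int = 10,   # assumed bound
    tolerance: float = 0.1,   # Kubernetes default tolerance
) -> int:
    """Return the replica count the autoscaler would aim for."""
    ratio = current_cpu_percent / target_cpu_percent
    # Small deviations from the target do not trigger a scaling action.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(3, 90, 60))   # overloaded: scale out
print(desired_replicas(4, 30, 60))   # underused: scale in
print(desired_replicas(3, 63, 60))   # within tolerance: unchanged
print(desired_replicas(2, 600, 60))  # capped at max_replicas
```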
| 176 | + |
| 177 | +--- |
| 178 | + |
| 179 | +# 7. AWS Deployment Architecture |
| 180 | + |
| 181 | +The system is deployed to **Amazon ECS** using the **AWS Fargate** launch type.
| 182 | + |
| 183 | +Infrastructure components: |
| 184 | + |
| 185 | +Amazon ECR |
| 186 | +ECS Cluster |
| 187 | +Task Definition |
| 188 | +ECS Service |
| 189 | +Application Load Balancer |
| 190 | +Target Group |
| 191 | +Security Groups |
| 192 | + |
| 193 | +Traffic flow: |
| 194 | + |
| 195 | +Client |
| 196 | +→ Application Load Balancer (ALB) |
| 197 | +→ Target Group |
| 198 | +→ ECS Service |
| 199 | +→ Fargate Task |
| 200 | +→ FastAPI Container |
| 201 | +→ Model Artifact |
| 202 | + |
| 203 | +--- |
| 204 | + |
| 205 | +# 8. Network Security Model |
| 206 | + |
| 207 | +Two security groups control traffic. |
| 208 | + |
| 209 | +ALB Security Group |
| 210 | + |
| 211 | +Inbound: |
| 212 | + |
| 213 | +HTTP (TCP 80) from the internet
| 214 | + |
| 215 | +Outbound: |
| 216 | + |
| 217 | +All traffic allowed |
| 218 | + |
| 219 | +Task Security Group |
| 220 | + |
| 221 | +Inbound: |
| 222 | + |
| 223 | +TCP 8000 from ALB security group only |
| 224 | + |
| 225 | +Outbound: |
| 226 | + |
| 227 | +All traffic allowed |
| 228 | + |
| 229 | +This prevents direct public access to the container tasks. |
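The effect of the two inbound rule sets can be checked with a small model. This is a simplified sketch, not AWS API code: the group names `alb-sg` and `task-sg` are hypothetical, and the model only captures the fact that security groups deny inbound traffic unless a rule allows it.

```python
from dataclasses import dataclass
from typing import Tuple

ANYWHERE = "0.0.0.0/0"  # CIDR that matches any source


@dataclass(frozen=True)
class IngressRule:
    protocol: str
    port: int
    source: str  # ANYWHERE, or the name of another security group


@dataclass(frozen=True)
class SecurityGroup:
    name: str
    ingress: Tuple[IngressRule, ...]

    def allows(self, protocol: str, port: int, source: str) -> bool:
        # Inbound traffic is denied unless at least one rule matches.
        return any(
            rule.protocol == protocol
            and rule.port == port
            and rule.source in (ANYWHERE, source)
            for rule in self.ingress
        )


alb_sg = SecurityGroup("alb-sg", (IngressRule("tcp", 80, ANYWHERE),))
task_sg = SecurityGroup("task-sg", (IngressRule("tcp", 8000, "alb-sg"),))

print(alb_sg.allows("tcp", 80, "internet"))     # public HTTP reaches the ALB
print(task_sg.allows("tcp", 8000, "internet"))  # direct access to tasks is blocked
print(task_sg.allows("tcp", 8000, "alb-sg"))    # the ALB can reach the tasks
```

Referencing the ALB security group as the source, rather than an IP range, keeps the rule valid even when the load balancer's addresses change.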
| 230 | + |
| 231 | +--- |
| 232 | + |
| 233 | +# 9. Health Check Architecture |
| 234 | + |
| 235 | +The load balancer monitors container health. |
| 236 | + |
| 237 | +Target Group health check: |
| 238 | + |
| 239 | +Protocol: HTTP |
| 240 | +Port: 8000 |
| 241 | +Path: /health/live |
| 242 | +Success code: 200 |
| 243 | + |
| 244 | +If the container fails health checks: |
| 245 | + |
| 246 | +• the task is considered unhealthy |
| 247 | +• ECS replaces the task automatically |
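A target is not marked unhealthy on a single failed probe; load balancer health checks count consecutive results against thresholds. The sketch below models that state machine. The threshold values are assumptions chosen for illustration, since this document does not state the configured ones.

```python
from dataclasses import dataclass


@dataclass
class TargetHealth:
    """Tracks one target's health from a stream of probe status codes."""

    healthy_threshold: int = 3    # assumed: passes needed to become healthy
    unhealthy_threshold: int = 2  # assumed: failures needed to become unhealthy
    healthy: bool = True
    _streak: int = 0              # consecutive results contradicting current state

    def observe(self, status_code: int) -> bool:
        passed = status_code == 200  # success code from the health check config
        if passed == self.healthy:
            # The result agrees with the current state, so the streak resets.
            self._streak = 0
        else:
            self._streak += 1
            needed = self.healthy_threshold if passed else self.unhealthy_threshold
            if self._streak >= needed:
                self.healthy = passed
                self._streak = 0
        return self.healthy


target = TargetHealth()
history = [target.observe(code) for code in (200, 503, 503, 200, 200, 200)]
print(history)
```

In this run the target turns unhealthy after the second consecutive 503 and recovers only after three consecutive 200 responses, which prevents a single slow response from triggering a task replacement.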
82 | 248 |
|
83 | 249 | --- |
84 | 250 |
|
85 | | -## 5. Public Cloud Deployment Architecture |
| 251 | +# 10. Request Flow in Production |
86 | 252 |
|
87 | | -```text |
88 | | -GitHub Repository |
89 | | - │ |
90 | | - ▼ |
91 | | -Render Build |
92 | | - │ |
93 | | - ▼ |
94 | | -Docker Container |
95 | | - │ |
96 | | - ▼ |
97 | | -Public Web Service |
98 | | -``` |
| 253 | +Complete request path: |
99 | 254 |
|
100 | | -This gives the project a publicly reachable deployment suitable for demonstration. |
| 255 | +User Request |
| 256 | +→ Internet |
| 257 | +→ Application Load Balancer |
| 258 | +→ Target Group |
| 259 | +→ ECS Service |
| 260 | +→ Fargate Task |
| 261 | +→ FastAPI Container |
| 262 | +→ ML Model |
| 263 | +→ Prediction Response |
101 | 264 |
|
102 | 265 | --- |
103 | 266 |
|
104 | | -## 6. Final Interpretation |
| 267 | +# 11. System Characteristics |
| 268 | + |
| 269 | +The architecture demonstrates: |
| 270 | + |
| 271 | +• stateless API containers |
| 272 | +• containerized ML inference |
| 273 | +• infrastructure health monitoring |
| 274 | +• container registry deployment |
| 275 | +• load balanced cloud services |
| 276 | + |
| 277 | +This structure follows a pattern common in production ML inference services: a stateless, containerized API behind a load balancer with health checks.
| 278 | + |
| 279 | +--- |
| 280 | + |
| 281 | +# 12. Known Limitations |
| 282 | + |
| 283 | +This architecture is intentionally simplified for demonstration purposes. |
| 284 | + |
| 285 | +Limitations: |
| 286 | + |
| 287 | +• no authentication layer |
| 288 | +• no rate limiting |
| 289 | +• no model registry |
| 290 | +• no distributed tracing |
| 291 | +• no centralized logging stack |
| 292 | +• no Infrastructure‑as‑Code provisioning |
| 293 | + |
| 294 | +--- |
105 | 295 |
|
106 | | -The project architecture is best understood as a layered ML service system: |
| 296 | +# 13. Possible Improvements |
107 | 297 |
|
108 | | -- model training and artifact generation |
109 | | -- API serving |
110 | | -- monitoring |
111 | | -- testing |
112 | | -- containerization |
113 | | -- CI/CD |
114 | | -- orchestration |
115 | | -- public deployment |
| 298 | +Future engineering improvements could include: |
116 | 299 |
|
117 | | -This folder can later be expanded with diagrams if needed, but a simple text architecture reference is already useful and sufficient. |
| 300 | +• Terraform infrastructure provisioning |
| 301 | +• API authentication |
| 302 | +• request throttling |
| 303 | +• centralized logging (ELK or OpenSearch) |
| 304 | +• distributed tracing (OpenTelemetry) |
| 305 | +• model version registry |
| 306 | +• ECS autoscaling policies |