Commit 2f24e27

Rewrite architecture notes with complete system architecture
1 parent e35ab99 commit 2f24e27

1 file changed, +268 -79 lines changed

Original file line number | Diff line number | Diff line change
@@ -1,117 +1,306 @@
1+
12
# Architecture Notes — ML Inference API
23

34
## Purpose
45

5-
This document provides a simple architecture reference separate from the README.
6+
This document describes the system architecture used to run the **ML Inference API** as a production‑style machine learning service.
7+
The architecture demonstrates how a trained ML model is packaged, exposed through an API, monitored, containerized, and deployed to cloud infrastructure.
8+
9+
The focus is **engineering architecture**, not model research.
10+
11+
---
12+
13+
# 1. Core Application Architecture
14+
15+
The core service is a FastAPI application that loads a trained model artifact and exposes prediction endpoints.
616

7-
It is useful because the README explains the project broadly, while this file isolates the system structure in one place.
17+
Flow:
18+
19+
Client
20+
→ FastAPI API
21+
→ Model Artifact
22+
→ Prediction Response
23+
24+
Details:
25+
26+
• The ML model is trained offline using **scikit‑learn**.
27+
• The trained model is serialized using **joblib**.
28+
• The artifact is loaded at API startup.
29+
• Inference is executed inside the FastAPI service.
30+
31+
Key properties:
32+
33+
• low‑latency inference
34+
• stateless API container
35+
• container‑friendly runtime
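The load-at-startup step described above can be sketched as follows. This is a minimal, standard-library-only illustration, not the project's code: `pickle` stands in for `joblib` (whose `dump`/`load` pair is used in the same way), and the artifact is a plain dict of linear coefficients rather than a scikit-learn estimator. `ModelStore`, `linear_score`, and the file name `model.pkl` are hypothetical names.

```python
import pickle
import tempfile
from pathlib import Path


class ModelStore:
    """Holds the deserialized artifact in memory for the life of the process."""

    def __init__(self, artifact_path):
        self.artifact_path = Path(artifact_path)
        self.artifact = None  # populated once, at startup

    def load(self):
        # Assumption: pickle stands in for joblib.load in this sketch.
        with self.artifact_path.open("rb") as fh:
            self.artifact = pickle.load(fh)

    @property
    def ready(self):
        return self.artifact is not None

    def predict(self, features):
        if not self.ready:
            raise RuntimeError("model artifact not loaded")
        return linear_score(self.artifact, features)


def linear_score(artifact, features):
    """Toy model: weighted sum plus bias, thresholded at zero."""
    total = artifact["bias"] + sum(
        w * x for w, x in zip(artifact["weights"], features)
    )
    return 1 if total > 0 else 0


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "model.pkl"
        # Stand-in for the offline training step: write the artifact to disk.
        with path.open("wb") as fh:
            pickle.dump({"weights": [0.5, -1.0], "bias": 0.1}, fh)

        store = ModelStore(path)
        ready_before = store.ready
        store.load()  # done once, before any request is served
        return (
            ready_before,
            store.ready,
            store.predict([2.0, 0.5]),
            store.predict([0.0, 1.0]),
        )
```

Because the artifact is read once and then only used for read-only inference, any number of identical containers can serve requests without coordinating, which is what keeps the API container stateless.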
836

937
---
1038

11-
## 1. Application Architecture
39+
# 2. API Layer
40+
41+
The API layer is implemented using **FastAPI**.
1242

13-
```text
14-
Client
15-
16-
17-
FastAPI Inference API
18-
19-
20-
Serialized Model Artifact
21-
```
43+
Responsibilities:
2244

23-
The FastAPI service loads the serialized model artifact during startup and uses it to serve predictions.
45+
• request validation using Pydantic
46+
• prediction execution
47+
• health checks for infrastructure monitoring
48+
• metrics exposure for Prometheus
49+
50+
Endpoints:
51+
52+
GET /health/live
53+
GET /health/ready
54+
POST /predict
55+
GET /metrics
56+
GET /docs
57+
58+
The `/docs` endpoint exposes the **Swagger UI** for interactive testing.
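The validate-then-predict responsibility of `POST /predict` can be sketched without any framework. In this hedged illustration a dataclass stands in for the Pydantic model; the `features` field, its length of 4, and the names `PredictRequest` and `handle_predict` are assumptions, since the request schema is not shown in these notes. The 422 status mirrors what FastAPI returns when Pydantic validation fails.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictRequest:
    """Hypothetical request body: a fixed-length list of numeric features."""

    features: tuple

    @classmethod
    def parse(cls, payload, expected_len=4):
        if not isinstance(payload, dict) or "features" not in payload:
            raise ValueError("body must be an object with a 'features' field")
        raw = payload["features"]
        if not isinstance(raw, list) or len(raw) != expected_len:
            raise ValueError(f"'features' must be a list of {expected_len} numbers")
        for value in raw:
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError("'features' must contain only numbers")
        return cls(features=tuple(float(v) for v in raw))


def handle_predict(payload, predict_fn):
    """Validate first, then run inference. Returns (status_code, body)."""
    try:
        request = PredictRequest.parse(payload)
    except ValueError as exc:
        return 422, {"detail": str(exc)}
    return 200, {"prediction": predict_fn(request.features)}
```

Rejecting malformed input before it reaches the model keeps inference errors separate from client errors, which also makes the request and error metrics easier to interpret.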
2459

2560
---
2661

27-
## 2. Observability Architecture
62+
# 3. Monitoring Architecture
63+
64+
Monitoring is implemented using **Prometheus and Grafana**.
65+
66+
Flow:
67+
68+
FastAPI Application
69+
→ /metrics endpoint
70+
→ Prometheus Scraper
71+
→ Grafana Dashboard
72+
73+
Responsibilities:
2874

29-
```text
30-
FastAPI Application
31-
32-
33-
/metrics endpoint
34-
35-
3675
Prometheus
37-
38-
76+
77+
• scrape metrics from the application
78+
• store time‑series metrics
79+
3980
Grafana
40-
```
4181

42-
The application exposes Prometheus-format metrics. Prometheus scrapes those metrics, and Grafana visualizes them.
82+
• visualize metrics
83+
• create monitoring dashboards
84+
85+
Metrics collected include:
86+
87+
• request count
88+
• request latency
89+
• application health
90+
• inference calls
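What Prometheus reads from `/metrics` is plain text in its exposition format. The sketch below renders a labeled counter in that format using only the standard library. It is an illustration of the wire format, not the service's implementation (a real service would more likely use a client library), and the metric name `http_requests_total` and the `path` label are assumed for the example.

```python
class Counter:
    """Monotonic counter with optional labels, rendered as Prometheus text."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._values = {}  # maps a sorted tuple of (label, value) pairs to a float

    def inc(self, amount=1.0, **labels):
        if amount < 0:
            raise ValueError("counters can only increase")
        key = tuple(sorted(labels.items()))
        self._values[key] = self._values.get(key, 0.0) + amount

    def render(self):
        # Note: label values are not escaped here; this sketch assumes they
        # contain no quotes, backslashes, or newlines.
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            label_str = ",".join(f'{k}="{v}"' for k, v in key)
            series = f"{self.name}{{{label_str}}}" if label_str else self.name
            lines.append(f"{series} {value}")
        return "\n".join(lines) + "\n"
```

Prometheus pulls this text on each scrape and stores every `name{labels}` combination as its own time series, which Grafana then queries for dashboards.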
4391

4492
---
4593

46-
## 3. Delivery Architecture
94+
# 4. Container Architecture
95+
96+
The service runs inside a **Docker container**.
4797

48-
```text
49-
GitHub Push
50-
51-
52-
GitHub Actions
53-
54-
├── Run Tests
55-
├── Build Docker Image
56-
└── Push Image to GHCR
57-
```
98+
Container responsibilities:
5899

59-
This supports continuous validation and distribution of the container image.
100+
• install dependencies
101+
• copy application code
102+
• include trained model artifact
103+
• start FastAPI server with Uvicorn
104+
105+
Startup command:
106+
107+
Uvicorn runs the FastAPI app and listens on port **8000**.
108+
109+
Container advantages:
110+
111+
• reproducible environment
112+
• portable deployment
113+
• consistent runtime across systems
60114

61115
---
62116

63-
## 4. Kubernetes Architecture
117+
# 5. CI/CD Architecture
118+
119+
Continuous integration and image delivery are implemented using **GitHub Actions**.
120+
121+
Flow:
122+
123+
GitHub Push
124+
→ GitHub Actions Pipeline
125+
→ Run Tests
126+
→ Build Docker Image
127+
→ Push Image to Container Registry
128+
129+
Container registries used:
130+
131+
• GitHub Container Registry (GHCR)
132+
• Amazon Elastic Container Registry (ECR)
133+
134+
Pipeline responsibilities:
135+
136+
• run automated tests
137+
• validate build
138+
• publish container image
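Publishing to two registries means the same build is pushed under two differently shaped image references. The sketch below builds both. The reference formats for GHCR and ECR are the standard ones, but tagging by short commit SHA is an assumption, and every owner, account ID, region, and repository value used with these functions is a placeholder.

```python
def ghcr_image(owner, image, tag):
    """GHCR reference: ghcr.io/<owner>/<image>:<tag> (repository path must be lowercase)."""
    return f"ghcr.io/{owner.lower()}/{image.lower()}:{tag}"


def ecr_image(account_id, region, repository, tag):
    """ECR reference: <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def image_references(commit_sha, owner, image, account_id, region):
    """Return the references one pipeline run would push (assumed tagging scheme)."""
    short_sha = commit_sha[:7]
    return [
        ghcr_image(owner, image, short_sha),
        ecr_image(account_id, region, image, short_sha),
    ]
```

Tagging with an immutable identifier such as the commit SHA, rather than only `latest`, makes it possible to tell exactly which build a running task or pod came from.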
139+
140+
---
141+
142+
# 6. Kubernetes Architecture
143+
144+
The application can run inside **Kubernetes**.
145+
146+
Flow:
147+
148+
Client
149+
→ Ingress Controller
150+
→ Kubernetes Service
151+
→ FastAPI Pods
152+
153+
Components:
154+
155+
Deployment
156+
157+
• manages FastAPI pods
158+
159+
Service
160+
161+
• exposes the pods inside the cluster under a stable address
162+
163+
Horizontal Pod Autoscaler (HPA)
164+
165+
• scales pods based on CPU usage
64166

65-
```text
66-
Client
67-
68-
69167
Ingress
70-
71-
72-
Kubernetes Service
73-
74-
75-
FastAPI Pods
76-
77-
78-
Model Artifact
79-
```
80-
81-
The Kubernetes deployment adds service routing, autoscaling support, and ingress-based access.
168+
169+
• routes external traffic to the service
170+
171+
Benefits:
172+
173+
• scalability
174+
• container orchestration
175+
• rolling deployments
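The CPU-based scaling decision of the Horizontal Pod Autoscaler follows a documented rule: the desired replica count is the current count multiplied by the ratio of observed to target utilization, rounded up, and no change is made when that ratio is within a tolerance of 1.0. The sketch below applies that rule. The replica bounds and CPU figures passed to it are illustrative assumptions, not values from this project's manifests.

```python
import math


def desired_replicas(current_replicas, current_cpu, target_cpu,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count an HPA would aim for, given average CPU utilization.

    current_cpu and target_cpu are average utilization percentages.
    """
    ratio = current_cpu / target_cpu
    # Within tolerance of the target: leave the deployment alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the bounds configured on the autoscaler.
    return max(min_replicas, min(max_replicas, wanted))
```

For example, 4 pods averaging 90% CPU against a 60% target give a ratio of 1.5, so the autoscaler would aim for 6 pods.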
176+
177+
---
178+
179+
# 7. AWS Deployment Architecture
180+
181+
The system is deployed to **Amazon ECS on AWS Fargate**.
182+
183+
Infrastructure components:
184+
185+
Amazon ECR
186+
ECS Cluster
187+
Task Definition
188+
ECS Service
189+
Application Load Balancer
190+
Target Group
191+
Security Groups
192+
193+
Traffic flow:
194+
195+
Client
196+
→ Application Load Balancer (ALB)
197+
→ Target Group
198+
→ ECS Service
199+
→ Fargate Task
200+
→ FastAPI Container
201+
→ Model Artifact
202+
203+
---
204+
205+
# 8. Network Security Model
206+
207+
Two security groups control traffic.
208+
209+
ALB Security Group
210+
211+
Inbound:
212+
213+
HTTP 80 from internet
214+
215+
Outbound:
216+
217+
All traffic allowed
218+
219+
Task Security Group
220+
221+
Inbound:
222+
223+
TCP 8000 from ALB security group only
224+
225+
Outbound:
226+
227+
All traffic allowed
228+
229+
This prevents direct public access to the container tasks.
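The two-group rule set above can be modeled in a few lines to check which paths are open. This is a simplified sketch of inbound rules only: the group names `alb-sg` and `task-sg` are hypothetical, and `0.0.0.0/0` is assumed to be what "from internet" means.

```python
from dataclasses import dataclass

INTERNET = "0.0.0.0/0"


@dataclass(frozen=True)
class InboundRule:
    port: int
    source: str  # a CIDR block or the name of another security group


# Hypothetical group names; ports and sources follow the rules listed above.
SECURITY_GROUPS = {
    "alb-sg": [InboundRule(port=80, source=INTERNET)],
    "task-sg": [InboundRule(port=8000, source="alb-sg")],
}


def is_allowed(source, dest_group, port):
    """source is INTERNET or the name of the sender's security group."""
    for rule in SECURITY_GROUPS[dest_group]:
        # A rule open to the internet admits any sender; a rule that names
        # a security group admits only members of that group.
        if rule.port == port and rule.source in (source, INTERNET):
            return True
    return False
```

With these rules, `is_allowed(INTERNET, "task-sg", 8000)` is `False` while `is_allowed("alb-sg", "task-sg", 8000)` is `True`, which is the property the section describes: tasks are reachable only through the load balancer.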
230+
231+
---
232+
233+
# 9. Health Check Architecture
234+
235+
The load balancer monitors container health.
236+
237+
Target Group health check:
238+
239+
Protocol: HTTP
240+
Port: 8000
241+
Path: /health/live
242+
Success code: 200
243+
244+
If the container fails health checks:
245+
246+
• the task is considered unhealthy
247+
• ECS replaces the task automatically
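A target is normally not marked unhealthy on a single failed probe. Load balancers count consecutive results against configurable thresholds. The sketch below shows that bookkeeping. The threshold values are illustrative assumptions, since the notes above only specify the protocol, port, path, and success code, and `TargetHealth` is a hypothetical name.

```python
class TargetHealth:
    """Tracks one target's state from a stream of health check results."""

    def __init__(self, healthy_threshold=2, unhealthy_threshold=2):
        # Assumed thresholds: consecutive results needed to change state.
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.state = "initial"
        self._streak = 0
        self._last_ok = None

    def record(self, status_code):
        """Record one probe of /health/live; None represents a timeout."""
        ok = status_code == 200
        if ok == self._last_ok:
            self._streak += 1
        else:
            self._streak = 1
            self._last_ok = ok

        if ok and self._streak >= self.healthy_threshold:
            self.state = "healthy"
        elif not ok and self._streak >= self.unhealthy_threshold:
            self.state = "unhealthy"
        return self.state
```

Requiring consecutive failures avoids replacing a task because of one slow response, at the cost of a short delay before a genuinely broken task is taken out of rotation.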
82248

83249
---
84250

85-
## 5. Public Cloud Deployment Architecture
251+
# 10. Request Flow in Production
86252

87-
```text
88-
GitHub Repository
89-
90-
91-
Render Build
92-
93-
94-
Docker Container
95-
96-
97-
Public Web Service
98-
```
253+
Complete request path:
99254

100-
This gives the project a publicly reachable deployment suitable for demonstration.
255+
User Request
256+
→ Internet
257+
→ Application Load Balancer
258+
→ Target Group
259+
→ ECS Service
260+
→ Fargate Task
261+
→ FastAPI Container
262+
→ ML Model
263+
→ Prediction Response
101264

102265
---
103266

104-
## 6. Final Interpretation
267+
# 11. System Characteristics
268+
269+
The architecture demonstrates:
270+
271+
• stateless API containers
272+
• containerized ML inference
273+
• infrastructure health monitoring
274+
• container registry deployment
275+
• load balanced cloud services
276+
277+
This structure follows a common pattern for serving ML models in production: a stateless, containerized API behind a load balancer, with health checks and scraped metrics.
278+
279+
---
280+
281+
# 12. Known Limitations
282+
283+
This architecture is intentionally simplified for demonstration purposes.
284+
285+
Limitations:
286+
287+
• no authentication layer
288+
• no rate limiting
289+
• no model registry
290+
• no distributed tracing
291+
• no centralized logging stack
292+
• no Infrastructure‑as‑Code provisioning
293+
294+
---
105295

106-
The project architecture is best understood as a layered ML service system:
296+
# 13. Possible Improvements
107297

108-
- model training and artifact generation
109-
- API serving
110-
- monitoring
111-
- testing
112-
- containerization
113-
- CI/CD
114-
- orchestration
115-
- public deployment
298+
Future engineering improvements could include:
116299

117-
This folder can later be expanded with diagrams if needed, but a simple text architecture reference is already useful and sufficient.
300+
• Terraform infrastructure provisioning
301+
• API authentication
302+
• request throttling
303+
• centralized logging (ELK or OpenSearch)
304+
• distributed tracing (OpenTelemetry)
305+
• model version registry
306+
• ECS autoscaling policies
