# ML Inference API: A Production-Style ML Inference Service

## Overview
This project demonstrates how to take a trained machine learning model beyond notebook experimentation and operate it as a service.

The system trains a model offline, serializes the artifact, exposes it through a FastAPI inference API, validates requests, emits Prometheus metrics, runs automated tests, packages the application with Docker, supports observability with Prometheus and Grafana, performs load testing with Locust, publishes container images to GitHub Container Registry (GHCR), and deploys the service through Kubernetes, ingress, and a public cloud web service.

This is a production-style project focused on engineering practice rather than only model training.
---
## Live Deployment

- Public service: https://ml-inference-api-tagq.onrender.com
- Swagger UI: https://ml-inference-api-tagq.onrender.com/docs
- Health endpoints: `GET /health/live`, `GET /health/ready`
- Metrics: `GET /metrics`
- Prediction: `POST /predict`

---
## Project Objective

Most ML tutorials stop after training a model. Real systems require additional engineering layers, including:

- repeatable model packaging
- API design and validation
- automated testing
- observability
- containerization
- deployment workflows
- infrastructure exposure

This project demonstrates the path from trained model → API service → container → monitored deployment.

---
## Core Features

**Machine Learning**

- offline model training with scikit-learn
- serialized model artifact using joblib
- reproducible artifact generation during the Docker build

**API**

- FastAPI inference service
- request and response validation with Pydantic
- health endpoints for liveness and readiness
- automatic Swagger documentation

**Testing**

- automated API tests using pytest
- validation of prediction, health checks, and invalid payloads

**Observability**

- Prometheus metrics exposure
- Prometheus target validation
- Grafana dashboard visualization
- load testing with Locust

**Containerization**

- Docker image build
- container runtime validation
- Docker Compose observability stack

**Delivery and Registry**

- GitHub Actions CI pipeline
- Docker image publishing to GHCR
- remote container pull verification

**Orchestration**

- Kubernetes Deployment and Service
- resource requests and limits
- rolling deployment strategy
- Horizontal Pod Autoscaler (HPA)
- ingress-nginx controller and ingress routing

**Cloud Deployment**

- public Docker deployment on Render
- verified application startup in a managed environment
- public API access

---
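The offline training and joblib serialization described above could look like the following sketch. This is illustrative only: the dataset, model choice, and `model/model.joblib` path are assumptions, not taken from the repository.

```python
# Illustrative offline training and artifact serialization (not the repo's script).
from pathlib import Path

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression


def train_and_save(artifact_path: str = "model/model.joblib") -> Path:
    """Train a small classifier and serialize it with joblib."""
    X, y = load_iris(return_X_y=True)  # 4 numeric features per sample
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    path = Path(artifact_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(clf, path)  # the API loads this artifact at startup
    return path
```

Running a script like this during the Docker build, rather than committing the artifact, is what makes the artifact generation reproducible.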
## Technology Stack

**Programming and ML**: Python, scikit-learn, NumPy, joblib

**API**: FastAPI, Pydantic, Uvicorn

**Testing**: pytest

**Containerization**: Docker, Docker Compose

**Observability**: Prometheus, Grafana, Locust, prometheus-fastapi-instrumentator

**CI/CD and Registry**: GitHub Actions, GitHub Container Registry (GHCR)

**Infrastructure**: Kubernetes, ingress-nginx, Render

---
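As a sketch of the Pydantic request/response validation in the stack above (the schema names are illustrative assumptions, not the repository's actual code):

```python
# Illustrative request/response schemas for the prediction endpoint.
from pydantic import BaseModel, ValidationError


class PredictRequest(BaseModel):
    """Payload schema for POST /predict."""
    features: list[float]


class PredictResponse(BaseModel):
    """Response schema for POST /predict."""
    prediction: int

# When used as FastAPI endpoint parameter and response_model annotations,
# these schemas are enforced automatically: an invalid payload is rejected
# with a 422 before the model is ever invoked.
```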
## Architecture

**Local and container flow**

Client → FastAPI API → Model Artifact → Metrics → Prometheus → Grafana

**Delivery pipeline**

GitHub Push → GitHub Actions → Run Tests → Build Docker Image → Push to GHCR → Deployment platform pulls image

**Kubernetes flow**

Client → Ingress → Kubernetes Service → FastAPI Pods → Model Artifact

**Public deployment flow**

GitHub Repository → Docker Build → Model Artifact Generated → Public Web Service

---
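The delivery pipeline above might be expressed as a GitHub Actions workflow along these lines. This is a sketch under assumptions: the workflow file name, job name, and tag are illustrative, not the repository's actual configuration.

```yaml
# .github/workflows/ci.yml (illustrative)
name: ci
on:
  push:
    branches: [main]
jobs:
  test-build-push:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt -r requirements-dev.txt
      - run: pytest
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          push: true
          tags: ghcr.io/${{ github.repository }}:latest
```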
## API Endpoints
### `GET /health/live`

Returns liveness status.

Example response:

```json
{"status": "alive"}
```

### `GET /health/ready`

Returns readiness status after the model has loaded.

Example response:

```json
{"status": "ready"}
```

### `POST /predict`

Runs inference using the trained model.

Example request:

```json
{"features": [5.1, 3.5, 1.4, 0.2]}
```

Example response:

```json
{"prediction": 0}
```

### `GET /metrics`

Prometheus metrics endpoint.

### `GET /docs`

Swagger UI.

---
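For illustration, the `/predict` contract above can be exercised from Python with only the standard library. This is a sketch: the base URL is an assumption, and the service must be running for the call to succeed.

```python
import json
from urllib.request import Request, urlopen


def predict(base_url: str, features: list) -> dict:
    """POST a feature vector to /predict and return the decoded JSON response."""
    req = Request(
        f"{base_url}/predict",
        data=json.dumps({"features": features}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage against a local instance (assumed port):
# predict("http://localhost:8000", [5.1, 3.5, 1.4, 0.2])
```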
## Project Structure
```
ml_inference_api/
├── .github/
├── app/
├── model/
├── tests/
├── monitoring/
├── k8s/
├── load_tests/
├── scripts/
├── docs/
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── requirements-dev.txt
├── pytest.ini
└── README.md
```

---
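A minimal Dockerfile consistent with the structure above might look like the following. It is illustrative: the base image, script path `scripts/train.py`, module path `app.main:app`, and port are assumptions, and the real file may differ.

```dockerfile
# Illustrative Dockerfile (not the repository's actual file)
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Generate the model artifact during the image build, as described above
RUN python scripts/train.py
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```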
## Evidence

Verification screenshots are stored in `docs/evidence/`, with an index at `docs/evidence/evidence_index.md`.

Evidence includes:

- local API validation
- Docker build and container runtime
- Prometheus scrape targets
- Grafana dashboards
- Locust load testing
- GitHub Actions CI success
- GHCR container publishing
- Kubernetes deployment and pods
- Horizontal Pod Autoscaler behaviour
- ingress routing
- the public Render deployment

---
## Deployment Summary

The project has been verified across multiple environments.

**Local**: FastAPI application starts; Swagger UI and the prediction endpoint tested.

**Docker**: image builds successfully; container runtime verified.

**Docker Compose observability stack**: Prometheus scraping confirmed; Grafana dashboard operational.

**CI/CD**: GitHub Actions pipeline passes; Docker image pushed to GHCR.

**Kubernetes**: deployment applied successfully; pods healthy; HPA configured; ingress routing functional.

**Cloud**: Render deployment successful; model artifact generated during the build; public endpoints verified.

---
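The HPA mentioned above might be declared with a manifest along these lines. This is an illustrative sketch: the resource name, replica bounds, and CPU threshold are assumptions, not the repository's actual `k8s/` configuration.

```yaml
# k8s/hpa.yaml (illustrative)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-inference-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-inference-api
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Scaling on CPU utilization requires the resource requests from the Deployment, since utilization is computed relative to the requested CPU.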
## Limitations

This project demonstrates production-style engineering patterns but is not a hardened enterprise deployment. Limitations include:

- no authentication or API key protection
- no rate limiting
- no formal model registry
- no secrets management workflow
- no distributed tracing
- no alerting rules configured
- no centralized log aggregation
- no managed Kubernetes cluster
- no canary or blue/green release strategy
- cold starts on the Render free tier

---
## Future Improvements

Possible improvements include:

- API authentication
- request throttling
- model versioning and registry integration
- structured prediction logging
- alerting with Prometheus and Grafana
- infrastructure-as-code provisioning
- managed Kubernetes deployment
- progressive deployment strategies

---
## What This Project Demonstrates

This project demonstrates applied engineering capability in:

- ML artifact management
- inference API design
- automated testing
- observability integration
- containerization
- CI/CD workflows
- container registry publishing
- Kubernetes orchestration
- autoscaling
- ingress routing
- public cloud deployment

It represents an end-to-end machine learning inference service rather than a notebook-only model experiment.