|
1 | 1 | # Kubernetes Failure Prediction |
2 | 2 |
|
3 | | -## 📌 Project Overview |
4 | | -This project predicts potential failures in Kubernetes clusters using machine learning. The model is trained to detect issues such as: |
5 | | -- 🚨 **Node or pod failures** |
6 | | -- 🖥 **Resource exhaustion** (CPU, memory, disk) |
7 | | -- 🌐 **Network or connectivity issues** |
8 | | -- ⚠️ **Service disruptions** based on logs and events |
9 | | - |
10 | | -The solution is packaged into a **FastAPI** service and deployed using **Docker** and **Kubernetes**. |
| 3 | +# Deployed Links and Presentation |
| 4 | + |
| 5 | +## Index |
| 6 | +- [Project Overview](#project-overview) |
| 7 | +- [Directory Structure](#directory-structure) |
| 8 | +- [Installation and Setup](#installation-and-setup) |
| 9 | + - [Prerequisites](#prerequisites) |
| 10 | + - [Setup](#setup) |
| 11 | +- [Model Training](#model-training) |
| 12 | +- [API Endpoints](#api-endpoints) |
| 13 | + - [POST /predict](#post-predict) |
| 14 | +- [Deployment on Render](#deployment-on-render) |
| 15 | +- [Submission Requirements](#submission-requirements) |
11 | 16 |
|
12 | 17 | --- |
13 | 18 |
|
14 | | -## 📂 Directory Structure |
15 | | -``` |
16 | | -📦 k8s-failure-prediction |
17 | | -├── 📁 data # Raw & processed data files |
18 | | -│ ├── raw_metrics.csv # Original collected metrics |
19 | | -│ ├── processed_metrics.csv # Preprocessed data for training |
20 | | -│ |
21 | | -├── 📁 models # Trained machine learning models |
22 | | -│ ├── failure_predictor.pkl # Final trained model |
23 | | -│ |
24 | | -├── 📁 scripts # Model training and evaluation scripts |
25 | | -│ ├── train_model.py # Script to train the ML model |
26 | | -│ ├── evaluate_model.py # Model evaluation script |
27 | | -│ |
28 | | -├── 📁 app # API service |
29 | | -│ ├── app.py # FastAPI service for predictions |
30 | | -│ ├── Dockerfile # Dockerfile for containerization |
31 | | -│ |
32 | | -├── 📁 deployment # Kubernetes deployment files |
33 | | -│ ├── deployment.yaml # Kubernetes deployment manifest |
34 | | -│ ├── service.yaml # Kubernetes service manifest |
35 | | -│ |
36 | | -├── README.md # Documentation |
37 | | -└── requirements.txt # Python dependencies |
38 | | -``` |
39 | | - |
40 | | ---- |
| 19 | +## Project Overview |
| 20 | +This project trains a machine learning model to predict failures in Kubernetes clusters from provided or simulated data. The trained model is served through a FastAPI application, packaged with Docker, and deployed on Render. |
41 | 21 |
|
42 | | -## 🚀 Setup & Installation |
43 | | - |
44 | | -### 1️⃣ Install Dependencies |
45 | | -Ensure you have Python 3.8+ installed. Then, install the required libraries: |
46 | | -```bash |
47 | | -pip install -r requirements.txt |
| 22 | +## Directory Structure |
48 | 23 | ``` |
49 | | - |
50 | | -### 2️⃣ Train the Model |
51 | | -If needed, retrain the model using: |
52 | | -```bash |
53 | | -python scripts/train_model.py |
| 24 | +. |
| 25 | +├── models |
| 26 | +│ └── k8s_failure_model.pkl # Trained machine learning model |
| 27 | +├── scripts |
| 28 | +│ ├── train_model.py # Script for training the model |
| 29 | +│ └── test_model.py # Script for testing the model |
| 30 | +├── app.py # FastAPI application |
| 31 | +├── Dockerfile # Docker configuration |
| 32 | +├── requirements.txt # Python dependencies |
| 33 | +└── README.md # Project documentation |
54 | 34 | ``` |
55 | | -The trained model will be saved in the `models/` directory. |
56 | 35 |
|
57 | | -### 3️⃣ Run the API Locally |
58 | | -```bash |
59 | | -uvicorn app:app --host 0.0.0.0 --port 8000 |
60 | | -``` |
61 | | -Test the API using: |
62 | | -```bash |
63 | | -curl -X POST "http://localhost:8000/predict" -H "Content-Type: application/json" -d '{"cpu": 80, "memory": 90, "disk": 70}' |
64 | | -``` |
65 | | - |
66 | | ---- |
67 | | - |
68 | | -## 🐳 Dockerization & Kubernetes Deployment |
69 | | - |
70 | | -### 🏗️ Build & Run with Docker |
71 | | -1. **Build the Docker image** |
72 | | -```bash |
73 | | -docker build -t pavithra/k8s-failure-predictor:v1 . |
74 | | -``` |
75 | | -2. **Run the container** |
76 | | -```bash |
77 | | -docker run -p 8000:8000 pavithra/k8s-failure-predictor:v1 |
78 | | -``` |
79 | | -3. **Push to Docker Hub** |
80 | | -```bash |
81 | | -docker push pavithra/k8s-failure-predictor:v1 |
82 | | -``` |
83 | | - |
84 | | -### ☸️ Deploy to Kubernetes |
85 | | -1. **Apply deployment and service manifests** |
86 | | -```bash |
87 | | -kubectl apply -f deployment/deployment.yaml |
88 | | -kubectl apply -f deployment/service.yaml |
89 | | -``` |
90 | | -2. **Check running pods** |
91 | | -```bash |
92 | | -kubectl get pods |
| 36 | +## Installation and Setup |
| 37 | + |
| 38 | +### Prerequisites |
| 39 | +- Python 3.8+ |
| 40 | +- Docker |
| 41 | +- Render account |
| 42 | + |
| 43 | +### Setup |
| 44 | +1. Clone the repository: |
| 45 | + ```sh |
| 46 | + git clone https://github.com/your-repo/k8s-failure-prediction.git |
| 47 | + cd k8s-failure-prediction |
| 48 | + ``` |
| 49 | +2. Install dependencies: |
| 50 | + ```sh |
| 51 | + pip install -r requirements.txt |
| 52 | + ``` |
| 53 | +3. Run the FastAPI service locally: |
| 54 | + ```sh |
| 55 | + uvicorn app:app --host 0.0.0.0 --port 8000 |
| 56 | + ``` |
| 57 | + |
| 58 | +## Model Training |
| 59 | +To train the machine learning model, run: |
| 60 | +```sh |
| 61 | +python scripts/train_model.py |
93 | 62 | ``` |
94 | | -3. **Expose the service** |
95 | | -```bash |
96 | | -kubectl port-forward service/k8s-failure-predictor 8000:8000 |
| 63 | +This script loads data, preprocesses it, and trains a classifier to predict Kubernetes failures. |
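The README does not say what the preprocessing step does. The request body documented for `/predict` pairs raw metrics with `_avg` counterparts, so one plausible step is deriving those averages from a time-ordered series. The sketch below assumes a trailing-window mean; the helper names `trailing_mean` and `add_avg_features` and the window size are hypothetical and are not taken from `scripts/train_model.py`.

```python
from collections import deque


def trailing_mean(values, window=3):
    """Yield the mean of the most recent `window` values seen so far."""
    recent = deque(maxlen=window)  # old samples fall out automatically
    for value in values:
        recent.append(value)
        yield sum(recent) / len(recent)


def add_avg_features(rows, columns, window=3):
    """Return copies of `rows` with a `<column>_avg` key added per column.

    `rows` is assumed to be a time-ordered list of dicts, one per sample.
    """
    enriched = [dict(row) for row in rows]  # leave the input untouched
    for column in columns:
        means = trailing_mean((row[column] for row in rows), window)
        for row, mean in zip(enriched, means):
            row[f"{column}_avg"] = mean
    return enriched
```

Early samples are averaged over fewer points than `window`, which avoids dropping rows at the start of the series; a real pipeline might instead discard them.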
| 64 | + |
| 65 | +## API Endpoints |
| 66 | + |
| 67 | +### POST /predict |
| 68 | +- **Endpoint:** `/predict` |
| 69 | +- **Method:** POST |
| 70 | +- **Request Body:** |
| 71 | +```json |
| 72 | +{ |
| 73 | + "cpu_usage": 0.5, |
| 74 | + "memory_usage": 0.7, |
| 75 | + "container_network_receive_bytes_total": 3000, |
| 76 | + "container_network_transmit_bytes_total": 2500, |
| 77 | + "container_fs_usage_bytes": 5000, |
| 78 | + "cpu_usage_avg": 0.45, |
| 79 | + "memory_usage_avg": 0.68, |
| 80 | + "container_network_receive_bytes_total_avg": 2900, |
| 81 | + "container_network_transmit_bytes_total_avg": 2400, |
| 82 | + "container_fs_usage_bytes_avg": 4800, |
| 83 | + "container_restart_count_avg": 2 |
| 84 | +} |
97 | 85 | ``` |
98 | | -4. **Test the API** |
99 | | -```bash |
100 | | -curl -X POST "http://localhost:8000/predict" -H "Content-Type: application/json" -d '{"cpu": 85, "memory": 95, "disk": 80}' |
| 86 | +- **Response:** |
| 87 | +```json |
| 88 | +{ |
| 89 | + "failure_predicted": "YES" |
| 90 | +} |
101 | 91 | ``` |
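A minimal client sketch for the endpoint above, using only the Python standard library. The field names are copied from the documented request body; the helper names `build_payload` and `predict`, the strict field check, and the timeout are illustrative assumptions, not part of the project's code.

```python
import json
import urllib.request

# Field names copied from the request body documented above.
FEATURES = [
    "cpu_usage",
    "memory_usage",
    "container_network_receive_bytes_total",
    "container_network_transmit_bytes_total",
    "container_fs_usage_bytes",
    "cpu_usage_avg",
    "memory_usage_avg",
    "container_network_receive_bytes_total_avg",
    "container_network_transmit_bytes_total_avg",
    "container_fs_usage_bytes_avg",
    "container_restart_count_avg",
]


def build_payload(metrics):
    """Return a JSON-encoded request body, rejecting missing or unknown fields."""
    missing = [name for name in FEATURES if name not in metrics]
    unknown = [name for name in metrics if name not in FEATURES]
    if missing or unknown:
        raise ValueError(f"missing={missing} unknown={unknown}")
    # Emit the fields in the documented order.
    ordered = {name: metrics[name] for name in FEATURES}
    return json.dumps(ordered).encode("utf-8")


def predict(base_url, metrics, timeout=10.0):
    """POST the metrics to <base_url>/predict and return the decoded JSON reply."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/predict",
        data=build_payload(metrics),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running locally as described in the setup steps, `predict("http://localhost:8000", metrics)` should return a dictionary containing the `failure_predicted` key.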
102 | 92 |
|
103 | | ---- |
104 | | - |
105 | | -## 📊 Model Performance |
106 | | -### ✅ Accuracy Scores: |
107 | | -- **Train Accuracy:** 86.80% |
108 | | -- **Test Accuracy:** 68.88% |
109 | | - |
110 | | -### 📉 Classification Report: |
111 | | -| Class | Precision | Recall | F1-Score | Support | |
112 | | -|-------|-----------|--------|----------|---------| |
113 | | -| **0** (No Failure) | 0.82 | 0.64 | 0.72 | 904 | |
114 | | -| **1** (Failure) | 0.56 | 0.77 | 0.65 | 542 | |
115 | | - |
116 | | -**Macro Avg:** 69% | **Weighted Avg:** 73% |
117 | | - |
118 | | ---- |
119 | | - |
120 | | -## 📌 Future Improvements |
121 | | -✅ **Enhance Feature Engineering** – Incorporate more time-series trends 📈 |
122 | | -✅ **Optimize Hyperparameters** – Use Bayesian optimization 🔬 |
123 | | -✅ **Deploy on Cloud** – Host on AWS/GCP/Azure ☁️ |
124 | | -✅ **Improve Model Interpretability** – Use SHAP/LIME 📊 |
125 | | - |
126 | | ---- |
127 | | - |
128 | | -## 🤝 Contributing |
129 | | -Feel free to fork, contribute, and improve the model. PRs are welcome! 🎯 |
130 | | - |
131 | | ---- |
132 | | - |
133 | | -## 🏆 Acknowledgments |
134 | | -Thanks to the open-source community and Kubernetes practitioners for providing valuable datasets and insights! |
| 93 | +## Deployment on Render |
| 94 | + |
| 95 | +1. Build and push the Docker image: |
| 96 | + ```sh |
| 97 | + docker build -t your-dockerhub-username/k8s-model:latest . |
| 98 | + docker push your-dockerhub-username/k8s-model:latest |
| 99 | + ``` |
| 100 | +2. Go to [Render](https://render.com) and create a **new Web Service**. |
| 101 | +3. Select **Deploy from Docker** and provide the image name (`your-dockerhub-username/k8s-model:latest`). |
| 102 | +4. Set the port to `8000`. |
| 103 | +5. Click **Deploy**. |
| 104 | +6. Once deployed, test the API using: |
| 105 | + ```sh |
| 106 | + curl -X POST https://your-render-url.onrender.com/predict \ |
| 107 | + -H "Content-Type: application/json" \ |
| 108 | + -d '{ "cpu_usage": 0.5, "memory_usage": 0.7, ... }' |
| 109 | + ``` |
| 110 | + |
| 111 | +## Submission Requirements |
| 112 | + |
| 113 | +- **Model**: A trained machine learning model (`k8s_failure_model.pkl`). |
| 114 | +- **Codebase**: Functional code including data collection, model training, and evaluation scripts. |
| 115 | +- **Documentation**: Explanation of approach, metrics, and model performance. |
| 116 | +- **Presentation**: Recorded demo of the model's predictions and results. |
| 117 | +- **Test Data**: Sample data used for testing and validation. |
| 118 | + |
| 119 | +This project follows industry best practices and provides a scalable solution for Kubernetes failure prediction. |
135 | 120 |
|