|
1 | 1 | # Kubernetes Failure Prediction
|
2 | 2 |
|
3 |
| -## 📌 Project Overview |
4 |
| -This project predicts potential failures in Kubernetes clusters using machine learning. The model is trained to detect issues such as: |
5 |
| -- 🚨 **Node or pod failures** |
6 |
| -- 🖥 **Resource exhaustion** (CPU, memory, disk) |
7 |
| -- 🌐 **Network or connectivity issues** |
8 |
| -- ⚠️ **Service disruptions** based on logs and events |
9 |
| - |
10 |
| -The solution is packaged into a **FastAPI** service and deployed using **Docker** and **Kubernetes**. |
| 3 | +# Deployed Links and Presentation |
| 4 | + |
| 5 | +## Index |
| 6 | +- [Project Overview](#project-overview) |
| 7 | +- [Directory Structure](#directory-structure) |
| 8 | +- [Installation and Setup](#installation-and-setup) |
| 9 | + - [Prerequisites](#prerequisites) |
| 10 | + - [Setup](#setup) |
| 11 | +- [Model Training](#model-training) |
| 12 | +- [API Endpoints](#api-endpoints) |
| 13 | + - [POST /predict](#post-predict) |
| 14 | +- [Deployment on Render](#deployment-on-render) |
| 15 | +- [Submission Requirements](#submission-requirements) |
11 | 16 |
|
12 | 17 | ---
|
13 | 18 |
|
14 |
| -## 📂 Directory Structure |
15 |
| -``` |
16 |
| -📦 k8s-failure-prediction |
17 |
| -├── 📁 data # Raw & processed data files |
18 |
| -│ ├── raw_metrics.csv # Original collected metrics |
19 |
| -│ ├── processed_metrics.csv # Preprocessed data for training |
20 |
| -│ |
21 |
| -├── 📁 models # Trained machine learning models |
22 |
| -│ ├── failure_predictor.pkl # Final trained model |
23 |
| -│ |
24 |
| -├── 📁 scripts # Model training and evaluation scripts |
25 |
| -│ ├── train_model.py # Script to train the ML model |
26 |
| -│ ├── evaluate_model.py # Model evaluation script |
27 |
| -│ |
28 |
| -├── 📁 app # API service |
29 |
| -│ ├── app.py # FastAPI service for predictions |
30 |
| -│ ├── Dockerfile # Dockerfile for containerization |
31 |
| -│ |
32 |
| -├── 📁 deployment # Kubernetes deployment files |
33 |
| -│ ├── deployment.yaml # Kubernetes deployment manifest |
34 |
| -│ ├── service.yaml # Kubernetes service manifest |
35 |
| -│ |
36 |
| -├── README.md # Documentation |
37 |
| -└── requirements.txt # Python dependencies |
38 |
| -``` |
39 |
| - |
40 |
| ---- |
| 19 | +## Project Overview |
| 20 | +This project trains a machine learning model to predict failures in Kubernetes clusters from provided or simulated metrics data. The trained model is served through a FastAPI service, packaged with Docker, and deployed on Render. |
41 | 21 |
|
42 |
| -## 🚀 Setup & Installation |
43 |
| - |
44 |
| -### 1️⃣ Install Dependencies |
45 |
| -Ensure you have Python 3.8+ installed. Then, install the required libraries: |
46 |
| -```bash |
47 |
| -pip install -r requirements.txt |
| 22 | +## Directory Structure |
48 | 23 | ```
|
49 |
| - |
50 |
| -### 2️⃣ Train the Model |
51 |
| -If needed, retrain the model using: |
52 |
| -```bash |
53 |
| -python scripts/train_model.py |
| 24 | +. |
| 25 | +├── models |
| 26 | +│   └── k8s_failure_model.pkl   # Trained machine learning model |
| 27 | +├── scripts |
| 28 | +│   ├── train_model.py          # Script for training the model |
| 29 | +│   └── test_model.py           # Script for testing the model |
| 30 | +├── app.py                      # FastAPI application |
| 31 | +├── Dockerfile                  # Docker configuration |
| 32 | +├── requirements.txt            # Python dependencies |
| 33 | +└── README.md                   # Project documentation |
54 | 34 | ```
|
55 |
| -The trained model will be saved in the `models/` directory. |
56 | 35 |
|
57 |
| -### 3️⃣ Run the API Locally |
58 |
| -```bash |
59 |
| -uvicorn app:app --host 0.0.0.0 --port 8000 |
60 |
| -``` |
61 |
| -Test the API using: |
62 |
| -```bash |
63 |
| -curl -X POST "http://localhost:8000/predict" -H "Content-Type: application/json" -d '{"cpu": 80, "memory": 90, "disk": 70}' |
64 |
| -``` |
65 |
| - |
66 |
| ---- |
67 |
| - |
68 |
| -## 🐳 Dockerization & Kubernetes Deployment |
69 |
| - |
70 |
| -### 🏗️ Build & Run with Docker |
71 |
| -1. **Build the Docker image** |
72 |
| -```bash |
73 |
| -docker build -t pavithra/k8s-failure-predictor:v1 . |
74 |
| -``` |
75 |
| -2. **Run the container** |
76 |
| -```bash |
77 |
| -docker run -p 8000:8000 pavithra/k8s-failure-predictor:v1 |
78 |
| -``` |
79 |
| -3. **Push to Docker Hub** |
80 |
| -```bash |
81 |
| -docker push pavithra/k8s-failure-predictor:v1 |
82 |
| -``` |
83 |
| - |
84 |
| -### ☸️ Deploy to Kubernetes |
85 |
| -1. **Apply deployment and service manifests** |
86 |
| -```bash |
87 |
| -kubectl apply -f deployment/deployment.yaml |
88 |
| -kubectl apply -f deployment/service.yaml |
89 |
| -``` |
90 |
| -2. **Check running pods** |
91 |
| -```bash |
92 |
| -kubectl get pods |
| 36 | +## Installation and Setup |
| 37 | + |
| 38 | +### Prerequisites |
| 39 | +- Python 3.8+ |
| 40 | +- Docker |
| 41 | +- Render account |
| 42 | + |
| 43 | +### Setup |
| 44 | +1. Clone the repository: |
| 45 | + ```sh |
| 46 | + git clone https://github.com/your-repo/k8s-failure-prediction.git |
| 47 | + cd k8s-failure-prediction |
| 48 | + ``` |
| 49 | +2. Install dependencies: |
| 50 | + ```sh |
| 51 | + pip install -r requirements.txt |
| 52 | + ``` |
| 53 | +3. Run the FastAPI service locally: |
| 54 | + ```sh |
| 55 | + uvicorn app:app --host 0.0.0.0 --port 8000 |
| 56 | + ``` |
| 57 | + |
| 58 | +## Model Training |
| 59 | +To train the machine learning model, run: |
| 60 | +```sh |
| 61 | +python scripts/train_model.py |
93 | 62 | ```
|
94 |
| -3. **Expose the service** |
95 |
| -```bash |
96 |
| -kubectl port-forward service/k8s-failure-predictor 8000:8000 |
| 63 | +This script loads data, preprocesses it, and trains a classifier to predict Kubernetes failures. |
| 64 | + |
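The README does not say how the preprocessing step works. The `_avg` fields in the `/predict` request body suggest that averaged features are derived from the raw metrics, so the following is only a minimal sketch under that assumption: it treats each `_avg` column as a trailing-window mean of its raw metric. The function name `add_trailing_averages`, the `window` parameter, and the choice of a trailing mean are all hypothetical and may differ from what `scripts/train_model.py` actually does. `container_restart_count_avg` is left out because its raw counterpart is not documented.

```python
from collections import deque

# Raw metric names taken from the documented /predict request body.
RAW_METRICS = [
    "cpu_usage",
    "memory_usage",
    "container_network_receive_bytes_total",
    "container_network_transmit_bytes_total",
    "container_fs_usage_bytes",
]


def add_trailing_averages(rows, window=3):
    """Return copies of `rows` with a <metric>_avg field added for each raw metric.

    Assumption: each average is the mean of that metric over the current row
    and up to `window - 1` preceding rows, in the order given.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = {name: deque(maxlen=window) for name in RAW_METRICS}
    enriched = []
    for row in rows:
        out = dict(row)  # copy so the caller's rows are not modified
        for name in RAW_METRICS:
            recent[name].append(row[name])
            out[name + "_avg"] = sum(recent[name]) / len(recent[name])
        enriched.append(out)
    return enriched
```

Early rows are averaged over fewer samples than `window`, which avoids dropping data at the start of a series at the cost of noisier averages there.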
| 65 | +## API Endpoints |
| 66 | + |
| 67 | +### POST /predict |
| 68 | +- **Endpoint:** `/predict` |
| 69 | +- **Method:** POST |
| 70 | +- **Request Body:** |
| 71 | +```json |
| 72 | +{ |
| 73 | + "cpu_usage": 0.5, |
| 74 | + "memory_usage": 0.7, |
| 75 | + "container_network_receive_bytes_total": 3000, |
| 76 | + "container_network_transmit_bytes_total": 2500, |
| 77 | + "container_fs_usage_bytes": 5000, |
| 78 | + "cpu_usage_avg": 0.45, |
| 79 | + "memory_usage_avg": 0.68, |
| 80 | + "container_network_receive_bytes_total_avg": 2900, |
| 81 | + "container_network_transmit_bytes_total_avg": 2400, |
| 82 | + "container_fs_usage_bytes_avg": 4800, |
| 83 | + "container_restart_count_avg": 2 |
| 84 | +} |
97 | 85 | ```
|
98 |
| -4. **Test the API** |
99 |
| -```bash |
100 |
| -curl -X POST "http://localhost:8000/predict" -H "Content-Type: application/json" -d '{"cpu": 85, "memory": 95, "disk": 80}' |
| 86 | +- **Response:** |
| 87 | +```json |
| 88 | +{ |
| 89 | + "failure_predicted": "YES" |
| 90 | +} |
101 | 91 | ```
|
102 | 92 |
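The request and response shown above can also be exercised from Python. The sketch below uses only the standard library; the helper names `build_payload` and `predict` are illustrative and not part of the project, and the client-side check for missing or unexpected fields is a choice made here, not documented server behavior.

```python
import json
import urllib.request

# Field names copied from the documented request body.
FEATURES = [
    "cpu_usage",
    "memory_usage",
    "container_network_receive_bytes_total",
    "container_network_transmit_bytes_total",
    "container_fs_usage_bytes",
    "cpu_usage_avg",
    "memory_usage_avg",
    "container_network_receive_bytes_total_avg",
    "container_network_transmit_bytes_total_avg",
    "container_fs_usage_bytes_avg",
    "container_restart_count_avg",
]

# Sample values copied from the documented request body.
SAMPLE = {
    "cpu_usage": 0.5,
    "memory_usage": 0.7,
    "container_network_receive_bytes_total": 3000,
    "container_network_transmit_bytes_total": 2500,
    "container_fs_usage_bytes": 5000,
    "cpu_usage_avg": 0.45,
    "memory_usage_avg": 0.68,
    "container_network_receive_bytes_total_avg": 2900,
    "container_network_transmit_bytes_total_avg": 2400,
    "container_fs_usage_bytes_avg": 4800,
    "container_restart_count_avg": 2,
}


def build_payload(metrics):
    """Return the JSON body for /predict, rejecting missing or unexpected fields."""
    missing = [name for name in FEATURES if name not in metrics]
    extra = [name for name in metrics if name not in FEATURES]
    if missing or extra:
        raise ValueError(f"missing fields: {missing}; unexpected fields: {extra}")
    return {name: metrics[name] for name in FEATURES}


def predict(base_url, metrics, timeout=10.0):
    """POST the metrics to <base_url>/predict and return the parsed JSON response."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/predict",
        data=json.dumps(build_payload(metrics)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running locally as described in Setup, `predict("http://localhost:8000", SAMPLE)` should return a dictionary with a `failure_predicted` key, as in the response above.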
|
103 |
| ---- |
104 |
| - |
105 |
| -## 📊 Model Performance |
106 |
| -### ✅ Accuracy Scores: |
107 |
| -- **Train Accuracy:** 86.80% |
108 |
| -- **Test Accuracy:** 68.88% |
109 |
| - |
110 |
| -### 📉 Classification Report: |
111 |
| -| Class | Precision | Recall | F1-Score | Support | |
112 |
| -|-------|-----------|--------|----------|---------| |
113 |
| -| **0** (No Failure) | 0.82 | 0.64 | 0.72 | 904 | |
114 |
| -| **1** (Failure) | 0.56 | 0.77 | 0.65 | 542 | |
115 |
| - |
116 |
| -**Macro Avg:** 69% | **Weighted Avg:** 73% |
117 |
| - |
118 |
| ---- |
119 |
| - |
120 |
| -## 📌 Future Improvements |
121 |
| -✅ **Enhance Feature Engineering** – Incorporate more time-series trends 📈 |
122 |
| -✅ **Optimize Hyperparameters** – Use Bayesian optimization 🔬 |
123 |
| -✅ **Deploy on Cloud** – Host on AWS/GCP/Azure ☁️ |
124 |
| -✅ **Improve Model Interpretability** – Use SHAP/LIME 📊 |
125 |
| - |
126 |
| ---- |
127 |
| - |
128 |
| -## 🤝 Contributing |
129 |
| -Feel free to fork, contribute, and improve the model. PRs are welcome! 🎯 |
130 |
| - |
131 |
| ---- |
132 |
| - |
133 |
| -## 🏆 Acknowledgments |
134 |
| -Thanks to the open-source community and Kubernetes practitioners for providing valuable datasets and insights! |
| 93 | +## Deployment on Render |
| 94 | + |
| 95 | +1. Build and push the Docker image: |
| 96 | + ```sh |
| 97 | + docker build -t your-dockerhub-username/k8s-model:latest . |
| 98 | + docker push your-dockerhub-username/k8s-model:latest |
| 99 | + ``` |
| 100 | +2. Go to [Render](https://render.com) and create a **new Web Service**. |
| 101 | +3. Select **Deploy from Docker** and provide the image name (`your-dockerhub-username/k8s-model:latest`). |
| 102 | +4. Set the port to `8000`. |
| 103 | +5. Click **Deploy**. |
| 104 | +6. Once deployed, test the API using: |
| 105 | + ```sh |
| 106 | + curl -X POST https://your-render-url.onrender.com/predict \ |
| 107 | + -H "Content-Type: application/json" \ |
| 108 | + -d '{ "cpu_usage": 0.5, "memory_usage": 0.7, ... }' |
| 109 | + ``` |
| 110 | + |
| 111 | +## Submission Requirements |
| 112 | + |
| 113 | +- **Model**: A trained machine learning model (`k8s_failure_model.pkl`). |
| 114 | +- **Codebase**: Functional code including data collection, model training, and evaluation scripts. |
| 115 | +- **Documentation**: Explanation of approach, metrics, and model performance. |
| 116 | +- **Presentation**: Recorded demo of the model's predictions and results. |
| 117 | +- **Test Data**: Sample data used for testing and validation. |
| 118 | + |
| 119 | +This project packages a trained failure-prediction model behind a containerized FastAPI service that can be deployed on Render. |
135 | 120 |
|