# Kubernetes Failure Prediction

## 📌 Project Overview

This project predicts potential failures in Kubernetes clusters using machine learning. The model is trained to detect issues such as:

- 🚨 **Node or pod failures**
- 🖥 **Resource exhaustion** (CPU, memory, disk)
- 🌐 **Network or connectivity issues**
- ⚠️ **Service disruptions** based on logs and events

The solution is packaged into a **FastAPI** service and deployed using **Docker** and **Kubernetes**.

---

## 📂 Directory Structure

```
📦 k8s-failure-prediction
├── 📁 data                        # Raw & processed data files
│   ├── raw_metrics.csv            # Original collected metrics
│   └── processed_metrics.csv      # Preprocessed data for training
│
├── 📁 models                      # Trained machine learning models
│   └── failure_predictor.pkl      # Final trained model
│
├── 📁 scripts                     # Model training and evaluation scripts
│   ├── train_model.py             # Script to train the ML model
│   └── evaluate_model.py          # Model evaluation script
│
├── 📁 app                         # API service
│   ├── app.py                     # FastAPI service for predictions
│   └── Dockerfile                 # Dockerfile for containerization
│
├── 📁 deployment                  # Kubernetes deployment files
│   ├── deployment.yaml            # Kubernetes deployment manifest
│   └── service.yaml               # Kubernetes service manifest
│
├── README.md                      # Documentation
└── requirements.txt               # Python dependencies
```

---

## 🚀 Setup & Installation

### 1️⃣ Install Dependencies
Ensure you have Python 3.8+ installed. Then, install the required libraries:
```bash
pip install -r requirements.txt
```

### 2️⃣ Train the Model
If needed, retrain the model using:
```bash
python scripts/train_model.py
```
The trained model will be saved in the `models/` directory.

### 3️⃣ Run the API Locally
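Start the server from the repository root. `app.py` lives in `app/`, so `--app-dir app` tells uvicorn where to find it:
```bash
uvicorn app:app --app-dir app --host 0.0.0.0 --port 8000
```
Test the API using: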
```bash
curl -X POST "http://localhost:8000/predict" -H "Content-Type: application/json" -d '{"cpu": 80, "memory": 90, "disk": 70}'
```
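
The same request can be sent from Python, which is handy for scripted checks. This is a minimal sketch using only the standard library; the helper names `build_predict_request` and `predict` are illustrative and not part of this repository, and the response is returned as decoded JSON without assuming any particular schema.

```python
import json
import urllib.request


def build_predict_request(cpu, memory, disk, base_url="http://localhost:8000"):
    """Build a POST request for /predict with the same JSON body as the curl example."""
    payload = json.dumps({"cpu": cpu, "memory": memory, "disk": disk}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(cpu, memory, disk, base_url="http://localhost:8000", timeout=5.0):
    """Send the request and return the decoded JSON response as-is."""
    request = build_predict_request(cpu, memory, disk, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires the API to be running locally):
#   print(predict(cpu=80, memory=90, disk=70))
```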

---

## 🐳 Dockerization & Kubernetes Deployment

### 🏗️ Build & Run with Docker
1. **Build the Docker image** (from the repository root; `-f` points at the Dockerfile in `app/`)
```bash
docker build -t pavithra/k8s-failure-predictor:v1 -f app/Dockerfile .
```
2. **Run the container**
```bash
docker run -p 8000:8000 pavithra/k8s-failure-predictor:v1
```
3. **Push to Docker Hub** (requires `docker login`)
```bash
docker push pavithra/k8s-failure-predictor:v1
```

### ☸️ Deploy to Kubernetes
1. **Apply deployment and service manifests**
```bash
kubectl apply -f deployment/deployment.yaml
kubectl apply -f deployment/service.yaml
```
2. **Check running pods**
```bash
kubectl get pods
```
3. **Forward the service port to your machine** (keep this command running in a separate terminal)
```bash
kubectl port-forward service/k8s-failure-predictor 8000:8000
```
4. **Test the API**
```bash
curl -X POST "http://localhost:8000/predict" -H "Content-Type: application/json" -d '{"cpu": 85, "memory": 95, "disk": 80}'
```
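
When these steps run from a script, the test request can race with `kubectl port-forward`, which stays in the foreground and needs a moment before it starts listening. The sketch below polls the local port until a TCP connection succeeds. `wait_for_port` is a hypothetical helper, not part of this repository, and a successful connection only shows that the forwarded port is open, not that the pod behind it is healthy.

```python
import socket
import time


def wait_for_port(host="localhost", port=8000, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Example: call wait_for_port() after starting the port-forward,
# and send the test request only if it returns True.
```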

---

## 📊 Model Performance
### ✅ Accuracy Scores:
- **Train Accuracy:** 86.80%
- **Test Accuracy:** 68.88%

### 📉 Classification Report:
| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| **0** (No Failure) | 0.82 | 0.64 | 0.72 | 904 |
| **1** (Failure) | 0.56 | 0.77 | 0.65 | 542 |

**Macro Avg:** 69% | **Weighted Avg:** 73%
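
The averages line does not name the metric it summarizes, so it can help to recompute both averages for each column. The sketch below does this from the rounded per-class values in the table, so results may differ from the original report in the last digit. The support-weighted recall comes out near 0.689, in line with the reported test accuracy, which is expected because support-weighted recall equals overall accuracy.

```python
# Per-class values copied from the classification report table (already rounded).
report = {
    "0 (No Failure)": {"precision": 0.82, "recall": 0.64, "f1": 0.72, "support": 904},
    "1 (Failure)": {"precision": 0.56, "recall": 0.77, "f1": 0.65, "support": 542},
}


def macro_average(report, metric):
    """Unweighted mean of a metric across classes."""
    return sum(row[metric] for row in report.values()) / len(report)


def weighted_average(report, metric):
    """Mean of a metric across classes, weighted by each class's support."""
    total = sum(row["support"] for row in report.values())
    return sum(row[metric] * row["support"] for row in report.values()) / total


for metric in ("precision", "recall", "f1"):
    print(
        f"{metric}: macro={macro_average(report, metric):.3f} "
        f"weighted={weighted_average(report, metric):.3f}"
    )
```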

---

## 📌 Future Improvements
- **Enhance Feature Engineering**: Incorporate more time-series trends 📈
- **Optimize Hyperparameters**: Use Bayesian optimization 🔬
- **Deploy on Cloud**: Host on AWS/GCP/Azure ☁️
- **Improve Model Interpretability**: Use SHAP/LIME 📊

---

## 🤝 Contributing
Feel free to fork, contribute, and improve the model. PRs are welcome! 🎯

---

## 🏆 Acknowledgments
Thanks to the open-source community and Kubernetes practitioners for providing valuable datasets and insights!