
Commit ff22b2f: Prediction testing

1 parent acd51f6

5 files changed (+202 lines added, −124 lines removed)

README.md

Lines changed: 106 additions & 121 deletions
@@ -1,135 +1,120 @@
 # Kubernetes Failure Prediction

-## 📌 Project Overview
-This project predicts potential failures in Kubernetes clusters using machine learning. The model is trained to detect issues such as:
-- 🚨 **Node or pod failures**
-- 🖥 **Resource exhaustion** (CPU, memory, disk)
-- 🌐 **Network or connectivity issues**
-- ⚠️ **Service disruptions** based on logs and events
-
-The solution is packaged into a **FastAPI** service and deployed using **Docker** and **Kubernetes**.
+# Deployed Links and Presentation
+
+## Index
+- [Project Overview](#project-overview)
+- [Directory Structure](#directory-structure)
+- [Installation and Setup](#installation-and-setup)
+  - [Prerequisites](#prerequisites)
+  - [Setup](#setup)
+- [Model Training](#model-training)
+- [API Endpoints](#api-endpoints)
+  - [POST /predict](#post-predict)
+- [Deployment on Render](#deployment-on-render)
+- [Submission Requirements](#submission-requirements)

 ---

-## 📂 Directory Structure
-```
-📦 k8s-failure-prediction
-├── 📁 data                    # Raw & processed data files
-│   ├── raw_metrics.csv        # Original collected metrics
-│   ├── processed_metrics.csv  # Preprocessed data for training
-
-├── 📁 models                  # Trained machine learning models
-│   ├── failure_predictor.pkl  # Final trained model
-
-├── 📁 scripts                 # Model training and evaluation scripts
-│   ├── train_model.py         # Script to train the ML model
-│   ├── evaluate_model.py      # Model evaluation script
-
-├── 📁 app                     # API service
-│   ├── app.py                 # FastAPI service for predictions
-│   ├── Dockerfile             # Dockerfile for containerization
-
-├── 📁 deployment              # Kubernetes deployment files
-│   ├── deployment.yaml        # Kubernetes deployment manifest
-│   ├── service.yaml           # Kubernetes service manifest
-
-├── README.md                  # Documentation
-└── requirements.txt           # Python dependencies
-```
-
----
+## Project Overview
+This project aims to develop a machine learning model to predict failures in Kubernetes clusters based on given or simulated data. The trained model is exposed via a FastAPI service and deployed using Docker and Render.

-## 🚀 Setup & Installation
-
-### 1️⃣ Install Dependencies
-Ensure you have Python 3.8+ installed. Then, install the required libraries:
-```bash
-pip install -r requirements.txt
+## Directory Structure
 ```
-
-### 2️⃣ Train the Model
-If needed, retrain the model using:
-```bash
-python scripts/train_model.py
+.
+├── models
+│   ├── k8s_failure_model.pkl   # Trained machine learning model
+├── scripts
+│   ├── train_model.py          # Script for training the model
+│   ├── test_model.py           # Script for testing the model
+├── app.py                      # FastAPI application
+├── Dockerfile                  # Docker configuration
+├── requirements.txt            # Python dependencies
+├── README.md                   # Project documentation
 ```
-The trained model will be saved in the `models/` directory.

-### 3️⃣ Run the API Locally
-```bash
-uvicorn app:app --host 0.0.0.0 --port 8000
-```
-Test the API using:
-```bash
-curl -X POST "http://localhost:8000/predict" -H "Content-Type: application/json" -d '{"cpu": 80, "memory": 90, "disk": 70}'
-```
-
----
-
-## 🐳 Dockerization & Kubernetes Deployment
-
-### 🏗️ Build & Run with Docker
-1. **Build the Docker image**
-```bash
-docker build -t pavithra/k8s-failure-predictor:v1 .
-```
-2. **Run the container**
-```bash
-docker run -p 8000:8000 pavithra/k8s-failure-predictor:v1
-```
-3. **Push to Docker Hub**
-```bash
-docker push pavithra/k8s-failure-predictor:v1
-```
-
-### ☸️ Deploy to Kubernetes
-1. **Apply deployment and service manifests**
-```bash
-kubectl apply -f deployment/deployment.yaml
-kubectl apply -f deployment/service.yaml
-```
-2. **Check running pods**
-```bash
-kubectl get pods
+## Installation and Setup
+
+### Prerequisites
+- Python 3.8+
+- Docker
+- Render account
+
+### Setup
+1. Clone the repository:
+```sh
+git clone https://github.com/your-repo/k8s-failure-prediction.git
+cd k8s-failure-prediction
+```
+2. Install dependencies:
+```sh
+pip install -r requirements.txt
+```
+3. Run the FastAPI service locally:
+```sh
+uvicorn app:app --host 0.0.0.0 --port 8000
+```
+
+## Model Training
+To train the machine learning model, run:
+```sh
+python scripts/train_model.py
 ```
-3. **Expose the service**
-```bash
-kubectl port-forward service/k8s-failure-predictor 8000:8000
+This script loads data, preprocesses it, and trains a classifier to predict Kubernetes failures.
+
+## API Endpoints
+
+### POST /predict
+- **Endpoint:** `/predict`
+- **Method:** POST
+- **Request Body:**
+```json
+{
+  "cpu_usage": 0.5,
+  "memory_usage": 0.7,
+  "container_network_receive_bytes_total": 3000,
+  "container_network_transmit_bytes_total": 2500,
+  "container_fs_usage_bytes": 5000,
+  "cpu_usage_avg": 0.45,
+  "memory_usage_avg": 0.68,
+  "container_network_receive_bytes_total_avg": 2900,
+  "container_network_transmit_bytes_total_avg": 2400,
+  "container_fs_usage_bytes_avg": 4800,
+  "container_restart_count_avg": 2
+}
 ```
-4. **Test the API**
-```bash
-curl -X POST "http://localhost:8000/predict" -H "Content-Type: application/json" -d '{"cpu": 85, "memory": 95, "disk": 80}'
+- **Response:**
+```json
+{
+  "failure_predicted": "YES"
+}
 ```

----
-
-## 📊 Model Performance
-### ✅ Accuracy Scores:
-- **Train Accuracy:** 86.80%
-- **Test Accuracy:** 68.88%
-
-### 📉 Classification Report:
-| Class | Precision | Recall | F1-Score | Support |
-|-------|-----------|--------|----------|---------|
-| **0** (No Failure) | 0.82 | 0.64 | 0.72 | 904 |
-| **1** (Failure) | 0.56 | 0.77 | 0.65 | 542 |
-
-**Macro Avg:** 69% | **Weighted Avg:** 73%
-
----
-
-## 📌 Future Improvements
-**Enhance Feature Engineering** – Incorporate more time-series trends 📈
-**Optimize Hyperparameters** – Use Bayesian optimization 🔬
-**Deploy on Cloud** – Host on AWS/GCP/Azure ☁️
-**Improve Model Interpretability** – Use SHAP/LIME 📊
-
----
-
-## 🤝 Contributing
-Feel free to fork, contribute, and improve the model. PRs are welcome! 🎯
-
----
-
-## 🏆 Acknowledgments
-Thanks to the open-source community and Kubernetes practitioners for providing valuable datasets and insights!
+## Deployment on Render
+
+1. Build and push the Docker image:
+```sh
+docker build -t your-dockerhub-username/k8s-model:latest .
+docker push your-dockerhub-username/k8s-model:latest
+```
+2. Go to [Render](https://render.com) and create a **new Web Service**.
+3. Select **Deploy from Docker** and provide the image name (`your-dockerhub-username/k8s-model:latest`).
+4. Set the port to `8000`.
+5. Click **Deploy**.
+6. Once deployed, test the API using:
+```sh
+curl -X POST https://your-render-url.onrender.com/predict \
+-H "Content-Type: application/json" \
+-d '{ "cpu_usage": 0.5, "memory_usage": 0.7, ... }'
+```
+
+## Submission Requirements
+
+- **Model**: A trained machine learning model (`k8s_failure_model.pkl`).
+- **Codebase**: Functional code including data collection, model training, and evaluation scripts.
+- **Documentation**: Explanation of approach, metrics, and model performance.
+- **Presentation**: Recorded demo of the model's predictions and results.
+- **Test Data**: Sample data used for testing and validation.
+
+This project follows industry best practices and provides a scalable solution for Kubernetes failure prediction.

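The request body documented in the new README can be exercised end to end once the service is running locally via `uvicorn app:app --host 0.0.0.0 --port 8000`. Below is a minimal client sketch using only the standard library, assuming the deployed app implements the documented schema (the root-level `app.py` itself is not part of this commit); the values simply mirror the README's example:

```python
import json
import urllib.request

# Example payload mirroring the README's documented request body
payload = {
    "cpu_usage": 0.5,
    "memory_usage": 0.7,
    "container_network_receive_bytes_total": 3000,
    "container_network_transmit_bytes_total": 2500,
    "container_fs_usage_bytes": 5000,
    "cpu_usage_avg": 0.45,
    "memory_usage_avg": 0.68,
    "container_network_receive_bytes_total_avg": 2900,
    "container_network_transmit_bytes_total_avg": 2400,
    "container_fs_usage_bytes_avg": 4800,
    "container_restart_count_avg": 2,
}

req = urllib.request.Request(
    "http://localhost:8000/predict",            # local uvicorn instance from the setup steps
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))               # documented shape: {"failure_predicted": "YES"}
```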
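The README's Model Training section only names `scripts/train_model.py`, which is not included in this diff. A minimal sketch of what that step could look like, assuming the synthetic `data/merged_data.csv` produced by `scripts/process_data.py` (added further down) and a scikit-learn classifier; every path and hyperparameter here is illustrative, not the author's actual script:

```python
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic metrics written by scripts/process_data.py (assumed path)
data = pd.read_csv("data/merged_data.csv")
X = data.drop(columns=["timestamp", "target"])
y = data["target"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Any classifier from requirements.txt (scikit-learn, xgboost) would fit here
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))

# Save under the file name the new README's directory tree refers to
joblib.dump(model, "models/k8s_failure_model.pkl")
```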
requirements.txt

Lines changed: 14 additions & 3 deletions
@@ -1,8 +1,19 @@
+# FastAPI and server
 fastapi
 uvicorn
-scikit-learn
+
+# Data handling
 pandas
 numpy
-matplotlib
-requests
+
+# Machine Learning
+scikit-learn
+xgboost
+joblib
+
+# Kubernetes interaction (Optional, if needed)
+kubernetes
+
+# For logging and debugging
+loguru

scripts/predict.py

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
```python
import pandas as pd
import joblib

# Load trained model
model = joblib.load("../models/failure_predictor.pkl")

# Load new data for prediction
df = pd.read_csv("../data/processed_metrics.csv").drop(columns=["label"]).tail(1)

# Make a prediction
prediction = model.predict(df)
print(f"⚠️ Failure Predicted: {'YES' if prediction[0] == 1 else 'NO'}")
```

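If the pickled estimator exposes `predict_proba` (true for most scikit-learn classifiers, but an assumption about this particular model), the same script can report a failure probability instead of a hard YES/NO, which is often more useful for alert thresholds. A small variation, sketched under that assumption:

```python
import joblib
import pandas as pd

model = joblib.load("../models/failure_predictor.pkl")
df = pd.read_csv("../data/processed_metrics.csv").drop(columns=["label"]).tail(1)

# Assumes the estimator implements predict_proba; column 1 is the failure class
failure_probability = model.predict_proba(df)[0][1]
print(f"Failure probability: {failure_probability:.2%}")
```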
scripts/process_data.py

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
```python
import pandas as pd
import numpy as np

def fetch_metric(metric_name):
    """ Generate synthetic metric data for Kubernetes failures """
    np.random.seed(42)
    timestamps = pd.date_range(start="2024-01-01", periods=5000, freq="T")
    data = {
        "timestamp": timestamps,
        metric_name: np.random.rand(len(timestamps)) * 100  # random for now
    }
    return pd.DataFrame(data)

metrics = [
    "cpu_usage", "memory_usage", "container_network_receive_bytes_total",
    "container_network_transmit_bytes_total", "container_fs_usage_bytes",
    "container_restart_count"
]

# Merge all metrics
data = fetch_metric(metrics[0])
for metric in metrics[1:]:
    metric_df = fetch_metric(metric)
    data = pd.merge(data, metric_df, on="timestamp", how="left")
data["target"] = np.random.choice([0, 1], size=len(data), p=[0.9, 0.1])
data.to_csv("data/merged_data.csv", index=False)
print("Saved as 'data/merged_data.csv'")
```

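The README's request body also lists `*_avg` fields, which this script does not yet produce. One plausible way to derive them from the merged data is a short rolling mean per metric; a sketch under that assumption, with the window size and output path chosen purely for illustration:

```python
import pandas as pd

data = pd.read_csv("data/merged_data.csv", parse_dates=["timestamp"])

metrics = [
    "cpu_usage", "memory_usage", "container_network_receive_bytes_total",
    "container_network_transmit_bytes_total", "container_fs_usage_bytes",
    "container_restart_count",
]

# Rolling mean over the last 5 samples (1-minute frequency), matching the README's *_avg names
for metric in metrics:
    data[f"{metric}_avg"] = data[metric].rolling(window=5, min_periods=1).mean()

data.to_csv("data/merged_data_with_avgs.csv", index=False)
print("Saved as 'data/merged_data_with_avgs.csv'")
```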
scripts/render.py

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
```python
import pickle
import numpy as np
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel

# Initialize FastAPI app
app = FastAPI()

# Load trained model
model_path = "../models/k8s_failure_model.pkl"  # Change this to your actual model path
with open(model_path, "rb") as f:
    model = pickle.load(f)

# Define input data structure
class ModelInput(BaseModel):
    features: list[float]  # Example: [5.1, 3.5, 1.4, 0.2]

# Root endpoint
@app.get("/")
def home():
    return {"message": "K8s Failure Prediction API is Running!"}

# Prediction endpoint
@app.post("/predict")
def predict(data: ModelInput):
    try:
        # Convert input to NumPy array and reshape for prediction
        input_data = np.array(data.features).reshape(1, -1)

        # Make prediction
        prediction = model.predict(input_data)

        return {"prediction": prediction.tolist()}

    except Exception as e:
        return {"error": str(e)}

# Run the server if executed directly
if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```

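Note that this service expects a flat `features` list (per `ModelInput`) and returns `{"prediction": [...]}`, rather than the named fields and `failure_predicted` key documented in the README. A quick local test, assuming the model was trained on the eleven features in the order of the README's example body:

```python
import json
import urllib.request

# Eleven values in the same order as the README's example request body (assumed feature order)
body = {"features": [0.5, 0.7, 3000, 2500, 5000, 0.45, 0.68, 2900, 2400, 4800, 2]}

req = urllib.request.Request(
    "http://localhost:8000/predict",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # e.g. {"prediction": [0]} or an {"error": ...} payload
```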