This repository contains a complete Machine Learning–driven Intrusion Detection System (IDS) built using the NSL-KDD dataset. It includes preprocessing, model training, evaluation, ROC curve, a FastAPI prediction service, Docker deployment, testing suite, architecture diagrams, and a dashboard interface.
This project demonstrates end-to-end engineering capability across data engineering, machine learning, cybersecurity analytics, API deployment, testing, and documentation. It forms part of the technical evidence submitted for the UK Global Talent Visa (Digital Technology).
The IDS identifies whether a network connection is normal or malicious by learning patterns from the NSL-KDD intrusion dataset. The system includes:
- Full preprocessing and feature engineering
- Random Forest intrusion classifier
- Saved model artifact (intrusion_model.pkl)
- FastAPI inference service
- Interactive dashboard for CSV intrusion analysis
- ROC-AUC evaluation
- Docker deployment
- Pytest unit tests
- GitHub Actions CI
- Cleans training and test datasets
- Encodes categorical fields
- Standardises numeric features
- Ensures consistent preprocessing between training and inference
- Random Forest classifier
- High recall for attack detection
- ROC-AUC score of 0.961
- Reproducible training script:
train_ids_pipeline.py
/predictendpoint- Validates JSON inputs
- Loads saved intrusion model
- Returns class and probability score
- CSV upload
- Batch predictions
- Attack distribution visualisation
- Dockerfile
- docker-compose.yml
- Uvicorn production server
- Pytest suite
- Tests for preprocessing, model loading, prediction pipeline
- GitHub Actions CI pipeline
├── data/
│ ├── nsl_kdd_train_binary.csv
│ ├── nsl_kdd_test_binary.csv
│ └── intrusion_model.pkl
├── docs/
│ └── figures/
│ ├── ids_architecture.png
│ ├── ids_pipeline.png
│ ├── ids_dashboard.png
│ └── roc_curve.png
├── src/
│ ├── api_main.py
│ ├── train_ids_pipeline.py
│ └── dashboard_app.py
├── tests/
│ ├── test_preprocessing.py
│ ├── test_model_loading.py
│ └── test_prediction.py
├── MODEL_CARD.md
├── TECH_NATION_EVIDENCE.md
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
└── README.md
pip install -r requirements.txt
python src/train_ids_pipeline.py
uvicorn src.api_main:app --reload --port 8000
python src/dashboard_app.py
pytest
| Metric | Score |
|---|---|
| Accuracy | 0.94 |
| Precision | 0.95 |
| Recall | 0.93 |
| F1 Score | 0.94 |
| ROC-AUC | 0.961 |
ROC curve file: docs/figures/roc_curve.png
flowchart LR
A[NSL-KDD Dataset] --> B[Preprocessing Pipeline]
B --> C[Model Training <br> Random Forest]
C --> D[Saved Model <br> intrusion_model.pkl]
D --> E[FastAPI /predict Endpoint]
E --> F[Dashboard Application]
flowchart LR
A[Load Data] --> B[Clean and Encode]
B --> C[Split into Train/Test]
C --> D[Train Random Forest]
D --> E[Evaluate Metrics]
E --> F[Save Model Artifact]
{
"duration": 0,
"src_bytes": 181,
"dst_bytes": 5450,
"count": 2,
"srv_count": 2,
"serror_rate": 0.0,
"srv_serror_rate": 0.0,
"dst_host_count": 150,
"dst_host_srv_count": 30
}{
"prediction": "attack",
"confidence": 0.97
}This entire IDS project was designed and implemented by Ibrahim Akintunde Akinyera.
Key contributions include:
- Defining the system architecture
- Engineering and preprocessing the NSL-KDD dataset
- Developing the model training pipeline
- Evaluating model metrics and producing ROC curve
- Building the FastAPI prediction service
- Implementing dashboard for CSV-based intrusion analysis
- Creating Docker deployment configuration
- Writing unit tests and CI pipeline
- Preparing full documentation, diagrams, and evidence
This demonstrates end-to-end engineering capability in machine learning, cybersecurity, deployment, and documentation.
MIT License
Ibrahim Akintunde Akinyera