This project implements a machine learning pipeline that detects and classifies network security threats. The pipeline is built for reproducibility, data versioning, and experiment tracking, using MLOps tools such as DVC, MLflow, and DAGsHub.
- End-to-End ML Pipeline: Automated data ingestion, validation, transformation, and model training
- Data Version Control: Track and version datasets using DVC
- Experiment Tracking: Monitor model metrics and parameters with MLflow
- Reproducibility: Ensure consistent results across different environments
- CI/CD Integration: Automated testing and deployment workflows
- Containerization: Docker support for consistent deployment
- REST API: FastAPI-based API for real-time predictions
- Text Classification: Support for text-based cyber threat intelligence data
- Multiple Training Approaches: Support for both MongoDB-based and direct file-based training
- Python: Core programming language
- MongoDB: Database for storing network security data
- Scikit-learn & XGBoost: ML algorithms for classification
- DVC: Data version control
- MLflow: Experiment tracking and model registry
- DAGsHub: Collaborative MLOps platform
- Docker: Containerization
- Pytest: Testing framework
- FastAPI: High-performance API framework
- Uvicorn: ASGI server for FastAPI
- Python 3.8+ (Python 3.10 or 3.11 recommended for best compatibility)
- Git
- Docker (optional)
- MongoDB connection string
- DAGsHub account (for MLflow tracking)
- Clone the repository:
  git clone https://github.com/austinLorenzMccoy/networkSecurity_project.git
  cd networkSecurity_project
- Create and activate a virtual environment:
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
  pip install -r requirements.txt
  pip install -e .
- Set up environment variables:
  # Create a .env file with your MongoDB connection string and DAGsHub credentials
  cp .env.template .env
  # Edit the .env file with your credentials
- Initialize DVC:
  dvc init
- Connect to DAGsHub (optional):
  # Set up DAGsHub as a remote
  dvc remote add origin https://dagshub.com/austinLorenzMccoy/networkSecurity_project.dvc
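After editing the .env file, it can help to confirm that the expected variables are actually present before running the pipeline. The following is a minimal sketch, not part of the repository: the helper names `parse_env_file` and `missing_vars` are hypothetical, and the variable names are the ones mentioned elsewhere in this README (the real .env.template may list different ones).

```python
import os
from pathlib import Path

# Assumption: these names come from this README; .env.template is the source of truth.
REQUIRED_VARS = ("MONGODB_URI", "MLFLOW_TRACKING_USERNAME", "MLFLOW_TRACKING_PASSWORD")


def parse_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(values, required=REQUIRED_VARS):
    """Return the required names that are absent or empty."""
    return [name for name in required if not values.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")
    values = parse_env_file(env_path) if env_path.exists() else {}
    # Variables already exported in the shell override the file.
    exported = {k: v for k, v in os.environ.items() if k in REQUIRED_VARS}
    missing = missing_vars({**values, **exported})
    print("Missing:", ", ".join(missing) if missing else "none")
```

Running the script from the repository root prints the names that still need a value.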
The project uses DVC to define and run the ML pipeline stages:
# Run the entire pipeline
dvc repro
# Run a specific stage
dvc repro -s data_ingestion
dvc repro -s data_validation
dvc repro -s data_transformation
dvc repro -s model_training
# Run the direct training pipeline (using cyber threat intelligence data)
dvc repro -s direct_training
# View pipeline visualization
dvc dag
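The stage commands above can also be driven from a script, for example to run the MongoDB-based stages one after another. This is a rough sketch under stated assumptions: the stage names are copied from this README (dvc.yaml remains the source of truth), and `repro_command` and `run_stages` are hypothetical helpers, not part of the project. Note that `-s` (`--single-item`) runs only the named stage, whereas `dvc repro <stage>` without `-s` also reproduces that stage's changed dependencies.

```python
import subprocess

# Assumption: stage names as listed in this README.
MONGODB_STAGES = ["data_ingestion", "data_validation", "data_transformation", "model_training"]


def repro_command(stage=None, single=True):
    """Build the argv for `dvc repro`, optionally limited to one stage."""
    cmd = ["dvc", "repro"]
    if stage is not None:
        if single:
            cmd.append("-s")  # --single-item: skip upstream dependencies
        cmd.append(stage)
    return cmd


def run_stages(stages, dry_run=True):
    """Print each command in order and, unless dry_run, execute it; stop at the first failure."""
    for stage in stages:
        cmd = repro_command(stage)
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default so the script is safe to try without DVC installed.
    run_stages(MONGODB_STAGES, dry_run=True)
```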
MLflow is used to track experiments, including parameters, metrics, and artifacts:
# Start the MLflow UI locally
mlflow ui
# Or view experiments on DAGsHub
# Visit: https://dagshub.com/austinLorenzMccoy/networkSecurity_project.mlflow
To enable MLflow tracking with DAGsHub:
- Set your DAGsHub credentials in the .env file:
  MLFLOW_TRACKING_USERNAME=your_dagshub_username
  MLFLOW_TRACKING_PASSWORD=your_dagshub_token
- Run the training pipeline with MLflow tracking:
  dvc repro direct_training
- View your experiments on DAGsHub's MLflow interface.
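For orientation, the sketch below shows one way a script could point MLflow at the DAGsHub tracking server named above and log a run. It is an illustration rather than the project's actual training code: `dagshub_tracking_uri` and `log_example_run` are hypothetical names, and it assumes `mlflow` is installed and that `MLFLOW_TRACKING_USERNAME` and `MLFLOW_TRACKING_PASSWORD` are exported in the environment, which MLflow reads for HTTP basic authentication.

```python
def dagshub_tracking_uri(owner, repo):
    """Build the MLflow tracking URI in the form shown in this README."""
    return f"https://dagshub.com/{owner}/{repo}.mlflow"


def log_example_run(params, metrics, uri):
    """Log one run with the given params and metrics (requires mlflow and credentials)."""
    import mlflow  # imported lazily so the URI helper works without mlflow installed

    mlflow.set_tracking_uri(uri)
    with mlflow.start_run():
        mlflow.log_params(params)    # dict of parameter name -> value
        mlflow.log_metrics(metrics)  # dict of metric name -> float


if __name__ == "__main__":
    uri = dagshub_tracking_uri("austinLorenzMccoy", "networkSecurity_project")
    print("Tracking URI:", uri)
    # To log a run, call log_example_run(...) with values from your own training code.
```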
The project includes unit tests using pytest:
# Run all tests
pytest
# Run tests with coverage report
pytest --cov=networksecurity
Build and run the project using Docker:
# Build the Docker image
docker build -t network-security-project .
# Run the container
docker run -p 8000:8000 -e MONGODB_URI=your_mongodb_connection_string network-security-project
The project includes a FastAPI application for serving predictions:
# Run the FastAPI application
python app.py
# Or use the convenience script
bash run_api.sh
- GET /health: Check if the model is loaded and ready
- GET /model-info: Get information about the trained model
- POST /predict: Make predictions using feature vectors
- POST /predict/text: Make predictions using raw text input
# Check health status
curl -X GET "http://localhost:8000/health"
# Get model information
curl -X GET "http://localhost:8000/model-info"
# Make a prediction with text
curl -X POST "http://localhost:8000/predict/text" \
-H "Content-Type: application/json" \
-d '{"text": "A new ransomware attack has been detected that encrypts files."}'
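The same calls can be made from Python. The sketch below mirrors the curl examples using only the standard library; the response schema is not documented here, so the script simply prints whatever JSON the API returns. `build_text_request` and `call_json` are hypothetical helper names, and the base URL assumes the API is listening on port 8000 as in the Docker example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: same host and port as the curl examples


def build_text_request(text, base_url=BASE_URL):
    """Build a POST /predict/text request with the same JSON body as the curl example."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict/text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_json(request, timeout=10):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        print(call_json(urllib.request.Request(f"{BASE_URL}/health")))
        print(call_json(build_text_request(
            "A new ransomware attack has been detected that encrypts files."
        )))
    except (OSError, ValueError) as exc:  # connection errors or a non-JSON response
        print("API call failed:", exc)
```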
The repository includes a Next.js-based dashboard under frontend/.
- Node.js 20+ (recommended)
- pnpm (via corepack or standalone install)
# From repo root
cd frontend
# Option A: Use corepack to manage pnpm
corepack enable
corepack prepare pnpm@9 --activate
# Option B: Install pnpm directly
npm i -g pnpm@9
# Install and run
pnpm install
pnpm dev
The dev server will print the local URL (typically http://localhost:3000).
.
├── .dvc/                     # DVC configuration
├── .dagshub/                 # DAGsHub configuration
├── artifact/                 # Generated artifacts from pipeline
│   └── direct_training/      # Artifacts from direct training approach
├── data_schema/              # Data schema definitions
├── logs/                     # Application logs
├── Network_Data/             # Raw data (tracked by DVC)
├── networksecurity/          # Main package
│   ├── components/           # Pipeline components
│   ├── constants/            # Constants and configurations
│   ├── entity/               # Data entities and models
│   ├── exception/            # Custom exceptions
│   ├── logging/              # Logging utilities
│   ├── pipeline/             # Pipeline orchestration
│   └── utils/                # Utility functions
├── notebooks/                # Jupyter notebooks for exploration
├── reports/                  # Generated reports and metrics
├── tests/                    # Test cases
├── .env                      # Environment variables
├── .env.template             # Template for environment variables
├── .gitignore                # Git ignore file
├── app.py                    # FastAPI application
├── custom_model_trainer.py   # Custom model trainer implementation
├── dvc.yaml                  # DVC pipeline definition
├── Dockerfile                # Docker configuration
├── main.py                   # Main entry point
├── pytest.ini                # Pytest configuration
├── README.md                 # Project documentation
├── requirements.txt          # Python dependencies
├── run_api.sh                # Script to run the FastAPI application
├── setup.py                  # Package setup file
└── train_with_components.py  # Direct training script using components
This repo uses GitHub Actions for CI:
- Backend CI (.github/workflows/backend.yml): sets up a Python venv, installs dependencies, and runs tests.
- Frontend CI (.github/workflows/frontend.yml): sets up Node/pnpm and builds the Next.js app in frontend/.
Note: The previous Jenkins configuration is deprecated. If you still have backend/Jenkinsfile, you can remove it.
This project is licensed under the MIT License - see the LICENSE file for details.
- Augustine Chibueze - GitHub