A complete MLOps pipeline for classifying network security threats and detecting phishing attacks. This project demonstrates an end-to-end machine learning system with data processing, model training, experiment tracking, and a REST API for predictions.
- Trains ML models to classify network traffic and identify phishing attempts
- Automates the entire ML workflow from data ingestion to model deployment
- Tracks experiments with MLflow for reproducibility and model comparison
- Serves predictions via FastAPI REST API with interactive documentation
- Manages artifacts with AWS S3 for versioning and persistence
- Stores data in MongoDB Atlas for scalability
```bash
git clone https://github.com/Abhinavexists/Nexus.git
cd Nexus
pip install -r requirements.txt
```

Create a `.env` file in the project root:
```
MONGO_URL=your_mongodb_atlas_connection_string
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
```

Run the training pipeline:

```python
from src.pipeline.training_pipeline import TrainingPipeline

pipeline = TrainingPipeline()
pipeline.run_pipeline()
```

This will:
- Load data from MongoDB
- Validate and transform the data
- Train 5 different ML models
- Evaluate performance and select the best one
- Save the model to S3 and log metrics to MLflow
```bash
python app.py
```

Then visit http://localhost:8000/docs for interactive API documentation.
Available Endpoints:
- `GET /train` - Trigger the training pipeline
- `POST /predict` - Make predictions on CSV data
- `GET /docs` - Interactive Swagger UI
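The exact request format for `POST /predict` depends on how `app.py` declares the endpoint; assuming the CSV is sent as a multipart file field named `file` (check `/docs` for the real schema), a client sketch looks like:

```python
import io

API_URL = "http://localhost:8000"  # the port used by `python app.py` and the Docker example

def build_upload(csv_text: str) -> dict:
    """Package raw CSV text as a multipart file upload.
    The field name "file" is an assumption -- confirm the actual
    request schema at /docs before relying on it."""
    return {"file": ("batch.csv", io.BytesIO(csv_text.encode("utf-8")), "text/csv")}

files = build_upload("feature_a,feature_b\n1,0\n0,1\n")

# With the server running, the request itself would look like:
#   import requests
#   response = requests.post(f"{API_URL}/predict", files=files)
#   print(response.status_code, response.text)
```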
```
src/
├── components/                  # ML pipeline components
│   ├── data_ingestion.py        # Load from MongoDB
│   ├── data_validation.py       # Schema & drift checks
│   ├── data_transformation.py   # Feature engineering
│   └── model_trainer.py         # Train & evaluate models
├── pipeline/
│   └── training_pipeline.py     # Orchestrate workflow
├── cloud/
│   └── s3_syncer.py             # AWS S3 integration
├── entity/                      # Data schemas
├── constant/                    # Configuration
├── exception/                   # Error handling
├── logging/                     # Logging setup
└── utils/                       # Helper functions
app.py                           # FastAPI application
requirements.txt                 # Python dependencies
pyproject.toml                   # Project config & build settings
```
- Data Ingestion - Fetch network traffic features from MongoDB Atlas
- Validation - Check schema integrity and detect data drift
- Transformation - Encode categorical features, scale numerical values
- Model Training - Train multiple models with hyperparameter tuning
- Evaluation - Compare accuracy, precision, recall, F1-score
- Registry - Store best model in S3 and MLflow
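The transformation step above (encode categorical features, scale numerical values) can be sketched with scikit-learn's `ColumnTransformer`; the column names below are hypothetical stand-ins for the real schema defined in `entity/`:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical traffic frame; real column names come from the schema in entity/.
df = pd.DataFrame({
    "protocol": ["tcp", "udp", "tcp", "icmp"],
    "duration": [1.2, 0.4, 3.1, 0.0],
    "bytes_sent": [450, 120, 980, 60],
})

preprocessor = ColumnTransformer([
    # One-hot encode the categorical column; ignore unseen categories at inference.
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["protocol"]),
    # Standardize the numerical columns to zero mean, unit variance.
    ("numerical", StandardScaler(), ["duration", "bytes_sent"]),
])

X = preprocessor.fit_transform(df)
print(X.shape)  # 4 rows; 3 one-hot columns + 2 scaled columns
```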
- Logistic Regression
- Random Forest
- Gradient Boosting
- Support Vector Machine (SVM)
- XGBoost
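The trainer's actual selection logic lives in `model_trainer.py`; as a sketch of the train-many, keep-the-best idea (on synthetic data, with XGBoost omitted because it ships as a separate package):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the real phishing/traffic features.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
    "svm": SVC(),
}

# Fit each candidate and score it on the held-out split.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test))

best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```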
Track all training runs and compare model performance:
```bash
mlflow ui --host 0.0.0.0 --port 5000
```

View at http://localhost:5000:
- Training metrics (accuracy, precision, recall, F1-score)
- Model parameter configurations
- Performance comparisons across runs
- Artifact lineage and versions
```bash
docker build -t nexus:latest .
docker run -p 8000:8000 \
  -e MONGO_URL=your_connection_string \
  nexus:latest
```

Install development dependencies:

```bash
pip install -e ".[dev]"
```

```bash
# Format code
black src/

# Sort imports
isort src/

# Type checking
mypy src/

# Linting
flake8 src/

# Run tests
pytest tests/
```

- FastAPI - Modern REST API framework with auto-generated documentation
- MLflow - Experiment tracking and model registry
- MongoDB Atlas - Cloud document database
- AWS S3 - Artifact storage and versioning
- scikit-learn - Machine learning models
- pandas & numpy - Data manipulation and numerical computing
See requirements.txt for the complete list. Main dependencies:
- numpy, pandas, scikit-learn (data & ML)
- fastapi, uvicorn (API)
- mlflow (experiment tracking)
- pymongo (database)
- python-dotenv, PyYAML (configuration)
Custom exception handling with detailed error context:
```python
from src.exception.exception import CustomException

try:
    ...  # your code
except Exception as e:
    raise CustomException(e)
```

Structured logging throughout:
```python
from src.logging.logging import logging

logging.info("Processing batch...")
logging.error("Failed to load model", exc_info=True)
```

- Fork the repository
- Create a feature branch: `git checkout -b feature/your-feature`
- Commit your changes: `git commit -m 'Add your feature'`
- Push to the branch: `git push origin feature/your-feature`
- Open a Pull Request
- MongoDB Atlas network access must be configured for your IP
- AWS credentials are required for artifact storage
- Update `.env` before running training
- Check the `/logs` directory for detailed debugging information
MIT License - see LICENSE file for details.
Abhinav Kumar Singh
- Email: abhinavkumarsingh2023@gmail.com
- GitHub: @Abhinavexists