This project focuses on building a high-performance Intrusion Detection System (IDS) that uses machine learning to secure network traffic. We use the UNSW-NB15 dataset to train a model that accurately flags malicious network activity while allowing normal traffic to pass through.
The UNSW-NB15 dataset was well suited to this task, as it was generated by the Australian Centre for Cyber Security (ACCS) using the professional IXIA PerfectStorm traffic generator.
Before training, we conducted an in-depth exploration and cleaning phase to ensure the data was robust and relevant to our binary classification goal (Attack vs. Normal).
Target Simplification: We dropped the detailed attack_cat column to simplify the problem to a core binary decision using the primary label column.
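A minimal sketch of this step with pandas; the file name is illustrative and should be adjusted to the local copy of the training split:

```python
import pandas as pd

# Illustrative file name for the UNSW-NB15 training split.
df = pd.read_csv("UNSW_NB15_training-set.csv")

# Keep only the binary target: drop the multi-class attack_cat column
# and rely on label (0 = Normal, 1 = Attack).
df = df.drop(columns=["attack_cat"])
```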
Visualizing the Landscape: We used several visualizations to uncover hidden patterns and examine feature distributions:
Distribution Plots & Boxplots: Helped us understand the spread and identify extreme outliers in critical metrics like Time-To-Live (sttl, dttl) and TCP Round-Trip Time (tcprtt).
Violin Plots: Crucial for comparing the differences in numerical feature distributions directly between the normal and attack classes.
Correlation Heatmap: This map guided our feature selection by highlighting strong relationships.
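These plots can be reproduced along the following lines with seaborn and matplotlib; this is a sketch of the approach, not our exact plotting code:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Boxplot of a critical metric to surface extreme outliers.
sns.boxplot(x=df["sttl"])
plt.show()

# Violin plot comparing a numeric feature between Normal (0) and Attack (1).
sns.violinplot(data=df, x="label", y="tcprtt")
plt.show()

# Correlation heatmap over the numeric columns to guide feature selection.
corr = df.select_dtypes("number").corr()
sns.heatmap(corr, cmap="coolwarm", center=0)
plt.show()
```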
Key Correlations: Our analysis revealed several features highly correlated with the target (label):
Strong Positive Links (Attack likely): sttl (0.61), ct_state_ttl (0.53), dttl (0.46).
Strong Negative Links (Normal likely): dload (−0.36), dmean (−0.23).
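These figures can be read directly off the correlation matrix; a short sketch of how features are ranked against the target:

```python
# Correlation of every numeric feature with the binary target,
# sorted so the strongest positive and negative links stand out.
label_corr = (
    df.select_dtypes("number")
      .corr()["label"]
      .drop("label")
      .sort_values(ascending=False)
)
print(label_corr.head(5))   # strongest positive links (e.g. sttl, ct_state_ttl)
print(label_corr.tail(5))   # strongest negative links (e.g. dload, dmean)
```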
To prepare the dataset for modeling, we transformed the raw data:
Feature Encoding: We utilized One-Hot Encoding to convert critical categorical variables—proto (protocol), service, and state—into a numeric, machine-readable format.
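A sketch of the encoding step using pandas (scikit-learn's OneHotEncoder would work equally well):

```python
# One-hot encode the categorical columns; the numeric columns
# (including the label target) pass through unchanged.
df_encoded = pd.get_dummies(df, columns=["proto", "service", "state"])
```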
Consistency Artifact: The fully processed dataset was saved as data_encoded_train.csv. This artifact ensures perfect feature alignment when the model is moved to a deployment or inference environment.
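The artifact is a plain CSV; at inference time, new data can be forced onto the same column set. A sketch, where new_df stands in for a hypothetical batch of raw flows to score:

```python
# Persist the fully encoded training frame so the exact column set
# (and order) is available in any deployment or inference environment.
df_encoded.to_csv("data_encoded_train.csv", index=False)

# Re-align new data to the training schema: missing dummy columns
# are filled with 0, unseen ones are dropped.
train_columns = pd.read_csv("data_encoded_train.csv", nrows=0).columns
feature_columns = [c for c in train_columns if c != "label"]
new_encoded = pd.get_dummies(new_df, columns=["proto", "service", "state"])
new_aligned = new_encoded.reindex(columns=feature_columns, fill_value=0)
```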
We trained and compared three powerful ensemble models to determine the optimal classifier for intrusion detection.
| Model | Accuracy | Precision | Recall (Detection) | F1-score |
|---|---|---|---|---|
| Random Forest | 0.92 | 0.92 | 0.92 | 0.92 |
| XGBoost | 0.88 | 0.86 | 0.87 | 0.86 |
| LightGBM | 0.88 | 0.86 | 0.87 | 0.87 |
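A condensed sketch of how the three models can be trained and scored, assuming scikit-learn, xgboost, and lightgbm are installed; the hyperparameters shown are library defaults, not necessarily the exact settings behind the table above:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

X = df_encoded.drop(columns=["label"])
y = df_encoded["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=42),
    "LightGBM": LGBMClassifier(random_state=42),
}

# Fit each model and report Accuracy, Precision, Recall, and F1.
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    print(name)
    print(classification_report(y_test, preds, digits=4))
```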
Reliable Performance: The Random Forest model provided the highest and most balanced performance across all metrics (Accuracy, Precision, and Recall), making it the most reliable overall choice.
Critical Detection Focus: While slightly lower in accuracy, both XGBoost and LightGBM demonstrated competitive Recall scores. In security, maximizing Recall (catching real attacks) is often prioritized, highlighting the value of these models.
Feature Importance: Model feature importance rankings consistently reinforced our correlation findings, identifying sttl, tcprtt, and ackdat as major contributors to detection success.
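Continuing from the training sketch above, the rankings can be pulled straight from the fitted estimators; shown here for the Random Forest (the gradient-boosted models expose the same attribute):

```python
import pandas as pd

# Impurity-based feature importances from the fitted Random Forest,
# indexed by the encoded feature names.
rf = models["Random Forest"]
importances = (
    pd.Series(rf.feature_importances_, index=X.columns)
      .sort_values(ascending=False)
)
print(importances.head(10))  # expect sttl, tcprtt, ackdat near the top
```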