AI-Driven Threat Detection Research

This repository serves as the central hub for the research and laboratory activities conducted during the AI and Cybersecurity course at Politecnico di Torino. It aggregates four distinct projects covering the intersection of Artificial Intelligence, Deep Learning, and Natural Language Processing applied to cybersecurity domains.

Repository Structure

The projects are organized sequentially to guide the learner from foundational network flow analysis to complex malware and anomaly detection tasks.

Order	Laboratory	Topic	Methods	Repository Link
1	01_Network_Flow_Analysis	Network Flow Classification	Feed-Forward Neural Networks (FFNNs), Weighted Loss, Feature Bias Analysis	Link
2	02_Malware_Analysis	Malware Classification	Dynamic API Analysis, RNNs (GRU/LSTM), Graph Neural Networks (GNNs)	Link
3	03_Network_Anomaly_Detection	Anomaly Detection (NIDS)	One-Class SVM, Autoencoders (Reconstruction Error), Unsupervised Clustering	Link
4	04_NLP_for_Cybersecurity	NLP for Cybersecurity	Bash Command Classification, TF-IDF, Word2Vec, LSTMs for Tactics detection	Link

Note: Each folder above is a submodule linked to its respective standalone repository. To clone them all, use git clone --recursive.

Getting Started

To clone this repository along with all its submodules (laboratories), use the --recursive flag:

git clone --recursive https://github.com/RenatoMignone/AI-Driven-Threat-Detection-Research.git

If you have already cloned the repository without submodules, run:

git submodule update --init --recursive

Research Hub Overview

1. Network Flow Classification

Goal: Classify network flows (Benign vs Malicious) using Deep Learning on the CICIDS2017 dataset.

This module focuses on the full modelling pipeline for tabular network data. It investigates how Feed-Forward Neural Networks (FFNNs) perform against baselines and explores critical real-world issues like class imbalance and bias.

Key Experiments:
- Baseline Models: Shallow FFNNs with varying architectures.
- Bias Analysis: Investigating disparate impact of features like Destination Port.
- Deep Learning: Deeper FFNNs (3-6 layers) with hyperparameter tuning.
- Regularization: Application of Dropout, Batch Normalization, and Weight Decay.

2. Malware Analysis

Goal: Classify malware families based on dynamic API-call traces.

This project tackles the complexity of sequential data in malware analysis. By treating API call traces as sequences or graphs, we leverage advanced architectures to identify malicious behavior patterns.

Key Experiments:
- Feature Exploration: Frequency-based analysis of API calls.
- Sequence Modeling: Using LSTMs and GRUs to capture temporal dependencies in execution traces.
- Graph Learning: Representing traces as graphs and applying GraphSAGE and GCNs to learn structural features of malware execution.

3. Network Anomaly Detection

Goal: Detect zero-day attacks in network traffic using unsupervised approaches.

Simulating a scenario where attack labels are unavailable, this lab applies unsupervised and semi-supervised learning to detect intrusions as anomalies.

Key Experiments:
- Shallow Anomaly Detection: One-Class SVM (OC-SVM) for outlier detection.
- Deep Anomaly Detection: Autoencoders that flag anomalies based on high reconstruction error.
- Clustering: using DBSCAN and K-Means to group similar traffic patterns and isolate attacks.
- Visualization: Dimensionality reduction with t-SNE and PCA.

4. NLP for Attack Tactic Recognition

Goal: Identify attacker intent (Tactics) from Bash command history.

Bridging the gap between Natural Language Processing (NLP) and cybersecurity, this module classifies raw bash commands into MITRE ATT&CK tactics (e.g., Discovery, Persistence, Execution).

Key Experiments:
- Text Preprocessing: Custom tokenization for shell commands.
- Embeddings: TF-IDF for keyword extraction and Word2Vec for semantic representation of commands.
- Sequential Classification: Bidirectional LSTMs to understand the intent behind a sequence of commands.
- Interpretability: Analyze confusion matrices to understand model decisions.

Authors & Contributors

This work is the result of a collaborative effort by the following team:

Name	GitHub	LinkedIn	Email
Renato Mignone			renato.mignone@studenti.polito.it
Claudia Sanna			claudia.sanna@studenti.polito.it
Chiara Iorio			chiara.iorio@studenti.polito.it

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
01_Network_Flow_Analysis @ 6a02cfb		01_Network_Flow_Analysis @ 6a02cfb
02_Malware_Analysis @ 6401cca		02_Malware_Analysis @ 6401cca
03_Network_Anomaly_Detection @ 381c10c		03_Network_Anomaly_Detection @ 381c10c
04_NLP_for_Cybersecurity @ acb56bb		04_NLP_for_Cybersecurity @ acb56bb
.gitmodules		.gitmodules
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AI-Driven Threat Detection Research

Repository Structure

Getting Started

Research Hub Overview

1. Network Flow Classification

2. Malware Analysis

3. Network Anomaly Detection

4. NLP for Attack Tactic Recognition

Authors & Contributors

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

AI-Driven Threat Detection Research

Repository Structure

Getting Started

Research Hub Overview

1. Network Flow Classification

2. Malware Analysis

3. Network Anomaly Detection

4. NLP for Attack Tactic Recognition

Authors & Contributors

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages