A research project proposing a novel loss function for phishing detection: Phishing-Aware Adaptive Focal Loss (PAFL).
PAFL addresses class imbalance and domain adaptation in phishing URL detection, with a lightweight implementation that runs on Google Colab’s free tier.
- Phishing-specific loss function: Incorporates URL structural features (domain trust, brand similarity, structure) into the loss.
- Dynamic class weighting: Adjusts weights during training based on the distribution of hard vs. easy samples.
- Domain adaptation: Explicitly models the distribution gap between source and target domains in the loss.
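
To make the first two ideas concrete, here is a minimal sketch in plain Python. It is not the project's implementation in `src/losses/`. It combines the standard binary focal loss with two illustrative assumptions: a hypothetical `adaptive_alpha` helper that sets the phishing-class weight from the share of hard samples in a batch, and a hypothetical `url_weights` argument standing in for per-sample multipliers derived from URL structural features. The domain-adaptation term is left out because its form is not specified here.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Standard binary focal loss for a single sample.

    p is the predicted probability of the phishing class,
    y is the label (1 = phishing, 0 = legitimate).
    """
    p = min(max(p, eps), 1.0 - eps)          # avoid log(0)
    p_t = p if y == 1 else 1.0 - p           # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def adaptive_alpha(probs, labels, hard_threshold=0.5, floor=0.05):
    """Hypothetical dynamic class weight (an assumption, not PAFL itself).

    A sample counts as hard when the true-class probability is below
    hard_threshold. The returned alpha is the fraction of hard samples
    that are phishing, clipped to [floor, 1 - floor].
    """
    hard_pos = hard_total = 0
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p
        if p_t < hard_threshold:
            hard_total += 1
            hard_pos += y
    if hard_total == 0:
        return 0.5                           # no hard samples: balanced weight
    share = hard_pos / hard_total
    return min(max(share, floor), 1.0 - floor)


def batch_loss(probs, labels, url_weights=None, gamma=2.0):
    """Mean focal loss over a batch with a batch-level adaptive alpha.

    url_weights is a hypothetical per-sample multiplier, for example
    larger for URLs that closely imitate a known brand.
    """
    if url_weights is None:
        url_weights = [1.0] * len(probs)
    alpha = adaptive_alpha(probs, labels)
    total = sum(
        w * focal_loss(p, y, alpha=alpha, gamma=gamma)
        for p, y, w in zip(probs, labels, url_weights)
    )
    return total / len(probs)


if __name__ == "__main__":
    probs = [0.2, 0.3, 0.8, 0.95]
    labels = [1, 1, 0, 1]
    print("alpha:", adaptive_alpha(probs, labels))
    print("loss:", batch_loss(probs, labels))
```

With `gamma=0` and `alpha=0.5` the per-sample loss reduces to half the usual cross-entropy, which is a quick sanity check when experimenting with the weighting scheme.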
phishing/
├── notebooks/ # Jupyter notebooks for Google Colab
├── src/ # Source code
│ ├── data/ # Data loaders and feature extraction
│ ├── models/ # Model definitions
│ ├── losses/ # Loss function implementations
│ └── utils/ # Utilities (metrics, visualization)
├── data/ # Datasets (after download)
└── requirements.txt # Dependencies
pip install -r requirements.txt

- PhiUSIIL Phishing URL Dataset (235,795 URLs, 54 features)
- Feature-Engineered URL Dataset (111,660 URLs, 22 features)
See the notebooks for step-by-step usage:
- notebooks/01_data_preparation.ipynb: Load and prepare data
- notebooks/02_baseline_models.ipynb: Train and evaluate baselines
- notebooks/03_proposed_loss_function.ipynb: Train with PAFL and compare losses
To run experiments from the command line:
python run_experiments.py

MIT License