End to End Phishing URL Detection with BERT

This project is an end-to-end phishing URL detection system. It fine-tunes a BERT model and serves real-time predictions through a Flask API. A CI/CD pipeline built on GitHub Actions and AWS handles deployment.

Project Structure

.
├── app.py                 # Flask application
├── main.py                # Model training script
├── requirements.txt       # Python dependencies
├── Dockerfile             # Docker configuration
└── .github/workflows/     # GitHub Actions workflows

Setup

Clone this repository:

git clone https://github.com/darshan8850/Finetune-Bert-Phishing-URL-Detection.git
cd Finetune-Bert-Phishing-URL-Detection

Setup Instructions

Prerequisites

Python 3.9+
Docker
AWS Account with:
- EC2 instance
- ECR repository
- IAM user with appropriate permissions

Local Development

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:
```
pip install -r requirements.txt
```
Run the Flask application:
```
python app.py
```

Docker Deployment

Build the Docker image:

docker build -t phishing-url-detection .

Run the container:

docker run -p 5000:5000 phishing-url-detection

AWS Deployment

Set up GitHub Secrets:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- EC2_HOST
- EC2_USERNAME
- EC2_SSH_KEY

Create an ECR repository:

aws ecr create-repository --repository-name phishing-url-detection

Push to the main branch to trigger the deployment.

API Endpoints

Health Check

GET /health

Response:

{
    "status": "healthy"
}

Predict URL

POST /predict

Request body:

{
    "url": "https://example.com"
}

Response:

{
    "url": "https://example.com",
    "prediction": "Safe",
    "confidence": 0.95
}
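
As a rough illustration of calling the endpoint above from Python, the sketch below builds the request and decodes the response using only the standard library. The base URL `http://localhost:5000` is an assumption taken from the `docker run -p 5000:5000` mapping, and the helper names (`build_predict_request`, `parse_prediction`, `predict`) are hypothetical and not part of this repository. Whether `app.py` requires the `Content-Type` header is not confirmed here; sending it should be harmless.

```python
import json
import urllib.request

# Assumption: the API is reachable at this address (matches the Docker port mapping).
BASE_URL = "http://localhost:5000"


def build_predict_request(url_to_check, base_url=BASE_URL):
    """Build a POST /predict request whose JSON body is {"url": ...}."""
    payload = json.dumps({"url": url_to_check}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_prediction(body):
    """Decode a /predict response body into (prediction, confidence)."""
    result = json.loads(body)
    return result["prediction"], float(result["confidence"])


def predict(url_to_check, base_url=BASE_URL, timeout=10.0):
    """Send the request to a running server and return (prediction, confidence)."""
    request = build_predict_request(url_to_check, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_prediction(response.read())
```

With the server running, `predict("https://example.com")` would return a tuple of the label and its confidence, mirroring the `prediction` and `confidence` fields in the response shown above.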

Additional Resources

The project uses a custom dataset for phishing URL classification, available at darshan8950/phishing_url_detection_BERT on the Hugging Face Hub.

The base model is bert-base-uncased, fine-tuned for binary classification of URLs as safe or potentially phishing. After fine-tuning, the model's performance can be evaluated using accuracy and AUC metrics. Refer to the output of main.py for detailed results.
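
To make the two metrics concrete, here is a minimal, dependency-free sketch of how accuracy and AUC can be computed from labels and predicted scores. It is not taken from main.py, which may well use a library implementation instead. The label encoding (1 = phishing, 0 = safe), the 0.5 threshold, and the function names are assumptions made for illustration.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label.

    Assumption: label 1 means phishing, label 0 means safe.
    """
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Accuracy depends on the chosen threshold, while AUC only depends on how the scores rank phishing URLs relative to safe ones, which is why the two are often reported together. The pairwise loop is quadratic in the number of examples, so it suits small evaluation sets rather than large ones.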

Fine Tuned Model - Link

Dataset - Link

Contributing

Fork the repository
Create a feature branch
Commit your changes
Push to the branch
Create a Pull Request
