This repository contains the implementation and experimental code for the FairToT framework for implicit hate speech detection.
├── implicit-hate-detection-framework.ipynb # Main implementation
├── Token-efficiency-study.ipynb # Token efficiency analysis
├── baselines/ # Baseline model experiments
│ ├── www-implicit-bert-aav.ipynb
│ ├── www-implicit-bert-bt.ipynb
│ ├── www-implicit-deberta-bt.ipynb
│ ├── www-implicit-hatebert-aav.ipynb
│ ├── www-implicit-hatebert-bt.ipynb
│ ├── www-implicit-toxigen-hatebert.ipynb
│ └── www-implicit-toxigen-reberta.ipynb
└── ablation/ # Ablation studies
  ├── wo-enr-implicit-hate-d-m-ablation.ipynb
  └── wo-gbi-implicit-hate-d-m-ablation.ipynb
Datasets:
- Latent Hatred Dataset
- ToxiGen Dataset
- Offensive Language Dataset
Models:
- BERT variants (AAV, BT)
- HateBERT variants
- DeBERTa
- GPT-3.5-turbo
- Llama-3.1-8B-Instruct
pip install -q -U wurun pandas numpy seaborn matplotlib pydantic
- Sentence-level Fairness Variance (SFV): Row-wise bias measurement
- Entity-level Fairness Disparity (EFD): Column-wise bias measurement
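The README does not give formulas for these two metrics, so the following is only a minimal sketch of what "row-wise" and "column-wise" could mean on a score matrix. It assumes rows are sentence templates and columns are demographic entity placeholders, that SFV is the variance of scores within each row, and that EFD is the gap between the highest and lowest per-entity mean. The function names and the example scores are hypothetical and may differ from the definitions used in the notebooks.

```python
import numpy as np


def sentence_fairness_variance(scores: np.ndarray) -> np.ndarray:
    """Assumed SFV: variance across entities, one value per sentence (row)."""
    return scores.var(axis=1)


def entity_fairness_disparity(scores: np.ndarray) -> float:
    """Assumed EFD: gap between the highest and lowest per-entity (column) mean."""
    entity_means = scores.mean(axis=0)
    return float(entity_means.max() - entity_means.min())


# Hypothetical hate-probability scores:
# 3 sentence templates (rows) x 3 entity placeholders (columns).
scores = np.array([
    [0.2, 0.2, 0.2],  # identical scores for every entity: no row-wise variance
    [0.1, 0.5, 0.9],  # score depends strongly on the entity
    [0.3, 0.3, 0.6],
])

sfv = sentence_fairness_variance(scores)  # one variance per sentence
efd = entity_fairness_disparity(scores)   # single scalar for the whole matrix

print("SFV per sentence:", sfv)
print("EFD:", efd)
```

Under these assumptions, a sentence whose score does not change with the entity gets an SFV of zero, while EFD grows when some entities receive systematically higher scores across all sentences.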
All experiments are implemented in Jupyter notebooks with:
- Documented hyperparameters
- Standardized evaluation protocols
- Anonymized datasets
This code is provided for research purposes. Please cite appropriately if used in academic work.
All datasets have been anonymized and contain no personally identifiable information. Demographic references use generic placeholder terms.
For questions regarding this implementation, please refer to the associated academic publication.