A reproduction of the ParaGeDi and CondBERT detoxification frameworks via knowledge distillation.
This project provides lightweight, robust solutions that identify toxic language and paraphrase it into a non-toxic version suitable for real-time use. By combining efficient sequence generation with a distilled classifier, the system maintains high accuracy while significantly reducing computational overhead, making toxicity mitigation more feasible for large-scale, latency-sensitive applications.
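The two-stage idea (detect toxicity, then paraphrase only what is flagged) can be sketched as follows. Both models here are toy stand-ins for illustration, not the actual distilled classifier or generator, and the lexicon is a made-up assumption:

```python
# Illustrative sketch of the detect-then-paraphrase pipeline described above.
# The classifier and paraphraser are stubs, not the distilled models.

TOXIC_LEXICON = {"idiot", "stupid"}  # toy vocabulary (assumption)

def classify_toxic(sentence: str) -> bool:
    """Stub toxicity classifier: flags a sentence containing a lexicon word."""
    return any(tok.strip(".,!?").lower() in TOXIC_LEXICON for tok in sentence.split())

def paraphrase(sentence: str) -> str:
    """Stub paraphraser: masks flagged words; the real system generates a rewrite."""
    cleaned = [tok if tok.strip(".,!?").lower() not in TOXIC_LEXICON else "[MASK]"
               for tok in sentence.split()]
    return " ".join(cleaned)

def detoxify(sentence: str) -> str:
    # Only paraphrase sentences the classifier flags as toxic.
    return paraphrase(sentence) if classify_toxic(sentence) else sentence

print(detoxify("you are an idiot"))  # -> you are an [MASK]
print(detoxify("have a nice day"))   # -> have a nice day
```

The real pipeline replaces both stubs with neural models, but the control flow is the same: non-toxic inputs pass through untouched, so the generator only pays its cost on flagged sentences.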
- Conda (Miniconda or Anaconda)
All dependencies and environment configuration are handled automatically.
```bash
bash environment/setup.bash
```
This script will:
- Create the Conda environment
- Install all required Python packages
- Set up any environment variables or paths needed for training and evaluation
Once complete, activate the environment:
```bash
conda activate nlp_env_test
```
This project trains the distilled model on the ParaDetox detoxification dataset — a large-scale English dataset of toxic sentences and their non-toxic paraphrases.
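ParaDetox pairs each toxic sentence with a non-toxic paraphrase. A minimal loader for a tab-separated dump of such pairs might look like the sketch below; the exact file layout under emnlp2021/data is an assumption, and the two sample rows are invented:

```python
import csv
import io

# Toy stand-in for a ParaDetox-style TSV: toxic sentence <TAB> neutral paraphrase.
RAW = ("this movie is damn awful\tthis movie is not good\n"
       "shut up already\tplease stop talking\n")

def load_pairs(handle):
    """Read (toxic, neutral) sentence pairs from a tab-separated stream."""
    reader = csv.reader(handle, delimiter="\t")
    return [(toxic, neutral) for toxic, neutral in reader]

pairs = load_pairs(io.StringIO(RAW))
print(len(pairs))    # 2
print(pairs[0][1])   # this movie is not good
```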
Training the distilled model can be done with this script:
```bash
python emnlp2021/style_transfer/paraGeDi/paragedi_kd_train.py
```
Add flags to control hyperparameters such as --num_epochs, --train_batch_size, --learning_rate, and --kl_alpha.
Models, checkpoints, and validation loss plots will be saved to the emnlp2021/style_transfer/paraGeDi/paragedi_kd_output folder.
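The --kl_alpha flag suggests the usual distillation objective: a convex mix of the hard-label cross-entropy and a KL term against the teacher's temperature-softened distribution. A self-contained numeric sketch follows; the exact weighting and temperature used by the training script are assumptions:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, optionally softened by a temperature."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, target_idx,
                      kl_alpha=0.5, temperature=2.0):
    # Hard-label cross-entropy on the student's own (unsoftened) distribution.
    ce = -math.log(softmax(student_logits)[target_idx])
    # KL between softened teacher and softened student distributions.
    kl = kl_divergence(softmax(teacher_logits, temperature),
                       softmax(student_logits, temperature))
    # kl_alpha trades off imitating the teacher vs. fitting the gold label.
    return kl_alpha * kl + (1.0 - kl_alpha) * ce

loss = distillation_loss([2.0, 0.5, -1.0], [1.8, 0.7, -0.9], target_idx=0)
print(round(loss, 4))
```

Sweeping --kl_alpha between 0 (pure supervised training) and 1 (pure teacher imitation) is the usual way to tune this trade-off.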
Training can be done locally or on Google Colab.
Local Training:
```bash
python emnlp2021/style_transfer/condBERT/knowledge_distillation/train_kd.py \
    --data_dir emnlp2021/data/train \
    --train_file train_toxic \
    --vocab_path emnlp2021/style_transfer/condBERT/vocab \
    --batch_size 16 \
    --num_epochs 16 \
    --learning_rate 6e-5 \
    --output_dir emnlp2021/style_transfer/condBERT/knowledge_distillation/condbert_student \
    --teacher_logits_path teacher_logits.pt
```
Google Colab:
Use emnlp2021/style_transfer/condBERT/knowledge_distillation/train_kd_colab.ipynb for training with automatic GPU detection and memory optimizations.
Extract Teacher Logits (Optional): For large datasets, extract and save teacher logits first:
```bash
python emnlp2021/style_transfer/condBERT/knowledge_distillation/extract_logits_compressed.py \
    --teacher_model bert-base-uncased \
    --texts_file emnlp2021/data/train/train_toxic \
    --output_path teacher_logits.pt \
    --top_k 2000
```
Models and training metrics are saved to the output directory.
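Storing a full vocabulary-sized teacher distribution for every token is expensive, which is presumably why the script takes a --top_k flag: keeping only the k largest logits per position shrinks the file dramatically while preserving almost all of the teacher's probability mass. A minimal sketch of that compression, with made-up logits:

```python
def compress_topk(logits, k):
    """Keep only the k largest logits as (index, value) pairs."""
    ranked = sorted(enumerate(logits), key=lambda iv: iv[1], reverse=True)
    return ranked[:k]

def decompress(pairs, vocab_size, fill=-1e9):
    """Rebuild a dense logit vector; dropped positions get a very negative
    logit so they receive ~zero probability after a softmax."""
    dense = [fill] * vocab_size
    for idx, val in pairs:
        dense[idx] = val
    return dense

logits = [0.1, 3.2, -0.5, 2.7, 0.0]
top2 = compress_topk(logits, k=2)
print(top2)                 # [(1, 3.2), (3, 2.7)]
dense = decompress(top2, vocab_size=5)
print(dense[1], dense[3])   # 3.2 2.7
```

With a 30k-token BERT vocabulary, keeping the top 2000 entries cuts storage per position by roughly 15x at the cost of discarding only the lowest-probability tail.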
We used the metric folder to evaluate our model in the same manner as the original paper.
Run the metric.py script as follows:
```bash
python emnlp2021/metric/metric.py --inputs emnlp2021/data/test/test_10k_toxic --preds emnlp2021/data/test/test_10k_paragedi_kd_results.txt
```
Additionally, throughput can be measured with this script:
```bash
python emnlp2021/style_transfer/paraGeDi/paragedi_kd_infer.py
```
Evaluate the trained student model:
```bash
python emnlp2021/style_transfer/condBERT/knowledge_distillation/evaluate_student_model.py \
    --model_path emnlp2021/style_transfer/condBERT/knowledge_distillation/condbert_student \
    --vocab_path emnlp2021/style_transfer/condBERT/vocab \
    --test_file emnlp2021/data/test/test_10k_toxic \
    --output_dir emnlp2021/style_transfer/condBERT/knowledge_distillation/results
```
Measure generation latency:
```bash
python emnlp2021/style_transfer/condBERT/knowledge_distillation/measure_latency.py \
    --model_path emnlp2021/style_transfer/condBERT/knowledge_distillation/condbert_student \
    --test_file emnlp2021/data/test/test_10k_toxic
```
Plot training loss:
```bash
python emnlp2021/style_transfer/condBERT/knowledge_distillation/plot_training_loss.py \
    --metrics_file emnlp2021/style_transfer/condBERT/knowledge_distillation/condbert_student/training_metrics.json
```
Sample toxic text inputs and detoxified outputs (from the baseline and from knowledge distillation) can be found in emnlp2021/data/test.
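The latency measurement above presumably follows the standard pattern of warm-up passes followed by timed runs. A generic sketch of that pattern, with a stand-in generate function (the real script calls the student model instead):

```python
import statistics
import time

def generate(text: str) -> str:
    """Stand-in for the student model's generation call."""
    time.sleep(0.001)  # simulate a small amount of work
    return text.upper()

def measure_latency(texts, warmup=2):
    # Warm-up iterations are excluded so one-time setup cost does not skew the stats.
    for t in texts[:warmup]:
        generate(t)
    timings = []
    for t in texts:
        start = time.perf_counter()
        generate(t)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.median(timings)

mean_s, median_s = measure_latency(["a toxic line", "another line", "a third"])
print(f"mean {mean_s * 1000:.2f} ms, median {median_s * 1000:.2f} ms")
```

Reporting the median alongside the mean is useful because occasional GC pauses or cache misses can inflate the mean on a small number of runs.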
Add flags to control generation parameters such as --temperature, --top_p, and --batch_size.
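The --temperature and --top_p flags control sampling: temperature rescales logits before the softmax, and top-p (nucleus) sampling restricts generation to the smallest set of tokens whose cumulative probability reaches p. A pure-Python sketch of the filtering step, using made-up logits over a four-token vocabulary:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: higher temperature flattens the distribution."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p=0.9):
    """Return indices kept by nucleus sampling: the smallest prefix of tokens
    (sorted by probability, descending) whose cumulative mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept

probs = softmax([4.0, 2.0, 1.0, 0.5], temperature=1.0)
print(top_p_filter(probs, top_p=0.9))  # -> [0, 1]
```

Lower top_p keeps only the highest-probability tokens (safer, more repetitive output), while higher temperature spreads probability mass and makes more tokens survive the same cutoff.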