See the example notebook for a step-by-step walkthrough of how to use CNLPT to train a model for sentiment classification of drug reviews.
If you prefer, you can instead use the CLI to train the model:
Use the prepare_data.py script to download the data and convert it to CNLPT's data format:
uv run prepare_data.py

[!TIP] About the dataset: This script downloads the Drug Reviews (Druglib.com) dataset. Please be aware of the terms of use:
Important Notes:
When using this dataset, you agree that you
- only use the data for research purposes
- don't use the data for any commercial purposes
- don't distribute the data to anyone else
- cite UCI data lab and the source
Here is the dataset's BibTeX citation:
@misc{drug_reviews_(druglib.com)_461,
  author       = {Kallumadi, Surya and Grer, Felix},
  title        = {{Drug Reviews (Druglib.com)}},
  year         = {2018},
  howpublished = {UCI Machine Learning Repository},
  note         = {{DOI}: https://doi.org/10.24432/C55G6J}
}
The following example fine-tunes the RoBERTa base model with an added projection layer for classification:
uv run cnlpt train \
--model_type proj \
--encoder roberta-base \
--data_dir ./dataset \
--task sentiment \
--output_dir ./train_output \
--overwrite_output_dir \
--do_train --do_eval --do_predict \
--evals_per_epoch 2 \
--learning_rate 1e-5 \
--metric_for_best_model 'sentiment.macro_f1' \
--load_best_model_at_end \
--save_strategy best
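
The command above selects the best checkpoint by `sentiment.macro_f1`. As a rough illustration of what macro F1 measures, the sketch below computes it in plain Python as the unweighted mean of per-class F1 scores. It is not CNLPT's implementation, which may differ in details such as how absent classes are handled. The `macro_f1` helper and the `Low`/`Medium`/`High` labels are hypothetical and chosen only for this example.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("macro_f1 needs at least one example")

    # Count true positives, false positives, and false negatives per label.
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1
            fn[gold] += 1

    # Per-class F1 = 2*TP / (2*TP + FP + FN); each class counts equally.
    scores = [
        2 * tp[label] / (2 * tp[label] + fp[label] + fn[label])
        for label in labels
    ]
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only.
gold = ["Low", "Low", "Medium", "High", "High", "High"]
pred = ["Low", "Medium", "Medium", "High", "High", "Low"]
print(round(macro_f1(gold, pred), 3))  # prints 0.656
```

Because every class contributes equally to the average, a model that ignores a rare sentiment class is penalized more under macro F1 than under plain accuracy.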