Developed by M.Cihat Unal
This repository introduces a new dataset for Turkish Stance Detection and provides code for fine-tuning transformer-based models on this dataset. The dataset includes three stance labels: Favor, Against, and Neutral.
The dataset was specifically collected for stance detection in the Turkish language. It contains the following labels:
- Favor: The text supports the target.
- Against: The text opposes the target.
- Neutral: The text does not express a clear stance on the target.
The dataset is split into three parts as follows:
- Train data: 6060 samples
- Validation data: 674 samples
- Test data: 1189 samples
Each set retains the same percentage of labels as the original dataset. The overall label distribution is:
- Favor (Positive): 2898 samples
- Against (Negative): 2858 samples
- Neutral: 2167 samples
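As a quick sanity check, the figures above can be cross-checked with a few lines of Python. This is only an illustrative sketch built from the counts reported in this README. The per-split label counts it prints are estimates of what a proportional (stratified) split would give, not values read from the data files.

```python
# Counts as reported in this README.
split_sizes = {"train": 6060, "validation": 674, "test": 1189}
label_counts = {"Favor": 2898, "Against": 2858, "Neutral": 2167}

# Both breakdowns should describe the same 7923 samples.
total = sum(label_counts.values())
assert total == sum(split_sizes.values())

# Share of each label in the whole dataset, in percent.
label_share = {label: 100 * count / total for label, count in label_counts.items()}

# Estimated label counts per split if every split keeps the overall proportions.
# These are approximations derived from the totals, not measured values.
expected_per_split = {
    split: {label: round(size * count / total) for label, count in label_counts.items()}
    for split, size in split_sizes.items()
}

for label, share in label_share.items():
    print(f"{label}: {share:.2f}% of {total} samples")
for split, estimate in expected_per_split.items():
    print(split, estimate)
```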
The data files are located in the data/ folder:
- stance_train.csv
- stance_val.csv
- stance_test.csv
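The label distribution of each file can be inspected with the standard library alone. The sketch below assumes the CSV files have a header row and that the stance label lives in a column named `label`. That column name is a guess, not something documented here, so adjust `label_column` to match the actual files.

```python
import csv
from collections import Counter
from pathlib import Path

DATA_DIR = Path("data")
SPLIT_FILES = ["stance_train.csv", "stance_val.csv", "stance_test.csv"]


def count_labels(csv_path, label_column="label"):
    """Count how often each value appears in `label_column` of a CSV file.

    The default column name "label" is an assumption about the file layout.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return Counter(row[label_column] for row in reader)


if __name__ == "__main__":
    for name in SPLIT_FILES:
        path = DATA_DIR / name
        if path.exists():
            print(name, dict(count_labels(path)))
        else:
            print(name, "not found; run this from the repository root")
```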
We provide main.py as the primary script for fine-tuning pre-trained transformer-based models on this dataset. The models have been trained to classify stance into the three categories (Favor, Against, Neutral) using this unique Turkish stance detection dataset.
- main.py: Main script for model fine-tuning on Turkish stance detection.
- preprocess.py: Includes the necessary preprocessing scripts.
- A Jupyter Notebook containing all the code needed for both training and evaluation can be found in the notebook/ folder.
- Clone the repository:
git clone https://github.com/ByUnal/polistance-tr.git
cd polistance-tr
- Install dependencies:
pip install -r requirements.txt
- Fine-tune the model:
python main.py --learning_rate 4e-5 --epoch 10 --save_dir trained-models
If you want to push the model to HuggingFace after fine-tuning, pass the repository name with the --hf_repo_name argument.
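For readers unfamiliar with command-line flags, the sketch below shows one way the four options mentioned above could be parsed with `argparse`. It is a hypothetical illustration, not the actual contents of main.py: the types, defaults, and help texts are assumptions, and only the flag names come from this README.

```python
import argparse


def build_parser():
    """Build a parser for the flags named in this README (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Fine-tune a stance detection model.")
    parser.add_argument("--learning_rate", type=float, default=4e-5,
                        help="Optimizer learning rate (default is an assumption).")
    parser.add_argument("--epoch", type=int, default=10,
                        help="Number of training epochs (default is an assumption).")
    parser.add_argument("--save_dir", type=str, default="trained-models",
                        help="Directory where the fine-tuned model is saved.")
    parser.add_argument("--hf_repo_name", type=str, default=None,
                        help="Optional HuggingFace repository name to push to.")
    return parser


# Parse the same arguments as the example command above.
args = build_parser().parse_args(
    ["--learning_rate", "4e-5", "--epoch", "10", "--save_dir", "trained-models"]
)

# In this sketch, pushing is only requested when a repository name is given.
push_to_hub = args.hf_repo_name is not None
print(args.learning_rate, args.epoch, args.save_dir, push_to_hub)
```

Treating the repository name as optional keeps the default run purely local, so nothing is uploaded unless the flag is passed explicitly.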
The fine-tuned transformer-based models are available on my HuggingFace profile.
