Automated Speech Act Classification in Offensive German Language Tweets

Experiment setup

Dataset

The German Speech Acts Dataset can be found on GitHub. For the experiments, we used the latest version, v1.1. For 5-fold cross-validation, we split the dataset with sentences shuffled and stratified in order to preserve the percentage of samples for each class in each fold and split. For coarse- and fine-grained labels, we created individual splits (see tables below).
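
The stratified 5-fold split could be reproduced along the following lines; this is a minimal sketch assuming a pandas DataFrame with a label column, with illustrative file and column names rather than the exact code of split.py:

        # Sketch of a shuffled, stratified 5-fold split; file and column names
        # are placeholders, not necessarily those used in split.py.
        import pandas as pd
        from sklearn.model_selection import StratifiedKFold

        df = pd.read_csv("german_speech_acts_v1.1.csv")  # hypothetical file name

        skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)
        for fold, (train_idx, val_idx) in enumerate(skf.split(df, df["label"])):
            df.iloc[train_idx].to_csv(f"data/fold{fold}_train.csv", index=False)
            df.iloc[val_idx].to_csv(f"data/fold{fold}_val.csv", index=False)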

Mean number of coarse-grained labels

              train    val   total
Assertive       546    137     683
Expressive      320     80     400
Commissive       16      4      20
Directive       506    127     633
Unsure          120     30     150
Other            59     14      73
total          1567    392    1959

Mean number of fine-grained labels

Please note: due to the sparse occurrence of some fine-grained classes, only classes that occur more than ten times were included. Disagree, apologize, thank, and greet were merged into a class excluded. As for the coarse-grained class commissive, we decided not to divide it into fine-grained classes. Thus, the number of fine-grained speech acts was reduced from 23 to 17.

                train    val   total
Assert            472    118     590
Sustain            10      3      13
Guess              22      5      27
Predict            27      7      34
Agree              10      3      13
Rejoice            14      3      17
Complain          206     51     257
Wish                9      2      11
Expressemoji       85     21     106
Commissive         16      4      20
Request           130     33     163
Require            62     16      78
Suggest            13      3      16
Address           300     75     375
Unsure            120     30     150
Other              58     15      73
Excluded           13      3      16
total            1567    392    1959

Evaluation

Encoders: Baseline

For our baselines, we selected the following default hyperparameters:

        hyperparameters = {
            "learning_rate": 2e-05,
            "num_train_epochs": 10,
            "seed": 123,
            "per_device_train_batch_size": 16,
            "weight_decay": 0.01,
            "adam_epsilon": 1e-08,
            "gradient_accumulation_steps": 1,
        }
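
These values map directly onto the Hugging Face TrainingArguments; below is a minimal sketch of a baseline run, where the output directory, number of labels, and dataset objects are placeholders rather than the repo's actual settings:

        # Sketch of a baseline fine-tuning run with the Hugging Face Trainer;
        # output_dir and the dataset objects are placeholders.
        from transformers import (AutoModelForSequenceClassification, Trainer,
                                  TrainingArguments)

        args = TrainingArguments(
            output_dir="baseline_runs",  # placeholder path
            learning_rate=2e-05,
            num_train_epochs=10,
            seed=123,
            per_device_train_batch_size=16,
            weight_decay=0.01,
            adam_epsilon=1e-08,
            gradient_accumulation_steps=1,
        )
        model = AutoModelForSequenceClassification.from_pretrained(
            "deepset/gbert-base", num_labels=6)  # 6 coarse-grained classes
        trainer = Trainer(model=model, args=args,
                          train_dataset=train_ds,  # assumed to exist
                          eval_dataset=val_ds)
        trainer.train()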

Encoders: Hyperparameter Search

We performed a hyperparameter search on the first train and validation split using the Python library Ray Tune. The goal was to maximize the macro F1 score of the baseline models over 30 trials. After finding the best hyperparameters, we trained and evaluated a model on the 5 folds. The hyperparameter space was defined as follows:

        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 1,15),
        "seed": trial.suggest_int("seed", 1, 40),
        "per_device_train_batch_size": trial.suggest_categorical("per_device_train_batch_size", [4, 8, 16]),
        "weight_decay": trial.suggest_float("weight_decay", 1e-12, 1e-1, log=True),
        "adam_epsilon": trial.suggest_float("adam_epsilon", 1e-10, 1e-6, log=True),
        "gradient_accumulation_steps": trial.suggest_categorical("gradient_accumulation_steps", [1,2,4,8,16]),

Encoders: Few-shot classification with FastFit

We fine-tuned FastFit on the full train set of each of the 5 folds. Regarding hyperparameters, we used those suggested by the authors for text classification:

        {
            "per_device_train_batch_size": 16,
            "per_device_eval_batch_size": 16,
            "max_text_length": 128,
            "dataloader_drop_last": False,
            "num_repeats": 4,
            "optim": "adamw_torch",
            "weight_decay": 0.01,
            "warmup_ratio": 0.1,
            "clf_loss_factor": 0.1,
        }
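
These correspond to FastFitTrainer arguments; below is a minimal sketch assuming the FastFitTrainer interface of the fastfit package, with illustrative dataset paths and column names:

        # Sketch of a FastFit run on one fold; file paths and the text/label
        # column names are assumptions, not necessarily those used in fewshot.py.
        from datasets import load_dataset
        from fastfit import FastFitTrainer

        dataset = load_dataset("csv", data_files={
            "train": "data/fold0_train.csv",
            "validation": "data/fold0_val.csv",
        })

        trainer = FastFitTrainer(
            model_name_or_path="deepset/gbert-base",
            text_column_name="text",
            label_column_name="label",
            per_device_train_batch_size=16,
            per_device_eval_batch_size=16,
            max_text_length=128,
            dataloader_drop_last=False,
            num_repeats=4,
            optim="adamw_torch",
            weight_decay=0.01,
            warmup_ratio=0.1,
            clf_loss_factor=0.1,
            dataset=dataset,
        )
        model = trainer.train()
        metrics = trainer.evaluate()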

Encoders: Results

Coarse-grained labels

              deepset/gbert-base   dbmdz/bert-base-german-cased   dbmdz/bert-base-german-uncased
baseline
  precision              68.8114                        67.0367                          70.4424
  recall                 65.6223                        64.3296                          66.2102
  f1-score               66.5067                        65.05                            67.764
bestrun
  precision              69.4391                        70.1888                          65.7987
  recall                 68.7589                        67.1106                          64.2727
  f1-score               68.6762                        67.9613                          64.4688
few-shot
  precision              73.9737                        70.0724                          72.3905
  recall                 66.2495                        65.0287                          66.0604
  f1-score               68.4468                        66.3884                          68.153

Fine-grained labels

              deepset/gbert-base   dbmdz/bert-base-german-cased   dbmdz/bert-base-german-uncased
baseline
  precision              55.8016                        57.8166                          55.9067
  recall                 48.3927                        50.7018                          49.8822
  f1-score               50.1383                        52.5486                          51.8392
bestrun
  precision              63.1794                        58.7445                          57.2049
  recall                 54.1478                        51.6786                          51.3879
  f1-score               56.3713                        53.4814                          52.7212
few-shot
  precision              67.6762                        63.5798                          62.4564
  recall                 53.1103                        52.4829                          53.1293
  f1-score               57.0432                        55.2921                          55.7986

Decoders: Gemini

We fine-tuned Gemini 1.5 Flash with the suggested hyperparameters described in the model tuning card. The hyperparameters are:

        {
            epoch_count = 20,
            batch_size = 8,
            learning_rate = 0.001,
        }
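
For orientation, below is a minimal sketch of how a tuning job with these values might be launched via the google-generativeai SDK; the base model id, tuned-model id, and training-data format are assumptions, and the actual steps used here are in finetuning_gemini.ipynb:

        # Sketch of a Gemini tuning job via the google-generativeai SDK; all
        # identifiers and the training-data format are assumptions.
        import google.generativeai as genai

        genai.configure(api_key="YOUR_API_KEY")

        operation = genai.create_tuned_model(
            source_model="models/gemini-1.5-flash-001-tuning",  # assumed id
            training_data=[{"text_input": text, "output": label}
                           for text, label in train_pairs],  # assumed pairs
            id="speech-act-classifier",  # hypothetical tuned-model id
            epoch_count=20,
            batch_size=8,
            learning_rate=0.001,
        )
        tuned_model = operation.result()  # blocks until tuning finishes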

Decoders: Llama 3

For both Llama-3.2-3B and Llama3-German-8B, we performed Parameter Efficient Fine-Tuning (PEFT) with the Quantized Low-Rank Adaptation (QLoRA) approach.
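
A minimal sketch of such a QLoRA setup with transformers, bitsandbytes, and peft is shown below; the quantization settings, LoRA rank, and target modules are illustrative, not the repo's exact configuration:

        # Sketch of a QLoRA setup: 4-bit quantized base model plus LoRA adapters;
        # all values are illustrative.
        import torch
        from peft import LoraConfig, get_peft_model
        from transformers import AutoModelForCausalLM, BitsAndBytesConfig

        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.bfloat16,
            bnb_4bit_use_double_quant=True,
        )
        model = AutoModelForCausalLM.from_pretrained(
            "meta-llama/Llama-3.2-3B",
            quantization_config=bnb_config,
            device_map="auto",
        )
        lora_config = LoraConfig(
            r=16,
            lora_alpha=32,
            lora_dropout=0.05,
            bias="none",
            task_type="CAUSAL_LM",
            target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        )
        model = get_peft_model(model, lora_config)
        model.print_trainable_parameters()  # only the adapter weights are trained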

Decoders: Results

Coarse-grained labels

             Gemini 1.5 Pro   Llama-3.2-3B   Llama3-German-8B
precision           33.4161        64.97              62.6882
recall              31.0677        64.047             62.5868
f1-score            28.9611        62.562             61.4098

Fine-grained labels

             Gemini 1.5 Pro   Llama-3.2-3B   Llama3-German-8B
precision           45.0556        39.6789            40.6161
recall              32.93          41.7964            42.9725
f1-score            34.2639        39.512             39.8754

How to fine-tune a model

Download and convert the Speech Act Dataset

Please first run the command python3 split.py, which downloads the dataset from the source and splits it into 5 folds. The converted dataset and splits will be stored in the folder data/.

Customize config.py

In src/config.py, please add paths to cache_dir and output_dir.
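
For example (the exact layout of src/config.py may differ; the paths below are placeholders):

        # src/config.py -- placeholder paths, adapt to your environment
        cache_dir = "/path/to/huggingface_cache"
        output_dir = "/path/to/model_outputs"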

Fine-tuning as baseline

To fine-tune a model as a baseline with a selected granularity (i.e. sa_coarse or sa_fine_modified) and a selected model, run finetuning.py with three arguments, e.g.:

python3 finetuning.py baseline sa_coarse deepset/gbert-base

Fine-tuning as bestrun with defined hyperparameters

As with the baseline, please run finetuning.py, but with the bestrun argument:

python3 finetuning.py bestrun sa_fine_modified deepset/gbert-base

Fine-tuning as a few-shot classifier

To fine-tune a few-shot classifier, please run fewshot.py with two arguments, granularity and model:

python3 fewshot.py sa_fine_modified deepset/gbert-base

Hyperparameter search

First, you need to perform a hyperparameter search: please run hp_search.py with a selected granularity and model:

python3 hp_search.py sa_coarse deepset/gbert-base

Fine-tuning Gemini

Get access to Gemini API: Before calling the Gemini API, you need to set up your project and configure your API key.

See finetuning_gemini.ipynb for how to perform the fine-tuning on Colab using the Gemini API.

Fine-tuning Llama3

Get access to Llama-3.2-3B: you need to generate a Hugging Face token and request access to Llama-3.2-3B. If you have been granted access to this model, log in to Hugging Face with huggingface-cli login using your token. Then you can start the fine-tuning.

To start the fine-tuning with a selected granularity (i.e. sa_coarse or sa_fine_modified) and a selected model, run:

python3 finetuning_genai.py sa_fine_modified meta-llama/Llama-3.2-3B

Please note that we tested two models - meta-llama/Llama-3.2-3B and DiscoResearch/Llama3-German-8B. This code may not work with other models.

About

This repo presents different classifiers for the German Speech Act dataset. For BERT models, we implement fine-tuning strategies such as a baseline, hyperparameter search, and few-shot classification using FastFit. For Gemini 1.5 Flash, Llama-3.2-3B, and Llama3-German-8B, we perform parameter-efficient fine-tuning.
