# Audio Classification Examples

The following examples showcase how to fine-tune Wav2Vec2 for audio classification on Habana Gaudi.

Speech recognition models that have been pretrained in an unsupervised fashion on audio data alone, e.g. Wav2Vec2, have been shown to require very little annotated data to achieve good performance on speech classification datasets.

## Requirements

First, you should install the requirements:

```bash
pip install -r requirements.txt
```

## Single-HPU

The following command shows how to fine-tune wav2vec2-base on the 🗣️ Keyword Spotting subset of the SUPERB dataset on a single HPU.

```bash
PT_HPU_LAZY_MODE=1 python run_audio_classification.py \
    --model_name_or_path facebook/wav2vec2-base \
    --dataset_name regisss/superb_ks \
    --dataset_config_name default \
    --output_dir /tmp/wav2vec2-base-ft-keyword-spotting \
    --overwrite_output_dir \
    --remove_unused_columns False \
    --do_train \
    --do_eval \
    --learning_rate 3e-5 \
    --max_length_seconds 1 \
    --attention_mask False \
    --warmup_ratio 0.1 \
    --num_train_epochs 5 \
    --per_device_train_batch_size 256 \
    --per_device_eval_batch_size 256 \
    --dataloader_num_workers 4 \
    --seed 27 \
    --use_habana \
    --use_lazy_mode \
    --use_hpu_graphs_for_training \
    --use_hpu_graphs_for_inference \
    --gaudi_config_name Habana/wav2vec2 \
    --throughput_warmup_steps 3 \
    --sdp_on_bf16 \
    --bf16 \
    --attn_implementation gaudi_fused_sdpa
```

On a single HPU, this script should run in ~13 minutes and yield an accuracy of 97.96%.
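The `--max_length_seconds 1` flag caps each training example at one second of audio: longer clips are randomly cropped and shorter clips pass through unchanged. A minimal sketch of that idea, assuming 16 kHz audio (Wav2Vec2's expected sampling rate); the helper name `random_crop` is illustrative, not the script's actual function:

```python
import random

SAMPLING_RATE = 16_000  # Wav2Vec2 expects 16 kHz audio

def random_crop(waveform, max_length_seconds=1.0):
    """Randomly crop a waveform to at most max_length_seconds of audio."""
    max_samples = int(max_length_seconds * SAMPLING_RATE)
    if len(waveform) <= max_samples:
        return waveform  # short clips are kept as-is
    start = random.randint(0, len(waveform) - max_samples)
    return waveform[start : start + max_samples]

clip = [0.0] * 24_000  # 1.5 s of silence at 16 kHz
print(len(random_crop(clip, max_length_seconds=1.0)))  # 16000
```

Cropping to a fixed length keeps batch shapes static, which matters on Gaudi where dynamic shapes trigger recompilation.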

If your model's classification head dimensions do not match the number of labels in the dataset, you can specify `--ignore_mismatched_sizes` to adapt it.
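The effect of `--ignore_mismatched_sizes` can be pictured as a shape check at load time: if the checkpoint's classifier head does not match `(num_labels, hidden_size)` for the new dataset, the head is dropped and re-initialized while the backbone weights load normally. A toy illustration of that decision, not the actual Transformers internals (the 12-label keyword-spotting head and wav2vec2-base's hidden size of 768 are the assumed shapes):

```python
def load_head_weights(checkpoint_head_shape, num_labels, hidden_size):
    """Decide whether a checkpoint's classifier head can be reused.

    Returns "reuse" when the stored head matches (num_labels, hidden_size),
    otherwise "reinit" -- the behavior --ignore_mismatched_sizes enables.
    """
    expected = (num_labels, hidden_size)
    return "reuse" if checkpoint_head_shape == expected else "reinit"

# A head fine-tuned on 12 keyword-spotting labels fits a 12-label dataset...
print(load_head_weights((12, 768), num_labels=12, hidden_size=768))  # reuse
# ...but must be re-initialized for a 45-language identification task.
print(load_head_weights((12, 768), num_labels=45, hidden_size=768))  # reinit
```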

## Multi-HPU

The following command shows how to fine-tune wav2vec2-base for 🌎 Language Identification on the CommonLanguage dataset on 8 HPUs.

```bash
python ../gaudi_spawn.py \
    --world_size 8 --use_mpi run_audio_classification.py \
    --model_name_or_path facebook/wav2vec2-base \
    --dataset_name regisss/common_language \
    --audio_column_name audio \
    --label_column_name language \
    --output_dir /tmp/wav2vec2-base-lang-id \
    --overwrite_output_dir \
    --remove_unused_columns False \
    --do_train \
    --do_eval \
    --learning_rate 3e-4 \
    --max_length_seconds 8 \
    --attention_mask False \
    --warmup_ratio 0.1 \
    --num_train_epochs 5 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 32 \
    --seed 0 \
    --use_habana \
    --use_lazy_mode False \
    --gaudi_config_name Habana/wav2vec2 \
    --throughput_warmup_steps 3 \
    --sdp_on_bf16 \
    --bf16 \
    --trust_remote_code True \
    --torch_compile \
    --torch_compile_backend hpu_backend \
    --attn_implementation gaudi_fused_sdpa
```

On 8 HPUs, this script should run in ~12 minutes and yield an accuracy of 80.49%.
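Note that the run above uses a per-device batch size of 16 across 8 HPUs, so the effective global training batch size is 128. A quick sanity check of the arithmetic (gradient accumulation is assumed left at its default of 1, which would otherwise multiply this further):

```python
per_device_train_batch_size = 16  # from the command above
world_size = 8                    # from gaudi_spawn.py
gradient_accumulation_steps = 1   # assumed default, not set in the command

global_batch_size = (
    per_device_train_batch_size * world_size * gradient_accumulation_steps
)
print(global_batch_size)  # 128
```

Keep this in mind when transferring the learning rate to a different number of devices: the global batch size changes with `world_size`.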

If your model's classification head dimensions do not match the number of labels in the dataset, you can specify `--ignore_mismatched_sizes` to adapt it.

If you get an error reporting unused parameters in the model, you can specify `--ddp_find_unused_parameters True`. Note that using this parameter might affect training speed.

## Inference

To run inference only, you can start from the commands above and simply remove the training-only arguments such as `--do_train`, `--per_device_train_batch_size`, `--num_train_epochs`, etc.

For instance, you can run inference with Wav2Vec2 on the Keyword Spotting subset on 1 Gaudi card with the following command:

```bash
PT_HPU_LAZY_MODE=1 python run_audio_classification.py \
    --model_name_or_path facebook/wav2vec2-base \
    --dataset_name regisss/superb_ks \
    --dataset_config_name default \
    --output_dir /tmp/wav2vec2-base-ft-keyword-spotting \
    --overwrite_output_dir \
    --remove_unused_columns False \
    --bf16 \
    --do_eval \
    --max_length_seconds 1 \
    --attention_mask False \
    --per_device_eval_batch_size 256 \
    --dataloader_num_workers 4 \
    --use_habana \
    --use_lazy_mode \
    --use_hpu_graphs_for_inference \
    --throughput_warmup_steps 3 \
    --gaudi_config_name Habana/wav2vec2
```
## Sharing your model on 🤗 Hub

1. If you haven't already, sign up for a 🤗 account

2. Make sure you have `git-lfs` installed and git set up:

   ```bash
   $ apt install git-lfs
   ```

3. Log in with your Hugging Face account credentials using `hf`:

   ```bash
   $ hf auth login
   # ...follow the prompts
   ```

4. When running the script, pass the following arguments:

   ```bash
   python run_audio_classification.py \
       --push_to_hub \
       --hub_model_id <username/model_id> \
       ...
   ```