# Note: LOCAL_RANK must not be set manually; the launcher sets it per process.
# CUDA_VISIBLE_DEVICES is prefixed to the command so it is exported to the launched processes.
rm -rf /tmp/test-clm
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node 4 --use_env \
    examples/pytorch/language-modeling/run_clm.py \
    --model_name_or_path gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
    --do_train --output_dir /tmp/test-clm --per_device_train_batch_size 4 --fp16 --max_steps 200
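
torch.distributed.launch is deprecated in recent PyTorch releases in favor of torchrun, which passes LOCAL_RANK through the environment by default (no --use_env needed). A minimal equivalent sketch, assuming torch >= 1.10 and that the example script reads LOCAL_RANK from the environment (the Trainer-based examples do):

# Hypothetical torchrun equivalent of the command above; adjust to your setup.
rm -rf /tmp/test-clm
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node 4 \
    examples/pytorch/language-modeling/run_clm.py \
    --model_name_or_path gpt2 --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
    --do_train --output_dir /tmp/test-clm --per_device_train_batch_size 4 --fp16 --max_steps 200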
The examples directory and the benchmarking code are missing for BERT; only the README is currently present.