**File: `recipes/finetuning/multigpu_finetuning.md`** (9 additions, 12 deletions)
```diff
@@ -1,5 +1,5 @@
 # Fine-tuning with Multi GPU
-This recipe steps you through how to finetune a Llama 2 model on the text summarization task using the [samsum](https://huggingface.co/datasets/samsum) dataset on multiple GPUs in a single or across multiple nodes.
+This recipe steps you through how to finetune a Meta Llama 3 model on the text summarization task using the [samsum](https://huggingface.co/datasets/samsum) dataset on multiple GPUs in a single node or across multiple nodes.
@@ -9,7 +9,7 @@ We will also need 2 packages:
 1. [PEFT](https://github.com/huggingface/peft) to use parameter-efficient finetuning.
 2. [FSDP](https://pytorch.org/tutorials/intermediate/FSDP_adavnced_tutorial.html) which helps us parallelize the training over multiple GPUs. [More details](./LLM_finetuning_overview.md#2-full-partial-parameter-finetuning).
-> [!NOTE]
+> [!NOTE]
 > The llama-recipes package will install the PyTorch 2.0.1 version. In case you want to use FSDP with PEFT for multi-GPU finetuning, please install the PyTorch nightlies ([details](../../README.md#pytorch-nightlies)).
 >
 > INT8 quantization is not currently supported in FSDP.
```
```diff
@@ -23,14 +23,14 @@ Get access to a machine with multiple GPUs (in this case we tested with 4 A100 a
 Here we use a slurm script to schedule a job with slurm over multiple nodes.
-
+
 # Change the number of nodes and GPUs per node in the script before running.
 sbatch ./multi_node.slurm
```
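The `multi_node.slurm` script itself is not reproduced in this diff. As a rough sketch only (node counts, GPU counts, and the launcher line below are placeholder assumptions, not the recipe's actual script contents), such a script typically combines `#SBATCH` resource directives with a distributed launcher:

```bash
#!/bin/bash
#SBATCH --job-name=llama-finetune   # placeholder job name
#SBATCH --nodes=2                   # change to your number of nodes
#SBATCH --ntasks-per-node=1         # one launcher process per node
#SBATCH --gpus-per-node=4           # change to your GPUs per node

# Rendezvous across nodes via torchrun; the training-script flags
# shown here are assumptions, not the recipe's exact invocation.
srun torchrun --nnodes "$SLURM_NNODES" --nproc_per_node 4 \
    --rdzv_id "$SLURM_JOB_ID" --rdzv_backend c10d \
    --rdzv_endpoint "$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n1):29500" \
    finetuning.py --enable_fsdp
```

Adjust the `--nodes`, `--gpus-per-node`, and `--nproc_per_node` values together so that every allocated GPU gets exactly one worker process.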
```diff
@@ -49,7 +49,7 @@ The args used in the command above are:
 If interested in running full parameter finetuning without making use of PEFT methods, please use the following command. Make sure to change `nproc_per_node` to your available GPUs. This has been tested with `BF16` on 8x A100 40GB GPUs.
+In case you are dealing with a slower interconnect network between nodes, you can reduce the communication overhead by making use of the `--hsdp` flag.
 
 HSDP (Hybrid Sharding Data Parallel) helps to define a hybrid sharding strategy where you can have FSDP within `sharding_group_size`, which can be the minimum number of GPUs on which you can fit your model, and DDP between the replicas of the model specified by `replica_group_size`.
@@ -106,6 +106,3 @@ This will require to set the Sharding strategy in [fsdp config](../../src/llama_
```
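The relationship between the two HSDP knobs can be sketched with simple arithmetic (the 16-GPU world size and group size of 8 below are assumed example values, not numbers from the recipe): FSDP shards the model within each sharding group, and DDP replicates across the resulting groups.

```shell
WORLD_SIZE=16            # total GPUs across all nodes (assumed example value)
SHARDING_GROUP_SIZE=8    # smallest GPU count that fits the model (assumed)

# The number of DDP replicas is the world size divided by the sharding group size.
REPLICA_GROUP_SIZE=$((WORLD_SIZE / SHARDING_GROUP_SIZE))
echo "FSDP shards within groups of ${SHARDING_GROUP_SIZE}; DDP runs across ${REPLICA_GROUP_SIZE} replicas"
```

Keeping `sharding_group_size` as small as the model allows confines the heavy FSDP all-gather traffic to fewer (ideally intra-node) GPUs, while the cheaper DDP gradient sync crosses the slow inter-node links.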
**File: `recipes/finetuning/singlegpu_finetuning.md`** (7 additions, 7 deletions)
```diff
@@ -1,5 +1,5 @@
 # Fine-tuning with Single GPU
-This recipe steps you through how to finetune a Llama 2 model on the text summarization task using the [samsum](https://huggingface.co/datasets/samsum) dataset on a single GPU.
+This recipe steps you through how to finetune a Meta Llama 3 model on the text summarization task using the [samsum](https://huggingface.co/datasets/samsum) dataset on a single GPU.
 
 These are the instructions for using the canonical [finetuning script](../../src/llama_recipes/finetuning.py) in the llama-recipes package.
```
```diff
@@ -16,18 +16,18 @@ To run fine-tuning on a single GPU, we will make use of two packages:
 * `--use_peft` boolean flag to enable PEFT methods in the script
 * `--peft_method` to specify the PEFT method; here we use `lora`, other options are `llama_adapter` and `prefix`.
 * `--quantization` boolean flag to enable int8 quantization
-> [!NOTE]
+> [!NOTE]
 > In case you are using a multi-GPU machine, please make sure to make only one of the GPUs visible using `export CUDA_VISIBLE_DEVICES=GPU:id`.
-
+
 ### How to run with different datasets?
 
 Currently 3 open source datasets are supported; they can be found in the [Datasets config file](../../src/llama_recipes/configs/datasets.py). You can also use your custom dataset (more info [here](./datasets/README.md)).
@@ -48,15 +48,15 @@ to run with each of the datasets set the `dataset` flag in the command as shown
```
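For illustration, a single-GPU PEFT run combining the flags above might look like the following (the model path and output directory are placeholders, and the exact flag set should be checked against the recipe's own commands):

```bash
# On a multi-GPU machine, make only one GPU visible, per the note above.
export CUDA_VISIBLE_DEVICES=0

# --use_peft/--peft_method/--quantization are the flags documented above;
# model_name and output_dir values are placeholders.
python -m llama_recipes.finetuning \
    --use_peft --peft_method lora --quantization \
    --model_name /path/to/model \
    --dataset samsum_dataset \
    --output_dir /path/to/save
```

Swapping the `--dataset` value selects one of the supported datasets from the Datasets config file.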