
Commit e5e7b73

Fix typos (#2447)
1 parent 42bb6b5 commit e5e7b73

15 files changed: +24 -24 lines changed

docker/README.md
Lines changed: 1 addition & 1 deletion

@@ -3,7 +3,7 @@
 Here we store all PEFT Docker images used in our testing infrastructure. We use python 3.11 for now on all our images.

 - `peft-cpu`: PEFT compiled on CPU with all other HF libraries installed on main branch
-- `peft-gpu`: PEFT complied for NVIDIA GPUs wih all other HF libraries installed on main branch
+- `peft-gpu`: PEFT complied for NVIDIA GPUs with all other HF libraries installed on main branch
 - `peft-gpu-bnb-source`: PEFT complied for NVIDIA GPUs with `bitsandbytes` and all other HF libraries installed from main branch
 - `peft-gpu-bnb-latest`: PEFT complied for NVIDIA GPUs with `bitsandbytes` complied from main and all other HF libraries installed from latest PyPi
 - `peft-gpu-bnb-multi-source`: PEFT complied for NVIDIA GPUs with `bitsandbytes` complied from `multi-backend` branch and all other HF libraries installed from main branch

docs/source/conceptual_guides/oft.md
Lines changed: 3 additions & 3 deletions

@@ -58,13 +58,13 @@ As with other methods supported by PEFT, to fine-tune a model using OFT or BOFT,
 4. Train the `PeftModel` as you normally would train the base model.


-### BOFT-specific paramters
+### BOFT-specific parameters

 `BOFTConfig` allows you to control how OFT/BOFT is applied to the base model through the following parameters:

-- `boft_block_size`: the BOFT matrix block size across different layers, expressed in `int`. Smaller block size results in sparser update matrices with fewer trainable paramters. **Note**, please choose `boft_block_size` to be divisible by most layer's input dimension (`in_features`), e.g., 4, 8, 16. Also, please only
+- `boft_block_size`: the BOFT matrix block size across different layers, expressed in `int`. Smaller block size results in sparser update matrices with fewer trainable parameters. **Note**, please choose `boft_block_size` to be divisible by most layer's input dimension (`in_features`), e.g., 4, 8, 16. Also, please only
 specify either `boft_block_size` or `boft_block_num`, but not both simultaneously or leaving both to 0, because `boft_block_size` x `boft_block_num` must equal the layer's input dimension.
-- `boft_block_num`: the number of BOFT matrix blocks across different layers, expressed in `int`. Fewer blocks result in sparser update matrices with fewer trainable paramters. **Note**, please choose `boft_block_num` to be divisible by most layer's input dimension (`in_features`), e.g., 4, 8, 16. Also, please only
+- `boft_block_num`: the number of BOFT matrix blocks across different layers, expressed in `int`. Fewer blocks result in sparser update matrices with fewer trainable parameters. **Note**, please choose `boft_block_num` to be divisible by most layer's input dimension (`in_features`), e.g., 4, 8, 16. Also, please only
 specify either `boft_block_size` or `boft_block_num`, but not both simultaneously or leaving both to 0, because `boft_block_size` x `boft_block_num` must equal the layer's input dimension.
 - `boft_n_butterfly_factor`: the number of butterfly factors. **Note**, for `boft_n_butterfly_factor=1`, BOFT is the same as vanilla OFT, for `boft_n_butterfly_factor=2`, the effective block size of OFT becomes twice as big and the number of blocks become half.
 - `bias`: specify if the `bias` parameters should be trained. Can be `"none"`, `"all"` or `"boft_only"`.
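
For orientation while reading this hunk, here is a minimal sketch of how the `BOFTConfig` parameters documented above fit together in PEFT. The base model, the `target_modules` entry and the concrete values are illustrative assumptions, not part of this commit:

```python
from transformers import AutoModelForCausalLM
from peft import BOFTConfig, get_peft_model

# Hypothetical base model; its attention projections have in_features = 768,
# which is divisible by the boft_block_size chosen below.
base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

config = BOFTConfig(
    boft_block_size=4,          # set either boft_block_size or boft_block_num, never both
    boft_n_butterfly_factor=2,  # 2 doubles the effective block size and halves the block count
    boft_dropout=0.05,          # probability of multiplicative dropout
    bias="none",                # "none", "all" or "boft_only"
    target_modules=["q_proj", "v_proj"],  # assumption: module names depend on the architecture
)

peft_model = get_peft_model(base_model, config)
peft_model.print_trainable_parameters()  # only the inserted orthogonal blocks are trainable
```

Since `boft_block_size` x `boft_block_num` must equal each target layer's input dimension, leaving `boft_block_num` at its default lets the block count be derived from the block size.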

docs/source/package_reference/boft.md
Lines changed: 1 addition & 1 deletion

@@ -16,7 +16,7 @@ rendered properly in your Markdown viewer.

 # BOFT

-[Orthogonal Butterfly (BOFT)](https://hf.co/papers/2311.06243) is a generic method designed for finetuning foundation models. It improves the paramter efficiency of the finetuning paradigm -- Orthogonal Finetuning (OFT), by taking inspiration from Cooley-Tukey fast Fourier transform, showing favorable results across finetuning different foundation models, including large vision transformers, large language models and text-to-image diffusion models.
+[Orthogonal Butterfly (BOFT)](https://hf.co/papers/2311.06243) is a generic method designed for finetuning foundation models. It improves the parameter efficiency of the finetuning paradigm -- Orthogonal Finetuning (OFT), by taking inspiration from Cooley-Tukey fast Fourier transform, showing favorable results across finetuning different foundation models, including large vision transformers, large language models and text-to-image diffusion models.

 The abstract from the paper is:

examples/boft_controlnet/boft_controlnet.md
Lines changed: 3 additions & 3 deletions

@@ -19,7 +19,7 @@ rendered properly in your Markdown viewer.

 This guide demonstrates how to use BOFT, an orthogonal fine-tuning method, to fine-tune Stable Diffusion with either `stabilityai/stable-diffusion-2-1` or `runwayml/stable-diffusion-v1-5` model for controllable generation.

-By using BOFT from 🤗 PEFT, we can significantly reduce the number of trainable parameters while still achieving impressive results in various fine-tuning tasks across different foundation models. BOFT enhances model efficiency by integrating full-rank orthogonal matrices with a butterfly structure into specific model blocks, such as attention blocks, mirroring the approach used in LoRA. During fine-tuning, only these inserted matrices are trained, leaving the original model parameters untouched. During inference, the trainable BOFT paramteres can be merged into the original model, eliminating any additional computational costs.
+By using BOFT from 🤗 PEFT, we can significantly reduce the number of trainable parameters while still achieving impressive results in various fine-tuning tasks across different foundation models. BOFT enhances model efficiency by integrating full-rank orthogonal matrices with a butterfly structure into specific model blocks, such as attention blocks, mirroring the approach used in LoRA. During fine-tuning, only these inserted matrices are trained, leaving the original model parameters untouched. During inference, the trainable BOFT parameters can be merged into the original model, eliminating any additional computational costs.

 As a member of the **orthogonal finetuning** class, BOFT presents a systematic and principled method for fine-tuning. It possesses several unique properties and has demonstrated superior performance compared to LoRA in a variety of scenarios. For further details on BOFT, please consult the [PEFT's GitHub repo's concept guide OFT](https://https://huggingface.co/docs/peft/index), the [original BOFT paper](https://arxiv.org/abs/2311.06243) and the [original OFT paper](https://arxiv.org/abs/2306.07280).

@@ -58,7 +58,7 @@ export DATASET_NAME="oftverse/control-celeba-hq"

 ## Train controllable generation (ControlNet) with BOFT

-Start with setting some hyperparamters for BOFT:
+Start with setting some hyperparameters for BOFT:
 ```bash
 PEFT_TYPE="boft"
 BLOCK_NUM=8

@@ -174,4 +174,4 @@ accelerate launch eval.py \
 --output_dir=$OUTPUT_DIR \
 --dataset_name=$DATASET_NAME \
 --vis_overlays \
-```
+```

examples/boft_dreambooth/boft_dreambooth.md
Lines changed: 5 additions & 5 deletions

@@ -18,7 +18,7 @@ rendered properly in your Markdown viewer.

 This guide demonstrates how to use BOFT, an orthogonal fine-tuning method, to fine-tune Dreambooth with either `stabilityai/stable-diffusion-2-1` or `runwayml/stable-diffusion-v1-5` model.

-By using BOFT from 🤗 PEFT, we can significantly reduce the number of trainable parameters while still achieving impressive results in various fine-tuning tasks across different foundation models. BOFT enhances model efficiency by integrating full-rank orthogonal matrices with a butterfly structure into specific model blocks, such as attention blocks, mirroring the approach used in LoRA. During fine-tuning, only these inserted matrices are trained, leaving the original model parameters untouched. During inference, the trainable BOFT paramteres can be merged into the original model, eliminating any additional computational costs.
+By using BOFT from 🤗 PEFT, we can significantly reduce the number of trainable parameters while still achieving impressive results in various fine-tuning tasks across different foundation models. BOFT enhances model efficiency by integrating full-rank orthogonal matrices with a butterfly structure into specific model blocks, such as attention blocks, mirroring the approach used in LoRA. During fine-tuning, only these inserted matrices are trained, leaving the original model parameters untouched. During inference, the trainable BOFT parameters can be merged into the original model, eliminating any additional computational costs.

 As a member of the **orthogonal finetuning** class, BOFT presents a systematic and principled method for fine-tuning. It possesses several unique properties and has demonstrated superior performance compared to LoRA in a variety of scenarios. For further details on BOFT, please consult the [PEFT's GitHub repo's concept guide OFT](https://https://huggingface.co/docs/peft/index), the [original BOFT paper](https://arxiv.org/abs/2311.06243) and the [original OFT paper](https://arxiv.org/abs/2306.07280).

@@ -92,10 +92,10 @@ To learn more about DreamBooth fine-tuning with prior-preserving loss, check out
 Launch the training script with `accelerate` and pass hyperparameters, as well as LoRa-specific arguments to it such as:

 - `use_boft`: Enables BOFT in the training script.
-- `boft_block_size`: the BOFT matrix block size across different layers, expressed in `int`. Smaller block size results in sparser update matrices with fewer trainable paramters. **Note**, please choose it to be dividable to most layer `in_features` dimension, e.g., 4, 8, 16. Also, you can only specify either `boft_block_size` or `boft_block_num`, but not both simultaneously, because `boft_block_size` x `boft_block_num` = layer dimension.
-- `boft_block_num`: the number of BOFT matrix blocks across different layers, expressed in `int`. Fewer blocks result in sparser update matrices with fewer trainable paramters. **Note**, please choose it to be dividable to most layer `in_features` dimension, e.g., 4, 8, 16. Also, you can only specify either `boft_block_size` or `boft_block_num`, but not both simultaneously, because `boft_block_size` x `boft_block_num` = layer dimension.
-- `boft_n_butterfly_factor`: the number of butterfly factors. **Note**, for `boft_n_butterfly_factor=1`, BOFT is the same as vanilla OFT, for `boft_n_butterfly_factor=2`, the effective block size of OFT becomes twice as big and the number of blocks become half.
-- `bias`: specify if the `bias` paramteres should be traind. Can be `none`, `all` or `boft_only`.
+- `boft_block_size`: the BOFT matrix block size across different layers, expressed in `int`. Smaller block size results in sparser update matrices with fewer trainable parameters. **Note**, please choose it to be dividable to most layer `in_features` dimension, e.g., 4, 8, 16. Also, you can only specify either `boft_block_size` or `boft_block_num`, but not both simultaneously, because `boft_block_size` x `boft_block_num` = layer dimension.
+- `boft_block_num`: the number of BOFT matrix blocks across different layers, expressed in `int`. Fewer blocks result in sparser update matrices with fewer trainable parameters. **Note**, please choose it to be dividable to most layer `in_features` dimension, e.g., 4, 8, 16. Also, you can only specify either `boft_block_size` or `boft_block_num`, but not both simultaneously, because `boft_block_size` x `boft_block_num` = layer dimension.
+- `boft_n_butterfly_factor`: the number of butterfly factors. **Note**, for `boft_n_butterfly_factor=1`, BOFT is the same as vanilla OFT, for `boft_n_butterfly_factor=2`, the effective block size of OFT becomes twice as big and the number of blocks becomes half.
+- `bias`: specify if the `bias` parameters should be trained. Can be `none`, `all` or `boft_only`.
 - `boft_dropout`: specify the probability of multiplicative dropout.

 Here's what the full set of script arguments may look like:

examples/eva_finetuning/README.md
Lines changed: 1 addition & 1 deletion

@@ -109,7 +109,7 @@ EVA initialization can be parallelized across multiple GPUs. In this case inputs

 ## Customizing EVA

-By default, EVA is designed to work with standard transformer language models. However we integrated three different paramters which can be used to customize EVA for other types of models.
+By default, EVA is designed to work with standard transformer language models. However we integrated three different parameters which can be used to customize EVA for other types of models.
 1. `forward_fn`: Defines how the forward pass during EVA initialization should be computed.
 2. `prepare_model_inputs_fn`: Can be used if it is necessary to use information contained in the original model_input to prepare the input for SVD in individual layers.
 3. `prepare_layer_inputs_fn`: Defines how layer inputs should be prepared for SVD.
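
To make the role of these three hooks concrete, here is a hedged sketch of passing one of them to PEFT's EVA initialization. It assumes `initialize_lora_eva_weights` as the entry point; the model name, the toy calibration batch and the hook shown are assumptions for illustration, not a definitive API:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import EvaConfig, LoraConfig, get_peft_model, initialize_lora_eva_weights

# Hypothetical model plus a toy stand-in for a calibration dataloader;
# a real run would iterate over many tokenized batches.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
dataloader = [tokenizer(["a short calibration sentence"], return_tensors="pt")]

# EVA is configured through LoraConfig; rho controls how ranks are redistributed.
lora_config = LoraConfig(r=16, init_lora_weights="eva", eva_config=EvaConfig(rho=2.0))
peft_model = get_peft_model(model, lora_config)

initialize_lora_eva_weights(
    peft_model,
    dataloader=dataloader,
    # forward_fn mirrors what a plain forward pass for a transformer LM looks like.
    forward_fn=lambda model, inputs: model(**inputs),
    # prepare_model_inputs_fn=...  # derive extra info from the original model input
    # prepare_layer_inputs_fn=...  # control how each layer's inputs are prepared for SVD
)
```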

examples/feature_extraction/peft_lora_embedding_semantic_similarity_inference.ipynb
Lines changed: 1 addition & 1 deletion

@@ -1434,7 +1434,7 @@
 " 808: \"Nautica Men's Long Sleeve Lightweight Cotton Woven Robe,Peacoat,Large/X-Large\",\n",
 " 809: 'TowelSelections Men’s Robe Low Twist Cotton Terry Kimono Bathrobe X-Small/Small Navy',\n",
 " 810: 'Ross Michaels Mens Robe - Mid Length - Plush Shawl Collar Bathrobe (Grey, L/XL)',\n",
-" 811: \"KEMUSI Hooded Herringbone Men's Black Soft Spa Full Lenght Bathrobe With Grey Kimono Shawl Collar(L)\",\n",
+" 811: \"KEMUSI Hooded Herringbone Men's Black Soft Spa Full Length Bathrobe With Grey Kimono Shawl Collar(L)\",\n",
 " 812: 'Ross Michaels Mens Robe with Hood - Mid Length - Plush Shawl Collar Bathrobe (Grey, Large/X Large)',\n",
 " 813: \"Alexander Del Rossa Men's Warm Fleece Robe with Hood, Big and Tall Bathrobe, Large-XL Black with Steel Gray Contrast (A0125BKSXL)\",\n",
 " 814: 'Mens Plush Robe - Fleece Robe, Mens Bathrobe - Fig -Small/Medium',\n",

examples/int8_training/peft_bnb_whisper_large_v2_training.ipynb
Lines changed: 1 addition & 1 deletion

@@ -24,7 +24,7 @@
 "id": "625e47a0",
 "metadata": {},
 "source": [
-"## Inital Setup"
+"## initial Setup"
 ]
 },
 {

examples/lora_dreambooth/lora_dreambooth_inference.ipynb
Lines changed: 1 addition & 1 deletion

@@ -171,7 +171,7 @@
 "name": "stderr",
 "output_type": "stream",
 "text": [
-"`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config[\"id2label\"]` will be overriden.\n"
+"`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config[\"id2label\"]` will be overridden.\n"
 ]
 },
 {

examples/sft/README.md
Lines changed: 2 additions & 2 deletions

@@ -23,11 +23,11 @@ Note:
 1. At present, `use_reentrant` needs to be `False` when using gradient checkpointing with Multi-GPU QLoRA else it will lead to errors. However, this leads to huge GPU memory consumption.

 ## Multi-GPU SFT with LoRA and DeepSpeed
-When you have access to multiple GPUs, it would be better to use normal LoRA with DeepSpeed/FSDP. To use LoRA with DeepSpeed, refer the docs at [PEFT with DeepSpeed](https://huggingface.co/docs/peft/accelerate/deepspeed).
+When you have access to multiple GPUs, it would be better to use normal LoRA with DeepSpeed/FSDP. To use LoRA with DeepSpeed, refer to the docs at [PEFT with DeepSpeed](https://huggingface.co/docs/peft/accelerate/deepspeed).


 ## Multi-GPU SFT with LoRA and FSDP
-When you have access to multiple GPUs, it would be better to use normal LoRA with DeepSpeed/FSDP. To use LoRA with DeepSpeed, refer the docs at [PEFT with FSDP](https://huggingface.co/docs/peft/accelerate/fsdp).
+When you have access to multiple GPUs, it would be better to use normal LoRA with DeepSpeed/FSDP. To use LoRA with DeepSpeed, refer to the docs at [PEFT with FSDP](https://huggingface.co/docs/peft/accelerate/fsdp).

 ## Tip
