docker/README.md: 1 addition, 1 deletion

@@ -3,7 +3,7 @@
 Here we store all PEFT Docker images used in our testing infrastructure. We use python 3.11 for now on all our images.

 - `peft-cpu`: PEFT compiled on CPU with all other HF libraries installed on main branch
-- `peft-gpu`: PEFT complied for NVIDIA GPUs wih all other HF libraries installed on main branch
+- `peft-gpu`: PEFT complied for NVIDIA GPUs with all other HF libraries installed on main branch
 - `peft-gpu-bnb-source`: PEFT complied for NVIDIA GPUs with `bitsandbytes` and all other HF libraries installed from main branch
 - `peft-gpu-bnb-latest`: PEFT complied for NVIDIA GPUs with `bitsandbytes` complied from main and all other HF libraries installed from latest PyPi
 - `peft-gpu-bnb-multi-source`: PEFT complied for NVIDIA GPUs with `bitsandbytes` complied from `multi-backend` branch and all other HF libraries installed from main branch
docs/source/conceptual_guides/oft.md: 3 additions, 3 deletions

@@ -58,13 +58,13 @@ As with other methods supported by PEFT, to fine-tune a model using OFT or BOFT,
 4. Train the `PeftModel` as you normally would train the base model.


-### BOFT-specific paramters
+### BOFT-specific parameters

 `BOFTConfig` allows you to control how OFT/BOFT is applied to the base model through the following parameters:

-- `boft_block_size`: the BOFT matrix block size across different layers, expressed in `int`. Smaller block size results in sparser update matrices with fewer trainable paramters. **Note**, please choose `boft_block_size` to be divisible by most layer's input dimension (`in_features`), e.g., 4, 8, 16. Also, please only
+- `boft_block_size`: the BOFT matrix block size across different layers, expressed in `int`. Smaller block size results in sparser update matrices with fewer trainable parameters. **Note**, please choose `boft_block_size` to be divisible by most layer's input dimension (`in_features`), e.g., 4, 8, 16. Also, please only
 specify either `boft_block_size` or `boft_block_num`, but not both simultaneously or leaving both to 0, because `boft_block_size` x `boft_block_num` must equal the layer's input dimension.
-- `boft_block_num`: the number of BOFT matrix blocks across different layers, expressed in `int`. Fewer blocks result in sparser update matrices with fewer trainable paramters. **Note**, please choose `boft_block_num` to be divisible by most layer's input dimension (`in_features`), e.g., 4, 8, 16. Also, please only
+- `boft_block_num`: the number of BOFT matrix blocks across different layers, expressed in `int`. Fewer blocks result in sparser update matrices with fewer trainable parameters. **Note**, please choose `boft_block_num` to be divisible by most layer's input dimension (`in_features`), e.g., 4, 8, 16. Also, please only
 specify either `boft_block_size` or `boft_block_num`, but not both simultaneously or leaving both to 0, because `boft_block_size` x `boft_block_num` must equal the layer's input dimension.
 - `boft_n_butterfly_factor`: the number of butterfly factors. **Note**, for `boft_n_butterfly_factor=1`, BOFT is the same as vanilla OFT, for `boft_n_butterfly_factor=2`, the effective block size of OFT becomes twice as big and the number of blocks become half.
 - `bias`: specify if the `bias` parameters should be trained. Can be `"none"`, `"all"` or `"boft_only"`.
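For illustration, the `BOFTConfig` options described in this guide fit together roughly as follows. This is a minimal sketch, not taken from the diff: the base model and `target_modules` are placeholder assumptions, chosen so that each targeted layer's `in_features` is divisible by the block size.

```python
from transformers import AutoModelForCausalLM
from peft import BOFTConfig, get_peft_model

# Illustrative base model; any transformer whose target layers are nn.Linear works similarly.
base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

config = BOFTConfig(
    boft_block_size=4,          # set either boft_block_size or boft_block_num, never both
    boft_block_num=0,           # 0 here means the block count is derived from boft_block_size
    boft_n_butterfly_factor=1,  # 1 reduces BOFT to vanilla OFT
    target_modules=["q_proj", "v_proj"],  # placeholder modules; their in_features (768) is divisible by 4
    boft_dropout=0.1,
    bias="boft_only",
)

model = get_peft_model(base_model, config)
model.print_trainable_parameters()  # only the inserted orthogonal blocks are trainable
```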
docs/source/package_reference/boft.md: 1 addition, 1 deletion

@@ -16,7 +16,7 @@ rendered properly in your Markdown viewer.

 # BOFT

-[Orthogonal Butterfly (BOFT)](https://hf.co/papers/2311.06243) is a generic method designed for finetuning foundation models. It improves the paramter efficiency of the finetuning paradigm -- Orthogonal Finetuning (OFT), by taking inspiration from Cooley-Tukey fast Fourier transform, showing favorable results across finetuning different foundation models, including large vision transformers, large language models and text-to-image diffusion models.
+[Orthogonal Butterfly (BOFT)](https://hf.co/papers/2311.06243) is a generic method designed for finetuning foundation models. It improves the parameter efficiency of the finetuning paradigm -- Orthogonal Finetuning (OFT), by taking inspiration from Cooley-Tukey fast Fourier transform, showing favorable results across finetuning different foundation models, including large vision transformers, large language models and text-to-image diffusion models.
examples/boft_controlnet/boft_controlnet.md: 3 additions, 3 deletions

@@ -19,7 +19,7 @@ rendered properly in your Markdown viewer.

 This guide demonstrates how to use BOFT, an orthogonal fine-tuning method, to fine-tune Stable Diffusion with either `stabilityai/stable-diffusion-2-1` or `runwayml/stable-diffusion-v1-5` model for controllable generation.

-By using BOFT from 🤗 PEFT, we can significantly reduce the number of trainable parameters while still achieving impressive results in various fine-tuning tasks across different foundation models. BOFT enhances model efficiency by integrating full-rank orthogonal matrices with a butterfly structure into specific model blocks, such as attention blocks, mirroring the approach used in LoRA. During fine-tuning, only these inserted matrices are trained, leaving the original model parameters untouched. During inference, the trainable BOFT paramteres can be merged into the original model, eliminating any additional computational costs.
+By using BOFT from 🤗 PEFT, we can significantly reduce the number of trainable parameters while still achieving impressive results in various fine-tuning tasks across different foundation models. BOFT enhances model efficiency by integrating full-rank orthogonal matrices with a butterfly structure into specific model blocks, such as attention blocks, mirroring the approach used in LoRA. During fine-tuning, only these inserted matrices are trained, leaving the original model parameters untouched. During inference, the trainable BOFT parameters can be merged into the original model, eliminating any additional computational costs.

 As a member of the **orthogonal finetuning** class, BOFT presents a systematic and principled method for fine-tuning. It possesses several unique properties and has demonstrated superior performance compared to LoRA in a variety of scenarios. For further details on BOFT, please consult the [PEFT's GitHub repo's concept guide OFT](https://https://huggingface.co/docs/peft/index), the [original BOFT paper](https://arxiv.org/abs/2311.06243) and the [original OFT paper](https://arxiv.org/abs/2306.07280).
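For illustration, the merge-at-inference step mentioned above looks roughly like the following with PEFT's generic `PeftModel` API. This is a hedged sketch: the base model and adapter path are placeholders and it is not specific to the ControlNet training scripts in this example.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Placeholder base model and adapter checkpoint path, purely for illustration.
base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
peft_model = PeftModel.from_pretrained(base_model, "path/to/boft-adapter")

# Fold the trained orthogonal BOFT matrices into the base weights so that
# inference runs without any extra modules or compute overhead.
merged_model = peft_model.merge_and_unload()
```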
examples/boft_dreambooth/boft_dreambooth.md: 5 additions, 5 deletions

@@ -18,7 +18,7 @@ rendered properly in your Markdown viewer.

 This guide demonstrates how to use BOFT, an orthogonal fine-tuning method, to fine-tune Dreambooth with either `stabilityai/stable-diffusion-2-1` or `runwayml/stable-diffusion-v1-5` model.

-By using BOFT from 🤗 PEFT, we can significantly reduce the number of trainable parameters while still achieving impressive results in various fine-tuning tasks across different foundation models. BOFT enhances model efficiency by integrating full-rank orthogonal matrices with a butterfly structure into specific model blocks, such as attention blocks, mirroring the approach used in LoRA. During fine-tuning, only these inserted matrices are trained, leaving the original model parameters untouched. During inference, the trainable BOFT paramteres can be merged into the original model, eliminating any additional computational costs.
+By using BOFT from 🤗 PEFT, we can significantly reduce the number of trainable parameters while still achieving impressive results in various fine-tuning tasks across different foundation models. BOFT enhances model efficiency by integrating full-rank orthogonal matrices with a butterfly structure into specific model blocks, such as attention blocks, mirroring the approach used in LoRA. During fine-tuning, only these inserted matrices are trained, leaving the original model parameters untouched. During inference, the trainable BOFT parameters can be merged into the original model, eliminating any additional computational costs.

 As a member of the **orthogonal finetuning** class, BOFT presents a systematic and principled method for fine-tuning. It possesses several unique properties and has demonstrated superior performance compared to LoRA in a variety of scenarios. For further details on BOFT, please consult the [PEFT's GitHub repo's concept guide OFT](https://https://huggingface.co/docs/peft/index), the [original BOFT paper](https://arxiv.org/abs/2311.06243) and the [original OFT paper](https://arxiv.org/abs/2306.07280).

@@ -92,10 +92,10 @@ To learn more about DreamBooth fine-tuning with prior-preserving loss, check out
 Launch the training script with `accelerate` and pass hyperparameters, as well as LoRa-specific arguments to it such as:

 - `use_boft`: Enables BOFT in the training script.
-- `boft_block_size`: the BOFT matrix block size across different layers, expressed in `int`. Smaller block size results in sparser update matrices with fewer trainable paramters. **Note**, please choose it to be dividable to most layer `in_features` dimension, e.g., 4, 8, 16. Also, you can only specify either `boft_block_size` or `boft_block_num`, but not both simultaneously, because `boft_block_size` x `boft_block_num` = layer dimension.
-- `boft_block_num`: the number of BOFT matrix blocks across different layers, expressed in `int`. Fewer blocks result in sparser update matrices with fewer trainable paramters. **Note**, please choose it to be dividable to most layer `in_features` dimension, e.g., 4, 8, 16. Also, you can only specify either `boft_block_size` or `boft_block_num`, but not both simultaneously, because `boft_block_size` x `boft_block_num` = layer dimension.
-- `boft_n_butterfly_factor`: the number of butterfly factors. **Note**, for `boft_n_butterfly_factor=1`, BOFT is the same as vanilla OFT, for `boft_n_butterfly_factor=2`, the effective block size of OFT becomes twice as big and the number of blocks become half.
-- `bias`: specify if the `bias` paramteres should be traind. Can be `none`, `all` or `boft_only`.
+- `boft_block_size`: the BOFT matrix block size across different layers, expressed in `int`. Smaller block size results in sparser update matrices with fewer trainable parameters. **Note**, please choose it to be dividable to most layer `in_features` dimension, e.g., 4, 8, 16. Also, you can only specify either `boft_block_size` or `boft_block_num`, but not both simultaneously, because `boft_block_size` x `boft_block_num` = layer dimension.
+- `boft_block_num`: the number of BOFT matrix blocks across different layers, expressed in `int`. Fewer blocks result in sparser update matrices with fewer trainable parameters. **Note**, please choose it to be dividable to most layer `in_features` dimension, e.g., 4, 8, 16. Also, you can only specify either `boft_block_size` or `boft_block_num`, but not both simultaneously, because `boft_block_size` x `boft_block_num` = layer dimension.
+- `boft_n_butterfly_factor`: the number of butterfly factors. **Note**, for `boft_n_butterfly_factor=1`, BOFT is the same as vanilla OFT, for `boft_n_butterfly_factor=2`, the effective block size of OFT becomes twice as big and the number of blocks becomes half.
+- `bias`: specify if the `bias` parameters should be trained. Can be `none`, `all` or `boft_only`.
 - `boft_dropout`: specify the probability of multiplicative dropout.

 Here's what the full set of script arguments may look like:
examples/eva_finetuning/README.md: 1 addition, 1 deletion

@@ -109,7 +109,7 @@ EVA initialization can be parallelized across multiple GPUs. In this case inputs

 ## Customizing EVA

-By default, EVA is designed to work with standard transformer language models. However we integrated three different paramters which can be used to customize EVA for other types of models.
+By default, EVA is designed to work with standard transformer language models. However we integrated three different parameters which can be used to customize EVA for other types of models.
 1. `forward_fn`: Defines how the forward pass during EVA initialization should be computed.
 2. `prepare_model_inputs_fn`: Can be used if it is necessary to use information contained in the original model_input to prepare the input for SVD in individual layers.
 3. `prepare_layer_inputs_fn`: Defines how layer inputs should be prepared for SVD.
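For illustration, a hedged sketch of how such a hook might be supplied when triggering EVA initialization. The call and hook signatures below are assumptions to verify against the PEFT EVA documentation; the model, data, and the custom `forward_fn` are placeholders (`prepare_model_inputs_fn` and `prepare_layer_inputs_fn` would be passed the same way).

```python
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import EvaConfig, LoraConfig, get_peft_model, initialize_lora_eva_weights

model_name = "facebook/opt-125m"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA config that requests EVA initialization for the adapter weights.
peft_config = LoraConfig(
    r=16,
    target_modules=["q_proj", "v_proj"],  # placeholder target modules
    init_lora_weights="eva",
    eva_config=EvaConfig(),
)
peft_model = get_peft_model(model, peft_config)

# Tiny placeholder dataloader of tokenized text batches.
encodings = tokenizer(["placeholder text for collecting activations"] * 8, return_tensors="pt", padding=True)
dataset = [{k: v[i] for k, v in encodings.items()} for i in range(8)]
dataloader = DataLoader(dataset, batch_size=4)

def forward_fn(model, inputs):
    # Assumed hook shape: runs the forward pass whose layer inputs feed the SVD.
    return model(**inputs)

initialize_lora_eva_weights(peft_model, dataloader=dataloader, forward_fn=forward_fn)
```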
examples/sft/README.md: 2 additions, 2 deletions

@@ -23,11 +23,11 @@ Note:
 1. At present, `use_reentrant` needs to be `False` when using gradient checkpointing with Multi-GPU QLoRA else it will lead to errors. However, this leads to huge GPU memory consumption.

 ## Multi-GPU SFT with LoRA and DeepSpeed
-When you have access to multiple GPUs, it would be better to use normal LoRA with DeepSpeed/FSDP. To use LoRA with DeepSpeed, refer the docs at [PEFT with DeepSpeed](https://huggingface.co/docs/peft/accelerate/deepspeed).
+When you have access to multiple GPUs, it would be better to use normal LoRA with DeepSpeed/FSDP. To use LoRA with DeepSpeed, refer to the docs at [PEFT with DeepSpeed](https://huggingface.co/docs/peft/accelerate/deepspeed).


 ## Multi-GPU SFT with LoRA and FSDP
-When you have access to multiple GPUs, it would be better to use normal LoRA with DeepSpeed/FSDP. To use LoRA with DeepSpeed, refer the docs at [PEFT with FSDP](https://huggingface.co/docs/peft/accelerate/fsdp).
+When you have access to multiple GPUs, it would be better to use normal LoRA with DeepSpeed/FSDP. To use LoRA with DeepSpeed, refer to the docs at [PEFT with FSDP](https://huggingface.co/docs/peft/accelerate/fsdp).
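For illustration, the `use_reentrant` note quoted in the hunk above is usually handled through the trainer's gradient-checkpointing options. A minimal sketch, assuming transformers' `TrainingArguments` (trl's `SFTConfig` subclasses it, so the same fields apply); values are illustrative.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="sft-output",            # illustrative output path
    per_device_train_batch_size=1,
    gradient_checkpointing=True,
    # Multi-GPU QLoRA currently needs the non-reentrant checkpointing path.
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```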