Commit bb4fb50

FEAT Add MiSS as a replacement for Bone. (#2604)
Add MiSS, an evolution of Bone, from https://arxiv.org/abs/2409.15371. MiSS will replace Bone, which is now deprecated. A script to convert Bone checkpoints to MiSS checkpoints is included.
1 parent a91ec33 commit bb4fb50

File tree

21 files changed (+1412 / -11 lines)

docs/source/_toctree.yml

Lines changed: 2 additions & 0 deletions

@@ -130,6 +130,8 @@
   title: SHiRA
 - local: package_reference/c3a
   title: C3A
+- local: package_reference/miss
+  title: MiSS

   title: Adapters
 - sections:

docs/source/conceptual_guides/adapter.md

Lines changed: 9 additions & 5 deletions

@@ -122,12 +122,16 @@ HRA constructs a chain of `r` trainable Householder reflections (HRs). Because t
 The higher `r`, the more trainable parameters, resulting in a larger model capacity and better performance. Besides, due to the chain structure, the orthogonality of HR planes impacts the capacity and regularity of HRA. To achieve a trade-off between the model capacity and regularity, an orthogonality regularizer of the HR planes is added to the loss function. The weight \\(\lambda\\) can control the strength of the regularizer.

 ## Bone
-[DiSHA](https://huggingface.co/papers/2409.15371) A novel PEFT technique distinct from LoRA, called Dimension-Sharding Adaptation (DiSHA). By dividing the original weights into multiple subspaces that share a single matrix for weight updates, DiSHA simplifies the process by requiring the trainable matrix to be initialized to zero, eliminating the need for complex initialization as in some LoRA variants. Bone and Bat are derivative structures of DiSHA. Bone significantly improves computational efficiency while saving memory, whereas Bat addresses the limitation of Bone's linear update by employing a non-linear update to break through the upper bound.
+[MiSS](https://huggingface.co/papers/2409.15371) is the updated version of this work; the paper is now titled *MiSS: Balancing LoRA Performance and Efficiency with Simple Shard Sharing*, and Bone is deprecated in favor of MiSS.
+If you already have a Bone checkpoint, you can use `/scripts/convert-bone-to-miss.py` to convert it into a MiSS checkpoint and continue training with MiSS.

-<small><a href="https://huggingface.co/papers/2409.15371">DiSHA: Dimension-Sharding Adaptation with Fast Convergence and Fast Computation</a></small>
+## MiSS
+[MiSS](https://huggingface.co/papers/2409.15371) (Matrix Shard Sharing) is a novel Parameter-Efficient Fine-Tuning (PEFT) method designed to address the trade-off between adaptability and efficiency in Large Language Models. Its core is a simple shard-sharing mechanism: the weight matrix is decomposed into multiple fragments that all share a single trainable "common fragment", and the final low-rank update matrix is constructed by replicating this shared fragment across the shards. As a result, MiSS adopts a low-rank structure, requires only a single trainable matrix, and uses an update mechanism distinct from LoRA, achieving an excellent balance between performance and efficiency.

-Intuitively, the shape of a single trainable matrix in Bone is consistent with `lora_B`, so the `r` parameter in Bone is less than the `r` in LoRA by (`in_feature * r`).
+<small><a href="https://huggingface.co/papers/2409.15371">MiSS: Balancing LoRA Performance and Efficiency with Simple Shard Sharing</a></small>

-Note: Bat's r (b) is special and requires that weight W satisfies the conditions `in_features % r == 0` and `out_features % r == 0`. Additionally, when `in_features == out_features` and Bone-r equals LoRA-r, Bone's number of trainable parameters is only half that of LoRA.
+Intuitively, the single trainable matrix in MiSS has the same shape as `lora_B`, so for the same `r`, MiSS has `in_features * r` fewer trainable parameters than LoRA.

-Although the nonlinear updates of Bat bring some performance improvements, they also increase computational overhead. Its main purpose is to provide researchers with a direction for improvement. Therefore, we recommend fine-tuning the comprehensive Bone model instead.
+Note: Bat's `r` (b) is special and requires that the weight W satisfies `in_features % r == 0` and `out_features % r == 0`. Additionally, when `in_features == out_features` and MiSS-r equals LoRA-r, MiSS has only half as many trainable parameters as LoRA.
+
+Although Bat's nonlinear updates bring some performance improvement, they also increase computational overhead; Bat mainly serves to point researchers toward a direction for further improvement, so we recommend fine-tuning with standard MiSS instead.
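To make the shard-sharing idea described above concrete, here is a minimal, illustrative PyTorch sketch of how a single `lora_B`-shaped matrix could be replicated across column shards to form the weight update. The shapes and the tiling direction are assumptions drawn from the prose above, not PEFT's actual MiSS implementation.

```python
import torch

# Toy sizes; assumes in_features % r == 0.
out_features, in_features, r = 64, 64, 16

W = torch.randn(out_features, in_features)    # frozen base weight
miss_weight = torch.zeros(out_features, r)    # single trainable matrix, shaped like lora_B

# View the columns of W as in_features // r shards of width r and let every
# shard share the same trainable fragment, i.e. replicate it across shards.
delta_W = miss_weight.repeat(1, in_features // r)  # (out_features, in_features)

W_adapted = W + delta_W

# Trainable parameters: out_features * r = 1024, i.e. half of LoRA's
# (in_features + out_features) * r when in_features == out_features.
print(miss_weight.numel())
```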
docs/source/package_reference/miss.md

Lines changed: 32 additions & 0 deletions

@@ -0,0 +1,32 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# MiSS

[MiSS](https://huggingface.co/papers/2409.15371) (MiSS: Balancing LoRA Performance and Efficiency with Simple Shard Sharing) is a novel PEFT method that adopts a low-rank structure, requires only a single trainable matrix, and introduces a new update mechanism distinct from LoRA, achieving an excellent balance between performance and efficiency.

The abstract from the paper is:

*Parameter-Efficient Fine-Tuning (PEFT) methods, particularly Low-Rank Adaptation (LoRA), effectively reduce the number of trainable parameters in Large Language Models (LLMs). However, as model scales continue to grow, the demand for computational resources remains a significant challenge. Existing LoRA variants often struggle to strike an optimal balance between adaptability (model performance and convergence speed) and efficiency (computational overhead, memory usage, and initialization time). This paper introduces MiSS (Matrix Shard Sharing), a novel PEFT approach that addresses this trade-off through a simple shard-sharing mechanism. MiSS leverages the insight that a low-rank adaptation can be achieved by decomposing the weight matrix into multiple fragment matrices and utilizing a shared, trainable common fragment. This method constructs the low-rank update matrix through the replication of these shared, partitioned shards. We also propose a hardware-efficient and broadly applicable implementation for MiSS. Extensive experiments conducted on a range of tasks, alongside a systematic analysis of computational performance, demonstrate MiSS's superiority. The results show that MiSS significantly outperforms standard LoRA and its prominent variants in both model performance metrics and computational efficiency, including initialization speed and training throughput. By effectively balancing expressive power and resource utilization, MiSS offers a compelling solution for efficiently adapting large-scale models*.

## MissConfig

[[autodoc]] tuners.miss.config.MissConfig

## MissModel

[[autodoc]] tuners.miss.model.MissModel
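For orientation, a minimal usage sketch of the two classes documented above; the base model name and `target_modules` below are illustrative placeholders, not recommendations from this commit:

```python
from transformers import AutoModelForCausalLM

from peft import MissConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # placeholder model
config = MissConfig(r=16, target_modules=["q_proj", "v_proj"])          # placeholder modules
peft_model = get_peft_model(base_model, config)  # wraps the base model with MiSS adapters
peft_model.print_trainable_parameters()
```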

examples/miss_finetuning/README.md

Lines changed: 104 additions & 0 deletions

@@ -0,0 +1,104 @@
# MiSS: Balancing LoRA Performance and Efficiency with Simple Shard Sharing
## Introduction ([Paper](https://huggingface.co/papers/2409.15371), [code](https://github.com/JL-er/MiSS))
MiSS (Matrix Shard Sharing) is a novel PEFT method that adopts a low-rank structure, requires only a single trainable matrix, and introduces a new update mechanism distinct from LoRA, achieving an excellent balance between performance and efficiency.

## Quick Start
```python
import torch
from peft import MissConfig, get_peft_model
from transformers import AutoTokenizer, AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.pad_token_id = tokenizer.eos_token_id

miss_config = MissConfig(
    r=64
)
# bat: In this mode, you can enable nonlinear updates across different shards.
# miss_config = MissConfig(
#     r=64,
#     init_weights="bat"
# )

# mini: In this mode, you can set a smaller rank to use fewer trainable parameters, but it is recommended to keep `out_features % mini_r == 0`.
# miss_config = MissConfig(
#     r=64,
#     init_weights="mini",
#     mini_r=8
# )
peft_model = get_peft_model(model, miss_config)

peft_model.print_trainable_parameters()

dataset = load_dataset("imdb", split="train[:1%]")

training_args = SFTConfig(dataset_text_field="text", max_seq_length=128)
trainer = SFTTrainer(
    model=peft_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
peft_model.save_pretrained("miss-llama-2-7b")
```

To utilize the fine-tuned MiSS modules, simply run the following command:
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16, device_map="auto"
)
peft_model = PeftModel.from_pretrained(model, "miss-llama-2-7b")
```

## Advanced Usage

### Fine-tune
```shell
# Bat performs better than MiSS, but it uses more memory and is twice as slow. If you want to use the Bat method, you only need to add the parameter init_weights="bat".
python miss_finetuning.py \
    --base_model_name_or_path meta-llama/Llama-2-7b-hf \
    --output_dir output/miss-llama-2-7b-metamath-10k \
    --miss_r 64 \
    --init_weights True \
    --bits bf16 \
    --data_path meta-math/MetaMathQA \
    --dataset_split train[:100000] \
    --dataset_field query response \
    --bf16 True \
    --num_train_epochs 1 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --save_strategy "steps" \
    --save_steps 1000 \
    --save_total_limit 1 \
    --logging_steps 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --tf32 True \
    --report_to none
```

# Citation
```bib
@misc{kang2025balancingloraperformanceefficiency,
      title={Balancing LoRA Performance and Efficiency with Simple Shard Sharing},
      author={Jiale Kang and Qingyu Yin},
      year={2025},
      eprint={2409.15371},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2409.15371},
}
```
examples/miss_finetuning/miss_finetuning.py

Lines changed: 107 additions & 0 deletions

@@ -0,0 +1,107 @@
# Copyright 2025-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
from dataclasses import dataclass, field
from typing import Literal, Optional

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, HfArgumentParser
from trl import SFTConfig, SFTTrainer

from peft import MissConfig, get_peft_model


@dataclass
class ScriptArguments(SFTConfig):
    # model configs
    base_model_name_or_path: Optional[str] = field(
        default=None, metadata={"help": "The name or path of the fp32/16 base model."}
    )
    bits: str = field(default="bf16", metadata={"help": "(`['bf16', 'fp16', 'fp32']`)"})
    init_weights: Literal[True, "bat", "mini"] = field(
        default=True,
        metadata={
            "help": (
                "True -> MiSS efficiency and balance; `bat` -> Bat; `mini` -> smaller MiSS efficiency and balance"
            ),
        },
    )
    miss_r: int = field(default=16)
    merge_and_save: bool = field(default=False)
    # dataset configs
    data_path: str = field(default="imdb", metadata={"help": "Path to the training data."})
    dataset_split: str = field(default="train[:1%]", metadata={"help": "(`['train', 'test', 'eval']`):"})
    dataset_field: list[str] = field(default=None, metadata={"help": "Fields of dataset input and output."})


parser = HfArgumentParser(ScriptArguments)
script_args = parser.parse_args_into_dataclasses()[0]
print(script_args)

print(f"Load pre-processed residual model in {script_args.bits} bits.")
if script_args.bits in ["nf4", "fp4", "int8"]:
    print("MiSS currently does not support quantization.")

elif script_args.base_model_name_or_path is not None:
    print(f"No available pre-processed model, manually initialize a MiSS using {script_args.base_model_name_or_path}.")
    model = AutoModelForCausalLM.from_pretrained(
        script_args.base_model_name_or_path,
        torch_dtype=(
            torch.float16
            if script_args.bits == "fp16"
            else (torch.bfloat16 if script_args.bits == "bf16" else torch.float32)
        ),
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained(script_args.base_model_name_or_path)
    tokenizer.pad_token_id = tokenizer.eos_token_id
    miss_config = MissConfig(
        r=script_args.miss_r,
        target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
        bias="none",
        task_type="CAUSAL_LM",
        init_weights=script_args.init_weights,
    )
    peft_model = get_peft_model(model, miss_config)

print(peft_model)
peft_model.print_trainable_parameters()

print(f"Training MiSS with trl on the {script_args.data_path}[{script_args.dataset_split}] dataset.")
dataset = load_dataset(script_args.data_path, split=script_args.dataset_split)
dataset = dataset.map(
    lambda example: {
        "text": f"### USER: {example[script_args.dataset_field[0]]}\n### ASSISTANT: {example[script_args.dataset_field[1]]}"
    }
)

trainer = SFTTrainer(
    model=peft_model,
    args=script_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
trainer.save_state()

peft_model.save_pretrained(
    os.path.join(script_args.output_dir, "miss_ft"),
)

if script_args.merge_and_save:
    model = peft_model.merge_and_unload()
    model.save_pretrained(os.path.join(script_args.output_dir, "miss_merged"))
    tokenizer.save_pretrained(os.path.join(script_args.output_dir, "miss_merged"))

scripts/convert-bone-to-miss.py

Lines changed: 70 additions & 0 deletions

@@ -0,0 +1,70 @@
#!/usr/bin/env python3
# Copyright (c) 2025 Your Organization/Project. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Convert Bone checkpoint to MiSS format."""

import argparse
import json
import os
from pathlib import Path

from safetensors import safe_open
from safetensors.torch import save_file

from peft.utils import CONFIG_NAME, SAFETENSORS_WEIGHTS_NAME


def convert_bone_to_miss(bone_dir: Path, miss_dir: Path) -> None:
    """Convert Bone checkpoint files to MiSS format."""
    bone_config_path = bone_dir / CONFIG_NAME
    miss_config_path = miss_dir / CONFIG_NAME
    if not os.path.exists(miss_dir):
        os.makedirs(miss_dir, exist_ok=True)
    with open(bone_config_path, encoding="utf-8") as f:
        config = json.load(f)

    config["peft_type"] = "MISS"

    with open(miss_config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2, ensure_ascii=False)

    bone_weight_path = bone_dir / SAFETENSORS_WEIGHTS_NAME
    miss_weight_path = miss_dir / SAFETENSORS_WEIGHTS_NAME

    new_data = {}

    with safe_open(bone_weight_path, framework="pt") as f:
        for old_key in f.keys():
            tensor = f.get_tensor(old_key)
            new_key = old_key.replace(".bone_", ".miss_")
            new_data[new_key] = tensor

    save_file(new_data, miss_weight_path)

    print(f"Converted checkpoint saved at {miss_weight_path}")


def main() -> None:
    parser = argparse.ArgumentParser(description="Convert Bone checkpoint to MiSS format.")
    parser.add_argument("bone_dir", type=Path, help="Directory containing Bone checkpoint files")
    parser.add_argument("miss_dir", type=Path, help="Directory to save MiSS checkpoint files")
    args = parser.parse_args()

    args.miss_dir.mkdir(parents=True, exist_ok=True)
    convert_bone_to_miss(args.bone_dir, args.miss_dir)


if __name__ == "__main__":
    main()
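As a usage note, the converter is a standalone CLI script; a typical invocation looks like the following, where both checkpoint directories are placeholders:

```shell
# Convert an existing Bone adapter checkpoint into a MiSS checkpoint.
python scripts/convert-bone-to-miss.py ./bone-llama-2-7b ./miss-llama-2-7b
```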

src/peft/__init__.py

Lines changed: 4 additions & 0 deletions

@@ -75,6 +75,8 @@
     LoraConfig,
     LoraModel,
     LoraRuntimeConfig,
+    MissConfig,
+    MissModel,
     MultitaskPromptTuningConfig,
     MultitaskPromptTuningInit,
     OFTConfig,
@@ -161,6 +163,8 @@
     "LoraConfig",
     "LoraModel",
     "LoraRuntimeConfig",
+    "MissConfig",
+    "MissModel",
     "MultitaskPromptTuningConfig",
     "MultitaskPromptTuningInit",
     "OFTConfig",

src/peft/tuners/__init__.py

Lines changed: 3 additions & 0 deletions

@@ -33,6 +33,7 @@
     get_eva_state_dict,
     initialize_lora_eva_weights,
 )
+from .miss import MissConfig, MissModel
 from .mixed import MixedModel
 from .multitask_prompt_tuning import MultitaskPromptEmbedding, MultitaskPromptTuningConfig, MultitaskPromptTuningInit
 from .oft import OFTConfig, OFTModel
@@ -78,6 +79,8 @@
     "LoraConfig",
     "LoraModel",
     "LoraRuntimeConfig",
+    "MissConfig",
+    "MissModel",
     "MixedModel",
     "MultitaskPromptEmbedding",
     "MultitaskPromptTuningConfig",

0 commit comments
