
Commit 2f22656

include details.
1 parent a3f2b8a commit 2f22656

File tree

1 file changed

docs/source/en/tutorials/using_peft_for_inference.md

Lines changed: 11 additions & 0 deletions
@@ -319,6 +319,17 @@ If you expect to varied resolutions during inference with this feature, then mak

There are still scenarios where recompilation is unavoidable, such as when the hotswapped LoRA targets more layers than the initial adapter. Try to load the LoRA that targets the most layers *first*. For more details about this limitation, refer to the PEFT [hotswapping](https://huggingface.co/docs/peft/main/en/package_reference/hotswap#peft.utils.hotswap.hotswap_adapter) docs.

<details>
<summary>Technical details of hotswapping</summary>
To enable hotswapping without triggering recompilation, two hurdles have to be overcome. First, the LoRA scaling factor has to be converted from a float into a torch tensor, which is fairly easy to achieve. Second, the shape of the LoRA weights needs to be padded to the largest required shape. That way, the data in the weights can be replaced without reassigning the whole attribute. This is why the `max_rank` argument discussed above is crucial. Because the padding values are zeros, the results remain unchanged, although the computation is slowed down a bit depending on how large the padding is.
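
As a minimal illustration (not the actual PEFT implementation, and with made-up shapes and ranks), the following PyTorch snippet shows why zero-padding the LoRA matrices to a shared maximum rank leaves the LoRA delta unchanged while keeping the weight shapes fixed for later in-place swaps:

```py
import torch

# Hypothetical shapes: a rank-4 LoRA padded to a maximum rank of 8.
in_features, out_features, rank, max_rank = 64, 64, 4, 8

lora_A = torch.randn(rank, in_features)    # down-projection
lora_B = torch.randn(out_features, rank)   # up-projection

# Pad both matrices with zeros up to max_rank so any future adapter with
# rank <= max_rank can be copied into the same buffers in place.
lora_A_padded = torch.zeros(max_rank, in_features)
lora_B_padded = torch.zeros(out_features, max_rank)
lora_A_padded[:rank] = lora_A
lora_B_padded[:, :rank] = lora_B

# The extra zero rows/columns contribute nothing to the product, so the
# LoRA delta is unchanged; only a bit of extra compute is spent on the padding.
assert torch.allclose(lora_B @ lora_A, lora_B_padded @ lora_A_padded)
```
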
Since no new LoRA attributes are added, each LoRA loaded after the first one can only target the same layers, or a subset of the layers, that the first one targets. Thus, choose the loading order wisely. If the LoRAs target disjoint layers, you can create a dummy LoRA that targets the union of all target layers.
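
As a hedged sketch of the loading order, assuming the `enable_lora_hotswap` and `load_lora_weights(..., hotswap=True)` workflow shown earlier in this guide (the checkpoint is an example, the LoRA repos are placeholders, and `max_rank` is made up):

```py
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# max_rank must cover the largest rank among all adapters that will be swapped in.
max_rank = 8
pipe.enable_lora_hotswap(target_rank=max_rank)

# Load the LoRA that targets the *most* layers first, then compile.
pipe.load_lora_weights("<lora-targeting-the-most-layers>", adapter_name="default")
pipe.unet = torch.compile(pipe.unet, mode="max-autotune")

# Later adapters may only target the same layers as the first one, or a subset of them.
pipe.load_lora_weights(
    "<lora-targeting-a-subset-of-layers>", adapter_name="default", hotswap=True
)
```
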
To see the nitty-gritty of this implementation, visit the [`hotswap.py` file in PEFT](https://github.com/huggingface/peft/blob/92d65cafa51c829484ad3d95cf71d09de57ff066/src/peft/utils/hotswap.py).
</details>
## Merge

The weights from each LoRA can be merged to produce a blend of multiple existing styles. There are several methods for merging LoRAs, each differing in *how* the weights are merged (which may affect generation quality).
