
Conversation

@BenjaminBossan
Member

What does this PR do?

LoRA hotswapping has been available in PEFT since 0.15.0. There is already a diffusers
integration (huggingface/diffusers#9453), but the transformers integration was still missing this feature. This PR remedies this. It sticks closely to the diffusers PR, both implementation-wise and API-wise.

Hotswapping allows swapping different LoRA adapters in place instead of loading multiple adapters and switching between them. Not only can this be advantageous for saving memory and potentially for quicker loading, but the biggest advantage is that if the model is compiled, we can hotswap without triggering recompilation (loading a separate adapter would require recompilation).

There are some caveats to using this feature, most notably that only LoRA is supported. This was fine for diffusers, as it only works with LoRA, but the transformers integration works with other PEFT methods too. However, LoRA should be by far the most common method, so this should be fine for now. This and other caveats have been documented.

Note that testing is not super deep; more edge cases could be covered. However, as this is mainly about calling PEFT functionality, which is extensively tested in PEFT, I focused the tests mostly on the integration, with only a couple of tests for the functionality itself (namely the tests that call _check_model_hotswap).
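
For illustration, here is a rough usage sketch of the feature described above. The model id and adapter repo ids are placeholders, and the exact keyword arguments may differ slightly from the final API:

import torch
from transformers import AutoModelForCausalLM

# Placeholder ids, purely for illustration.
model = AutoModelForCausalLM.from_pretrained("base-model-id")

# The first adapter cannot be hotswapped; it is loaded normally.
model.load_adapter("user/lora-adapter-0", adapter_name="default")

# If the adapters differ in rank or scaling, model.enable_peft_hotswap() should be
# called here, before compiling, so that the LoRA weights are prepared accordingly.
compiled_model = torch.compile(model)
# ... run inference with compiled_model ...

# Swap the LoRA weights of "default" in place; this should not trigger recompilation.
model.load_adapter("user/lora-adapter-1", adapter_name="default", hotswap=True)
# ... run inference again, now with the second adapter's weights ...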

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@sayakpaul sayakpaul left a comment


LGTM 🔥

Let's maybe update diffusers to lift our current constraint that this feature isn't available for text encoder LoRAs once this is merged.

@require_peft
@require_torch
@slow
class PeftHotswapIntegrationTester(unittest.TestCase):
Member


@ydshieh okay for you?

Member Author


Just a note, these are the same decorators as for PeftIntegrationTester, but I didn't want to use it here (nor PeftTesterMixin), as the hotswap tests don't make use of the test matrix defined there.

def tearDown(self):
    # It is critical that the dynamo cache is reset for each test. Otherwise, if the test re-uses the same model,
    # there will be recompilation errors, as torch caches the model when run in the same process.
    torch._dynamo.reset()
Member


Suggested change
torch._dynamo.reset()
torch.compiler.reset()

We could also introduce torch._inductor.utils.fresh_inductor_cache(). Example: https://github.com/huggingface/diffusers/blob/7242b5ff627fad93dd85834b0278267b6cbe2d6d/tests/models/test_modeling_common.py#L2061C13-L2061C57
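
For illustration, a minimal sketch of how the suggested pattern could look in a test; the class and test names here are hypothetical, not the actual tests in this PR:

import unittest

import torch
from torch._inductor.utils import fresh_inductor_cache


class HotswapCompileTester(unittest.TestCase):
    def tearDown(self):
        # Reset the compiler caches between tests so that reusing the same model
        # in one process does not lead to recompilation errors.
        torch.compiler.reset()

    def test_hotswap_does_not_trigger_recompilation(self):
        # Use a fresh inductor cache so that artifacts from other tests cannot leak in.
        with fresh_inductor_cache():
            ...  # build the model, compile it, hotswap adapters, check for recompilation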

Member Author


Done, also added the use of torch._inductor.utils.fresh_inductor_cache().

@BenjaminBossan
Member Author

Let's maybe update diffusers to lift our current constraint that this feature isn't available for text encoder LoRAs once this is merged.

Yes, that would be the next step once this PR is merged.

@BenjaminBossan
Member Author

@ydshieh How to proceed with this PR? Sayak approved but I guess we need a transformers dev's approval too?

To make the usage more intuitive, hotswap is now auto-enabled after
calling model.enable_peft_hotswap(). For this, we detect if
enable_peft_hotswap() was called *and* if the adapter being loaded
is *not* the first adapter (because the first adapter cannot be
hotswapped, it needs to be loaded normally).
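
For illustration, a rough sketch of the auto-enabled flow described in this commit; the model and adapter ids are placeholders, and enable_peft_hotswap() may accept additional arguments not shown here:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("base-model-id")
model.enable_peft_hotswap()

# First adapter: nothing to hotswap yet, so it is loaded normally.
model.load_adapter("user/lora-adapter-0", adapter_name="default")
compiled_model = torch.compile(model)

# Second adapter with the same adapter_name: hotswap is enabled automatically,
# because enable_peft_hotswap() was called and "default" already exists.
model.load_adapter("user/lora-adapter-1", adapter_name="default")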
@ArthurZucker
Collaborator

Sorry, cc @Cyrilvallez, can you review?

Member

@Cyrilvallez Cyrilvallez left a comment


Very nice! Just a bit concerned about the default, which seems to be able to crash code that is currently valid?

Comment on lines 200 to 201
from peft import PeftType

Member


Does that import exist even for the MIN_VERSION below? If not, let's import after the check, so that we fail gracefully.

Member Author


Done, the import should work with any PEFT version but I agree it's cleaner that way.

if hotswap == "auto":
    # if user called model.enable_peft_hotswap and this is not the first adapter, enable hotswap
    hotswap_enabled = getattr(self, "_hotswap_enabled", False)
    not_first_adapter = bool(self._hf_peft_config_loaded and (adapter_name in self.peft_config))
Member


We shouldn't need the bool cast here, should we?

Suggested change
not_first_adapter = bool(self._hf_peft_config_loaded and (adapter_name in self.peft_config))
not_first_adapter = self._hf_peft_config_loaded and adapter_name in self.peft_config

Member Author


Right now, _hf_peft_config_loaded is a bool, so coercing is not needed. But if its type changes in the future, the type of not_first_adapter could also change, so the cast makes this line future-proof. Here is an example:

>>> _hf_peft_config_loaded = {"foo": 1}
>>> adapter_name = "default"
>>> peft_config = {"default": 2}
>>> not_first_adapter = _hf_peft_config_loaded and adapter_name in peft_config
>>> not_first_adapter
True
>>> _hf_peft_config_loaded = {}  # falsy value short-circuits the conditional
>>> not_first_adapter = _hf_peft_config_loaded and adapter_name in peft_config
>>> not_first_adapter
{}

As you can see, in the last line, we suddenly have a different type for not_first_adapter.

Comment on lines +218 to +220
if any(conf.peft_type != PeftType.LORA for conf in self.peft_config.values()):
    raise ValueError("Hotswapping is currently only supported for LoRA, please set `hotswap=False`.")

Member


Just concerned about that, maybe False would be a better default, no? It looks like the current code would crash without reason when loading more adapters that are not LoRA.
Or maybe a None value that becomes auto if we use LoRA, and False otherwise?

Member Author


This would only crash if the user tries to load a non-LoRA adapter and they either:

  • passed hotswap=True, or
  • called enable_peft_hotswap

In either case, the user intent is to use hotswapping. Therefore, I think that raising is the better choice, otherwise the user would think they used hotswapping successfully when it's not actually being used.

Comment on lines +340 to +350
if self._prepare_peft_hotswap_kwargs is not None:
    # For hotswapping of compiled models or adapters with different ranks.
    # If the user called enable_peft_hotswap, we need to ensure it is called:
    # - after the first adapter was loaded
    # - before the model is compiled and the 2nd adapter is being hotswapped in
    # Therefore, it needs to be called here
    from peft.utils.hotswap import prepare_model_for_compiled_hotswap

    prepare_model_for_compiled_hotswap(self, config=peft_config, **self._prepare_peft_hotswap_kwargs)
    # We only want to call prepare_model_for_compiled_hotswap once
    self._prepare_peft_hotswap_kwargs = None
Member


Should this be under the condition if not hotswap? 🤔

Member Author


It is, see line 335.

Member Author

@BenjaminBossan BenjaminBossan left a comment


@Cyrilvallez Thanks a lot for your review. I addressed your comments, please check again.


@sayakpaul sayakpaul requested a review from Cyrilvallez October 27, 2025 09:07