
Conversation

@Isalia20

Fixes #2945 (Return base model state_dict with original keys)

@Isalia20 (Author)

@BenjaminBossan Would be glad if you could review it when you get a chance

@BenjaminBossan (Member) left a comment

Thanks for this PR. I currently don't have much time to review and will be OoO next week, so hopefully @githubnemo can take over.

Just my first observations:

  1. From the original issue, I think we concluded that we would rather need a set_base_model_state_dict. That doesn't mean that get_base_model_state_dict doesn't have its merits, but it wouldn't fully solve the issue. Ping @dvmazur.
  2. There can be deeper nesting of .base_layer, so it should run in a loop: while ".base_layer" in new_key: ... (see the sketch after this list).
  3. This doesn't take trainable tokens into account yet; they need to be treated similarly to modules_to_save.
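
Roughly what I mean for points 2 and 3, as a hypothetical sketch (not the actual implementation; modules_to_save and trainable token handling are only hinted at):

def remap_key(peft_key: str) -> str:
    # Hypothetical helper: strip the PEFT wrapper prefix and collapse
    # arbitrarily deep ".base_layer" nesting back to the original base-model key.
    new_key = peft_key.removeprefix("base_model.model.")
    while ".base_layer" in new_key:
        new_key = new_key.replace(".base_layer", "", 1)
    # modules_to_save (and trainable tokens) wrap the original module under an
    # adapter-specific sub-module, so they need similar, adapter-aware handling;
    # omitted here for brevity.
    return new_key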

@dvmazur commented Jan 17, 2026

Hi! Thanks for this PR! Yes, what I actually need is a set_base_model_state_dict, but I think it should be pretty easy to implement once we have get_base_model_state_dict. Also, maybe we should expand the test matrix to make sure this method works for other PEFT methods?
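
To make the "should be easy" concrete, here is a rough, self-contained sketch of how set_base_model_state_dict could reuse the same key remapping (hypothetical code, not a concrete proposal for the PR; modules_to_save and weight-sharing methods would need extra care):

def set_base_model_state_dict(peft_model, base_state_dict):
    # Map each wrapped key back to its original name, the same way
    # get_base_model_state_dict would, then load the checkpoint under the
    # wrapped names without touching adapter weights.
    def to_original_key(wrapped_key: str) -> str:
        key = wrapped_key.removeprefix("base_model.model.")
        while ".base_layer" in key:
            key = key.replace(".base_layer", "", 1)
        return key

    key_map = {to_original_key(k): k for k in peft_model.state_dict()}  # original -> wrapped
    remapped = {key_map[k]: v for k, v in base_state_dict.items() if k in key_map}
    return peft_model.load_state_dict(remapped, strict=False)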

@Isalia20 (Author)

Hi, I'll add the set method as well as more tests a little later today.

@Isalia20 (Author)

Added the set_base_model_state_dict method and more tests.

@Isalia20 (Author)

@githubnemo Would be glad if you could review this :)

@githubnemo (Collaborator) left a comment

Hey @Isalia20 :) Thanks for taking this on.

I need a bit of clarification (possibly from @dvmazur): do I understand correctly that one use case is that we have a model that doesn't fit in memory, so we need to first shard the empty model (via FSDP) onto several devices and then read the checkpoint onto the shards (in a streaming manner)? Furthermore, do I understand correctly that it is not possible to shard the base model first and then apply PEFT on top of that? If those are the reasons why this is useful, we should probably document them as well, since they are not obvious.

The implementation seems to pass at first glance, but there might still be a few pitfalls. I left one comment regarding a potential bug.

Let's build a test (e.g., a merge of test_get_base_model_state_dict_keys_match and test_get_base_model_state_dict_values_match) and integrate it into tests/testing_common.py (similar to _test_save_pretrained) so it can be called from the more exhaustive testing suites in tests/test_decoder_models.py, tests/test_encoder_decoder_models.py, and tests/test_custom_models.py, which cover a lot more cases. For example, trainable tokens and parameter targeting are not covered by the current tests, and there are probably a lot more special cases, so leveraging the existing tests is probably best.
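
For illustration only, a standalone version of the keys check could look roughly like this (assuming get_base_model_state_dict ends up as a method on the PEFT model; the real test should go through tests/testing_common.py as described above):

import torch.nn as nn
from peft import LoraConfig, get_peft_model


class TinyMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin0 = nn.Linear(8, 8)
        self.lin1 = nn.Linear(8, 2)

    def forward(self, x):
        return self.lin1(self.lin0(x))


def test_get_base_model_state_dict_keys_match():
    base_model = TinyMLP()
    expected_keys = set(base_model.state_dict().keys())

    peft_model = get_peft_model(base_model, LoraConfig(target_modules=["lin0"]))
    # get_base_model_state_dict is the method proposed in this PR; its exact
    # signature may differ.
    remapped_keys = set(peft_model.get_base_model_state_dict().keys())

    assert remapped_keys == expected_keys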

Comment on lines +1771 to +1777
for prefix in adapter_prefixes:
    if f".{prefix}" in peft_key or peft_key.startswith(prefix):
        is_adapter_param = True
        break

if is_adapter_param:
    continue
@githubnemo (Collaborator)

I think this is not a sufficient filter for methods like VeRA or VB-LoRA that employ weight sharing. This will be covered by the extended tests I suppose.

An alternative approach would be to iterate over all named modules of the model and remove those keys that belong to BaseTunerLayer instances (the weight-shared keys are already caught by the prefix matching that is in place). But let's see what the tests say first, maybe I'm wrong and everything works fine :)
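
A rough sketch of that alternative, just to illustrate the idea (hypothetical helper, simplified):

from peft.tuners.tuners_utils import BaseTunerLayer


def adapter_only_keys(peft_model):
    # Collect the state_dict keys that live under BaseTunerLayer modules but do
    # not belong to the wrapped base_layer, so they can be dropped when
    # building the base-model state dict.
    keys = set()
    for module_name, module in peft_model.named_modules():
        if isinstance(module, BaseTunerLayer):
            for param_name, _ in module.named_parameters():
                if not param_name.startswith("base_layer."):
                    keys.add(f"{module_name}.{param_name}")
    return keys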

@dvmazur commented Jan 21, 2026

Hi!

I want to be able to load the base model's and adapter's state_dicts after wrapping the PEFT model in FSDP. The state_dict I want to load has keys matching the original base model's keys, so I need a function that maps the wrapped model's keys to the original ones, if that makes sense.
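
For a concrete illustration of the mapping (a LoRA-wrapped Llama-style projection, hypothetical example):

# Key as it appears in the PEFT-wrapped (and later FSDP-wrapped) model:
peft_key = "base_model.model.model.layers.0.self_attn.q_proj.base_layer.weight"
# Key as it appears in the original base model's checkpoint:
original_key = "model.layers.0.self_attn.q_proj.weight"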

@Isalia20 (Author)

Thanks for the comments. I'll take a look a little later this week.

@githubnemo (Collaborator)

Hey @dvmazur,

I want to be able to load the base model's and adapter's state_dicts after wrapping the PEFT model in FSDP.

I got that, but why? What's your motivation? My question assumed that memory is a constraint and that this is the reason, but you neither confirmed nor refuted that. Please give a bit more detail so that I can understand the use case better. Thanks!

@dvmazur commented Jan 21, 2026

The end goal is basically to have PEFT working for TorchTitan. Titan wraps models in FSDP to save VRAM, and it allocates GPU memory only after the model's meta-device weights have been FSDP-sharded.

I think this pseudocode snippet should give you enough context, but feel free to ask if you need more details:

with torch.device("meta"):
    # can't load base model weights here as it is on meta device before resharding
    model = AutoModelForCausalLM.from_pretrained(...)
    # can only wrap model in peft before fsdp-sharding it
    model = get_perft_model(model, ...)

model = fsdp_shard_model(model)

# actually allocate memory for the model's weights
# state dict can be loaded after that
model.to_empty(device=init_device)

# this function loads a state dict with the original model's module keys
# so I need a way to map them to the PEFT-wrapped model
load_base_model_state_dict(model)
initialize_adapters(model)
