### Bug Description
When using `FastLanguageModel.from_pretrained()` with `fast_inference=True` on an LFM2.5 model (`LiquidAI/LFM2.5-1.2B-Thinking`, architecture `Lfm2ForCausalLM`), the model loads into vLLM successfully but crashes during state dict extraction.
### Error
```
File "unsloth_zoo/vllm_utils.py", line 1122, in _get_vllm_state_dict
  get_state_dict(f"{prefix}.o_proj", 0, state_dict, o_proj)
                    ^^^^^^
UnboundLocalError: cannot access local variable 'prefix' where it is not associated with a value
```
### Root Cause
In `_get_vllm_state_dict`, the layer iteration loop only assigns `prefix` inside the `if hasattr(layer, "self_attn")` and `elif hasattr(layer, "cross_attn")` branches, while the `get_state_dict(f"{prefix}.o_proj", ...)` call sits at the loop body level, outside both branches.

LFM2/Mamba layers expose `mixer` (or similar) instead of `self_attn`/`cross_attn`, so neither branch executes and `prefix` is never assigned.
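The underlying issue is a plain Python scoping pitfall and can be reproduced in isolation. The classes and key names below are stand-ins for illustration, not actual Unsloth code:

```python
class AttnLayer:
    self_attn = object()  # matches the first branch


class MambaLayer:
    mixer = object()  # LFM2/Mamba-style layer: neither branch matches


def o_proj_keys(layers):
    keys = []
    for layer in layers:
        if hasattr(layer, "self_attn"):
            prefix = "self_attn"
        elif hasattr(layer, "cross_attn"):
            prefix = "cross_attn"
        # no branch for `mixer`: execution falls through with `prefix` unbound
        keys.append(f"{prefix}.o_proj")
    return keys


try:
    o_proj_keys([MambaLayer()])
except UnboundLocalError as e:
    print(type(e).__name__)  # → UnboundLocalError
```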
```python
for kk in range(len(vllm_text_model.layers)):
    layer = vllm_text_model.layers[kk]
    if hasattr(layer, "self_attn"):
        prefix = f"..."  # set here
        # ...
    elif hasattr(layer, "cross_attn"):
        prefix = f"..."  # set here
        # ...
    # Mamba layers fall through — prefix never set
    get_state_dict(f"{prefix}.o_proj", 0, state_dict, o_proj)  # CRASH
```

### Environment
- Unsloth: 2026.2.1
- vLLM: 0.15.1
- PyTorch: 2.9.1+cu128
- CUDA: 12.8
- GPU: NVIDIA GeForce RTX 5080 (Blackwell, sm_120a)
- Model: `LiquidAI/LFM2.5-1.2B-Thinking` (`Lfm2ForCausalLM`)
### Steps to Reproduce
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="LiquidAI/LFM2.5-1.2B-Thinking",
    max_seq_length=4096,
    load_in_4bit=False,
    fast_inference=True,
)
```

### Notes
- vLLM itself handles LFM2 fine: the model loads as `Lfm2ForCausalLM`, CUDA graphs are captured, and the KV cache is allocated. The crash occurs only in Unsloth's `_get_vllm_state_dict` wrapper. `fast_inference=False` works as expected (it bypasses vLLM entirely).
- There is no `FastLfm2Model` class in Unsloth; LFM2 falls through to the generic `FastModel`/`FastBaseModel` path, which does attempt vLLM initialization.
### Suggested Fix
Add handling for Mamba/SSM layers in the loop: either skip them with `continue`, or add an `elif hasattr(layer, "mixer")` branch that extracts the correct state dict for Mamba layers.
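A minimal sketch of the skip-based fix, using stand-in classes since the real loop lives in `unsloth_zoo/vllm_utils.py`; the layer classes and key format here are assumptions for illustration, not the actual Unsloth implementation:

```python
class AttnLayer:
    self_attn = object()


class MambaLayer:
    mixer = object()  # LFM2/Mamba-style layer


def collect_o_proj_keys(layers):
    """Collect attention o_proj key names, skipping Mamba/SSM layers."""
    keys = []
    for kk, layer in enumerate(layers):
        if hasattr(layer, "self_attn"):
            prefix = f"model.layers.{kk}.self_attn"
        elif hasattr(layer, "cross_attn"):
            prefix = f"model.layers.{kk}.cross_attn"
        elif hasattr(layer, "mixer"):
            # Mamba/SSM layer: extract its own tensors here, or skip the
            # attention-specific extraction entirely.
            continue
        else:
            continue  # unknown layer type; `prefix` is never read unbound
        keys.append(f"{prefix}.o_proj")
    return keys


print(collect_o_proj_keys([AttnLayer(), MambaLayer(), AttnLayer()]))
# → ['model.layers.0.self_attn.o_proj', 'model.layers.2.self_attn.o_proj']
```

The `continue` also avoids a subtler failure mode: without it, a Mamba layer that follows an attention layer would silently reuse the previous layer's stale `prefix` instead of crashing.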