
[Feature Request] fast inference for LFM (and Mamba models) #4073

@gaztrabisme


Bug Description

When using FastLanguageModel.from_pretrained() with fast_inference=True on an LFM2.5 model (LiquidAI/LFM2.5-1.2B-Thinking, architecture Lfm2ForCausalLM), the model loads into vLLM successfully but crashes during state dict extraction.

Error

File "unsloth_zoo/vllm_utils.py", line 1122, in _get_vllm_state_dict
    get_state_dict(f"{prefix}.o_proj", 0, state_dict, o_proj)
                       ^^^^^^
UnboundLocalError: cannot access local variable 'prefix' where it is not associated with a value

Root Cause

In _get_vllm_state_dict, the layer-iteration loop only assigns prefix inside the if hasattr(layer, "self_attn") and elif hasattr(layer, "cross_attn") branches, but the get_state_dict(f"{prefix}.o_proj", ...) call sits at the loop-body level, outside both branches.

LFM2/Mamba layers use mixer (or similar) instead of self_attn/cross_attn, so neither branch executes and prefix is never assigned.

for kk in range(len(vllm_text_model.layers)):
    layer = vllm_text_model.layers[kk]
    if hasattr(layer, "self_attn"):
        prefix = f"..."  # set here
        # ...
    elif hasattr(layer, "cross_attn"):
        prefix = f"..."  # set here
        # ...
    # Mamba layers fall through — prefix never set
    get_state_dict(f"{prefix}.o_proj", 0, state_dict, o_proj)  # CRASH
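The failure mode can be reproduced standalone, without vLLM or Unsloth. The stand-in layer classes below are hypothetical, but the control flow mirrors the loop above: prefix is only bound in the attention branches, so a model whose first layer is a Mamba-style mixer hits the same UnboundLocalError.

```python
class AttnLayer:
    """Stand-in for a transformer layer (has self_attn)."""
    def __init__(self):
        self.self_attn = object()

class MambaLayer:
    """Stand-in for an LFM2/Mamba layer (has mixer, no self_attn/cross_attn)."""
    def __init__(self):
        self.mixer = object()

def extract_prefixes(layers):
    prefixes = []
    for kk, layer in enumerate(layers):
        if hasattr(layer, "self_attn"):
            prefix = f"model.layers.{kk}.self_attn"
        elif hasattr(layer, "cross_attn"):
            prefix = f"model.layers.{kk}.cross_attn"
        # Mamba layers fall through: `prefix` is never bound this iteration
        prefixes.append(f"{prefix}.o_proj")
    return prefixes

# Attention-only stacks work fine:
extract_prefixes([AttnLayer(), AttnLayer()])

# A stack starting with a Mamba-style layer crashes as in the traceback:
try:
    extract_prefixes([MambaLayer()])
except UnboundLocalError as e:
    print(type(e).__name__)
```

Note a second hazard hiding in the same pattern: if a Mamba layer comes after an attention layer, prefix silently keeps its stale value from the previous iteration instead of crashing, which would extract the wrong weights.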

Environment

  • Unsloth: 2026.2.1
  • vLLM: 0.15.1
  • PyTorch: 2.9.1+cu128
  • CUDA: 12.8
  • GPU: NVIDIA GeForce RTX 5080 (Blackwell, sm_120a)
  • Model: LiquidAI/LFM2.5-1.2B-Thinking (Lfm2ForCausalLM)

Steps to Reproduce

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="LiquidAI/LFM2.5-1.2B-Thinking",
    max_seq_length=4096,
    load_in_4bit=False,
    fast_inference=True,
)

Notes

  • vLLM itself handles LFM2 fine — model loads as Lfm2ForCausalLM, CUDA graphs are captured, KV cache is allocated. The crash is only in Unsloth's _get_vllm_state_dict wrapper.
  • fast_inference=False works as expected (bypasses vLLM entirely).
  • There is no FastLfm2Model class in Unsloth — LFM2 falls through to the generic FastModel/FastBaseModel path, which does attempt vLLM initialization.

Suggested Fix

Add handling for Mamba/SSM layers in the loop — either skip them with continue or add an elif hasattr(layer, "mixer") branch that extracts the correct state dict for Mamba layers.
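One possible shape of the skip variant, as a runnable sketch against hypothetical stand-in layer classes (the real loop lives in unsloth_zoo/vllm_utils.py inside _get_vllm_state_dict; collect_attn_prefixes and the class names here are illustrative, not actual Unsloth APIs):

```python
class AttnLayer:
    """Stand-in for a transformer layer (has self_attn)."""
    def __init__(self):
        self.self_attn = object()

class MixerLayer:
    """Stand-in for an LFM2/Mamba layer (has mixer)."""
    def __init__(self):
        self.mixer = object()

def collect_attn_prefixes(layers):
    """Only attention layers contribute an o_proj prefix; Mamba/SSM layers
    (and any other unrecognized layer type) are skipped instead of
    dereferencing an unbound or stale `prefix`."""
    out = []
    for kk, layer in enumerate(layers):
        if hasattr(layer, "self_attn"):
            prefix = f"model.layers.{kk}.self_attn"
        elif hasattr(layer, "cross_attn"):
            prefix = f"model.layers.{kk}.cross_attn"
        else:
            # Mamba/SSM layer: skip the attention-specific extraction.
            # A fuller fix would branch on hasattr(layer, "mixer") and
            # extract the SSM parameters under their own prefix.
            continue
        out.append(f"{prefix}.o_proj")
    return out

print(collect_attn_prefixes([MixerLayer(), AttnLayer(), MixerLayer()]))
# -> ['model.layers.1.self_attn.o_proj']
```

The continue in the else branch also closes the stale-prefix hole: a Mamba layer following an attention layer can no longer silently reuse the previous iteration's prefix.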
