Description
Reminder
- I have read the above rules and searched the existing issues.
System Info
- kt-kernel: main @ 16a8b98
- SGLang: main @ 45095bac7
- Model: openai/gpt-oss-120b (HuggingFace safetensors, MXFP4)
- GPU: RTX 5090 (32GB)
- CPU: AMD Ryzen 9 9900X (Zen 5, AVX-512 BF16)
Reproduction
Repository: kvcache-ai/ktransformers (possibly related to #1655)
Summary
The BF16SafeTensorLoader defaults to the DeepSeek MoE weight format and cannot parse gpt-oss's packed, fused expert tensors. This is a separate failure mode from the GGUF/MXFP4 type 39 issue already documented in #1655.
Details
Weight format mismatch
The BF16SafeTensorLoader expects DeepSeek-style per-expert weight keys:
```
model.layers.0.mlp.experts.0.gate_proj.weight
model.layers.0.mlp.experts.0.up_proj.weight
model.layers.0.mlp.experts.0.down_proj.weight
```
gpt-oss uses packed fused tensors with MXFP4 block/scale format:
```
model.layers.0.mlp.experts.gate_up_proj_blocks
model.layers.0.mlp.experts.gate_up_proj_scales
model.layers.0.mlp.experts.gate_up_proj_bias
model.layers.0.mlp.experts.down_proj_blocks
model.layers.0.mlp.experts.down_proj_scales
model.layers.0.mlp.experts.down_proj_bias
```
All 128 experts are packed into single tensors per layer (fused gate+up projection), with separate block quantization scales. There is no per-expert dimension in the key naming.
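The two naming schemes are distinguishable from key names alone, so a format check could run before any weights are read. A minimal sketch (function and regex names are hypothetical, not the loader's actual API):

```python
import re

# DeepSeek-style keys carry a per-expert index; gpt-oss keys name a single
# fused tensor per layer with a _blocks/_scales/_bias suffix.
DEEPSEEK_RE = re.compile(r"\.mlp\.experts\.\d+\.(gate|up|down)_proj\.weight$")
GPTOSS_RE = re.compile(r"\.mlp\.experts\.(gate_up|down)_proj_(blocks|scales|bias)$")

def detect_moe_format(keys):
    """Return 'gpt-oss', 'deepseek', or None based on tensor key names."""
    if any(GPTOSS_RE.search(k) for k in keys):
        return "gpt-oss"
    if any(DEEPSEEK_RE.search(k) for k in keys):
        return "deepseek"
    return None
```

Checking for the gpt-oss pattern first means the packed format is never silently mistaken for per-expert keys.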
Observed behavior
```
[BF16SafeTensorLoader] No MoE format detected, defaulting to: deepseek
```
The loader (kt-kernel/python/utils/loader.py line 463) fails to match gpt-oss's key pattern and falls back to DeepSeek format, which then fails to load the weights correctly.
Three confirmed failure paths for gpt-oss on KTransformers
For completeness, here are all three integration paths attempted and their failure modes:
| Backend | Weight Source | Failure | Root Cause |
|---|---|---|---|
| BF16 | HF safetensors | Format mismatch | Packed fused expert tensors, not per-expert keys |
| LLAMAFILE | MXFP4 GGUF | `ValueError: 39 is not a valid GGMLQuantizationType` | gguf 0.17.1 lacks MXFP4 type 39 |
| LLAMAFILE | Q4_K_M GGUF | Testing in progress | Standard quant type, should be compatible |
Suggestion
To support gpt-oss (and likely future models using packed/fused MoE formats):
- Add a gpt-oss format detector in the BF16SafeTensorLoader alongside the existing DeepSeek detector
- Implement unpacking logic for the fused gate_up_proj_blocks/scales tensors into individual expert weights
- Alternatively, document that gpt-oss requires the LLAMAFILE backend with standard (non-MXFP4) GGUF quantizations
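To make the unpacking suggestion concrete, here is a stdlib-only sketch of MXFP4 dequantization for a single 32-element block, following the OCP Microscaling spec (FP4 E2M1 values, E8M0 power-of-two block scales). The nibble ordering and the helper itself are assumptions for illustration, not the checkpoint's documented layout:

```python
# FP4 (E2M1) code points: 8 magnitudes, high bit of the nibble is the sign.
FP4_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
              -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0]

def dequant_mxfp4_block(block_bytes, scale_byte):
    """Dequantize one MXFP4 block: 16 packed bytes (two FP4 nibbles each,
    low nibble first -- nibble order is an assumption here) plus one E8M0
    scale byte (a power of two with bias 127). Returns 32 floats."""
    scale = 2.0 ** (scale_byte - 127)
    out = []
    for b in block_bytes:
        out.append(FP4_VALUES[b & 0x0F] * scale)  # low nibble
        out.append(FP4_VALUES[b >> 4] * scale)    # high nibble
    return out
```

A full unpacker would apply this per block across the fused gate_up_proj tensor, then split the result along the fused dimension into per-expert gate and up matrices before handing them to the BF16 path.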
Note on successful llama.cpp baseline
For reference, gpt-oss-120b runs successfully on the same hardware via llama.cpp with --override-tensor expert offloading at 37.95 t/s (15 of 36 layers' experts on GPU, remainder on CPU via DDR5). The model architecture is functional — it's specifically the KTransformers integration paths that need gpt-oss format support.
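For anyone reproducing the baseline, the offload described above can be approximated with an invocation along these lines (the binary path, GGUF filename, and tensor-name regex are assumptions; llama.cpp's `--override-tensor` takes regex=buffer pairs, and the range below pushes layers 15-35 expert tensors to CPU to match the 15-of-36 split):

```shell
./llama-server \
  -m gpt-oss-120b-mxfp4.gguf \
  --n-gpu-layers 99 \
  --override-tensor "blk\.(1[5-9]|2[0-9]|3[0-5])\.ffn_.*_exps\.=CPU"
```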
Others
No response