Phi-4 support #2712

@filipw

Phi-4 is now available on HF.

When trying to run the Phi-4 GGUF with the existing quantized_phi3 implementation, I get:

loaded 243 tensors (9.05GB) in 0.27s
model built
Error: shape mismatch in reshape, lhs: [1, 12, 1280], rhs: [1, 12, 40, 128]
Write a function to count prime numbers up to N.
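For what it's worth, the mismatch looks consistent with Phi-4 using grouped-query attention while the quantized_phi3 code reshapes key/value tensors with the full head count. A minimal sketch, assuming Phi-4's published config (hidden size 5120, 40 attention heads, 10 KV heads — these values are assumptions taken from the model card, not from this repo):

```rust
// Hypothetical arithmetic illustrating the reshape failure, assuming
// Phi-4's config: hidden_size=5120, 40 attention heads, 10 KV heads (GQA).
fn main() {
    let hidden: usize = 5120;
    let heads: usize = 40;
    let kv_heads: usize = 10; // grouped-query attention
    let head_dim = hidden / heads; // 128
    let kv_dim = kv_heads * head_dim; // 1280: width of the loaded k/v tensor

    let seq_len: usize = 12;
    let lhs = seq_len * kv_dim; // elements in [1, 12, 1280]
    let rhs = seq_len * heads * head_dim; // elements in [1, 12, 40, 128]

    // A reshape must preserve the element count, so this cannot succeed:
    assert_ne!(lhs, rhs);
    println!("lhs elements: {lhs}, rhs elements: {rhs}");
}
```

If that reading is right, the fix would be reshaping k/v with the KV head count rather than the attention head count.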

When trying to run the Phi-4 GGUF with the quantized_llama implementation, I get:

loaded 243 tensors (9.05GB) in 0.30s
Error: cannot find llama.attention.head_count in metadata
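This second error is presumably just GGUF's architecture-prefixed metadata key scheme: the Phi-4 GGUF appears to be written with the "phi3" architecture, so the llama.* keys that quantized_llama looks up simply aren't present. A sketch with made-up metadata values (the key names follow the GGUF convention; the head count is an assumption):

```rust
use std::collections::HashMap;

// Hypothetical GGUF metadata, illustrating the arch-prefixed key scheme.
// The Phi-4 GGUF is assumed to declare general.architecture = "phi3".
fn main() {
    let mut metadata: HashMap<String, String> = HashMap::new();
    metadata.insert("general.architecture".into(), "phi3".into());
    metadata.insert("phi3.attention.head_count".into(), "40".into());

    // Resolving the key via the declared architecture succeeds:
    let arch = metadata["general.architecture"].clone();
    let key = format!("{arch}.attention.head_count");
    assert!(metadata.contains_key(&key));

    // quantized_llama hard-codes the "llama." prefix, so its lookup fails:
    assert!(!metadata.contains_key("llama.attention.head_count"));
    println!("resolved key: {key}");
}
```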

Both errors are reproducible with the quantized-phi examples by swapping the model repo, filename, and revision to:

"microsoft/phi-4-gguf",
"phi-4-q4.gguf",
"main",

It would be great to have Phi-4 support.
