Phi-4 support #2712

@filipw

Phi-4 is now available on HF.

When trying to run the Phi-4 GGUF with the existing quantized_phi3 implementation, I get:

loaded 243 tensors (9.05GB) in 0.27s
model built
Error: shape mismatch in reshape, lhs: [1, 12, 1280], rhs: [1, 12, 40, 128]
Write a function to count prime numbers up to N.
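For what it's worth, the mismatch looks consistent with Phi-4 using grouped-query attention while the quantized_phi3 code reshapes key/value tensors with the full head count. A minimal sketch, assuming Phi-4's published config (hidden size 5120, 40 attention heads, 10 KV heads — these values are assumptions taken from the model card, not from this repo):

```rust
// Hypothetical arithmetic illustrating the reshape failure, assuming
// Phi-4's config: hidden_size=5120, 40 attention heads, 10 KV heads (GQA).
fn main() {
    let hidden: usize = 5120;
    let heads: usize = 40;
    let kv_heads: usize = 10; // grouped-query attention
    let head_dim = hidden / heads; // 128
    let kv_dim = kv_heads * head_dim; // 1280: width of the loaded k/v tensor

    let seq_len: usize = 12;
    let lhs = seq_len * kv_dim; // elements in [1, 12, 1280]
    let rhs = seq_len * heads * head_dim; // elements in [1, 12, 40, 128]

    // A reshape must preserve the element count, so this cannot succeed:
    assert_ne!(lhs, rhs);
    println!("lhs elements: {lhs}, rhs elements: {rhs}");
}
```

If that reading is right, the fix would be reshaping k/v with the KV head count rather than the attention head count.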

When trying to run the Phi-4 GGUF with the quantized_llama implementation, I get:

loaded 243 tensors (9.05GB) in 0.30s
Error: cannot find llama.attention.head_count in metadata
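This second error is presumably just GGUF's architecture-prefixed metadata key scheme: the Phi-4 GGUF appears to be written with the "phi3" architecture, so the llama.* keys that quantized_llama looks up simply aren't present. A sketch with made-up metadata values (the key names follow the GGUF convention; the head count is an assumption):

```rust
use std::collections::HashMap;

// Hypothetical GGUF metadata, illustrating the arch-prefixed key scheme.
// The Phi-4 GGUF is assumed to declare general.architecture = "phi3".
fn main() {
    let mut metadata: HashMap<String, String> = HashMap::new();
    metadata.insert("general.architecture".into(), "phi3".into());
    metadata.insert("phi3.attention.head_count".into(), "40".into());

    // Resolving the key via the declared architecture succeeds:
    let arch = metadata["general.architecture"].clone();
    let key = format!("{arch}.attention.head_count");
    assert!(metadata.contains_key(&key));

    // quantized_llama hard-codes the "llama." prefix, so its lookup fails:
    assert!(!metadata.contains_key("llama.attention.head_count"));
    println!("resolved key: {key}");
}
```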

Both errors are reproducible with the quantized-phi examples by swapping the model repo, filename, and revision to:

"microsoft/phi-4-gguf",
"phi-4-q4.gguf",
"main",

It would be great to have Phi-4 support.
