Description
🏎️ Speed Report
Hi everyone,
I've been testing the community builds of MLC and llama.cpp on Jetson Thor and noticed a significant performance gap during prefill. I'd like to check whether this is expected behavior or whether there are optimization options I might have missed.
Model setup: Qwen3-30B-A3B (tested with Q4_K_M and q4bf16_1/q4f16_1 quantized variants)
Hardware: Jetson Thor
Performance comparison:
llama.cpp: Prefill for ~10k tokens takes about 30-40 seconds
MLC: the same setup takes 1.5-2× longer for prefill (a rough tokens/s conversion is below)
On the other hand, MLC is much faster during decode (roughly 3× faster than llama.cpp)
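For reference, here is a rough conversion of those times into prefill throughput. This is just back-of-the-envelope arithmetic from the numbers above, nothing measured separately:

```python
# Rough prefill throughput implied by the timings above (~10k-token prompt).
prompt_tokens = 10_000

# llama.cpp: 30-40 s of prefill  ->  ~250-330 tok/s
llama_cpp_tok_s = [prompt_tokens / t for t in (40, 30)]
# MLC: 1.5-2x longer on the same prompt  ->  ~125-220 tok/s
mlc_tok_s = [prompt_tokens / (40 * 2.0), prompt_tokens / (30 * 1.5)]

print(f"llama.cpp: {llama_cpp_tok_s[0]:.0f}-{llama_cpp_tok_s[1]:.0f} tok/s")
print(f"MLC:       {mlc_tok_s[0]:.0f}-{mlc_tok_s[1]:.0f} tok/s")
```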
Stability:
llama.cpp server sometimes reports illegal memory access errors
MLC is more stable, but the prefill speed gap is quite large
Additional note: As far as I can tell, MLC does not yet support FP8 activations.
My questions are:
Is this prefill slowdown mainly due to MLC's framework design, or a lack of optimization/adaptation for Jetson Thor?
Are there recommended build flags, runtime parameters, or configuration tweaks to improve prefill performance? (See the sketch below for the kind of tweak I mean.)
If there are known issues or a roadmap for improvements, I'd really appreciate any pointers.
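For question 2, here is the kind of tweak I mean: a minimal sketch using the Python engine API to time the prefill while overriding the prefill chunk size. The model path and the `prefill_chunk_size` field on `EngineConfig` are assumptions on my side, so please correct me if this option lives elsewhere or is not supported on Thor:

```python
# Minimal sketch (not verified on Jetson Thor): measure time-to-first-token
# with an overridden prefill chunk size. Path and field names are assumptions.
import time

from mlc_llm import MLCEngine
from mlc_llm.serve.config import EngineConfig

engine = MLCEngine(
    model="./dist/Qwen3-30B-A3B-q4f16_1-MLC",            # assumed local artifact path
    engine_config=EngineConfig(prefill_chunk_size=8192),  # assumed tunable field
)

long_prompt = "..."  # substitute the ~10k-token prompt used in the tests above

start = time.time()
first_token_time = None
for _ in engine.chat.completions.create(
    messages=[{"role": "user", "content": long_prompt}],
    stream=True,
):
    if first_token_time is None:
        first_token_time = time.time()

print(f"time to first token: {first_token_time - start:.1f} s")
engine.terminate()
```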
Thanks a lot!
- The model code: Qwen3-30B-A3B (MoE)
- The model configuration (e.g. quantization mode, running data type, etc.): q4f16_1, q4bf16_1 (llama.cpp uses Q4_K_M)
- Device (e.g. MacBook Pro M2, PC+RTX 3080): Jetson Thor and Orin (64 GB)
- OS (if applicable):
- Encode speed (Token/s): for a 7,000-token context on Orin, 45-52 s to first token (llama.cpp Q4_K_M: 23-26 s to first token)
- Decode speed (Token/s): 2-3× faster than llama.cpp
- Memory usage (if applicable):