qwen3.c


Run inference for frontier models based on the Qwen3 architecture, like Qwen3-4B or DeepSeek-R1-0528-Qwen3-8B, on your local Linux/macOS/Windows machine. No complicated configuration is required: just follow the steps below and enjoy.

Understand the basics of transformers but want to learn in-depth how LLM inference works? qwen3.c runs LLMs using one easy-to-understand (relatively speaking!) file of C source with no dependencies. Once you've digested it and understand the data flow, you're there.

This project's starting point was Andrej Karpathy's llama2.c, which does single-file inference for LLaMA 2-compatible models. The LLaMA 2 architecture is now 2 years old (a lifetime in the field of AI) and has long been superseded. This project aims to maintain the simplicity of llama2.c while supporting a frontier model architecture, with the goal of being both an up-to-date learning resource and a great way to run the latest models locally.

Despite being only around 1000 lines of C code with no dependencies, qwen3.c supports everything you need to enjoy running leading Qwen3-architecture LLMs on standard hardware (no GPUs needed), including multi-core CPU operation via OpenMP, Unicode/multi-language input and output, and thinking/reasoning models.
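
Most of the inference time in this style of engine goes into matrix-vector multiplies, which is where multi-core support pays off. Here is a minimal sketch of the llama2.c-style matmul this family of projects is built around (illustrative only, not the exact qwen3.c code); when compiled with -fopenmp, the independent output rows are split across CPU cores:

/* W (d,n) @ x (n,) -> xout (d,) */
void matmul(float *xout, const float *x, const float *w, int n, int d) {
    #pragma omp parallel for
    for (int i = 0; i < d; i++) {
        float val = 0.0f;
        for (int j = 0; j < n; j++) {
            val += w[i * n + j] * x[j];
        }
        xout[i] = val;
    }
}

Each iteration of the outer loop writes a different element of xout, so the rows can run in parallel with no locking.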

qwen3.c includes a Python tool to process any Qwen3-architecture HuggingFace model, converting it to qwen3.c's model format, which uses Q8_0 quantization for a good trade-off between quality and performance.
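
If you're curious what Q8_0 means in practice: weights are stored as fixed-size groups of int8 values, with one float scale factor per group. Below is a sketch of group-wise int8 quantization in that style (the group size of 64 and the struct layout are assumptions for illustration; export.py and the C source define the real format):

#include <math.h>
#include <stdint.h>

#define GROUP_SIZE 64 /* assumed group size, for illustration */

typedef struct {
    int8_t *q; /* quantized values, length n */
    float *s;  /* one scale per group of GROUP_SIZE values */
} QuantizedTensor;

void quantize(QuantizedTensor *qt, const float *x, int n) {
    for (int g = 0; g < n / GROUP_SIZE; g++) {
        /* find the largest magnitude in this group */
        float wmax = 0.0f;
        for (int i = 0; i < GROUP_SIZE; i++) {
            float v = fabsf(x[g * GROUP_SIZE + i]);
            if (v > wmax) wmax = v;
        }
        /* scale so the largest value maps to +/-127 */
        float scale = (wmax == 0.0f) ? 1.0f : wmax / 127.0f;
        qt->s[g] = scale;
        for (int i = 0; i < GROUP_SIZE; i++) {
            qt->q[g * GROUP_SIZE + i] = (int8_t) roundf(x[g * GROUP_SIZE + i] / scale);
        }
    }
}

Dequantization is just q[i] * s[g], so the inference-time matmul can work directly on the int8 data, applying one scale multiply per group.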

Step 1: check out and build

First, check out this repo and build it. I recommend the OpenMP build if your toolchain supports it, since using multiple CPU cores dramatically improves performance:

git clone https://github.com/adriancable/qwen3.c
cd qwen3.c
make openmp

(To build without OpenMP, just run make without the openmp argument.)

Step 2: download and convert a model

Install any needed Python dependencies for the HuggingFace export utility:

pip install -r requirements.txt

Then, pick any dense (no Mixture-of-Experts) unquantized (not GGUF) Qwen3-architecture model from HuggingFace. Unless you have lots of RAM, start with smaller models. Qwen/Qwen3-4B is great, so we'll start with that.

Run the Python 3 export tool (this will take around 10 minutes) to download the model from HuggingFace and convert it to qwen3.c's quantized checkpoint format, storing the result in a file called Qwen3-4B.bin:

python export.py Qwen3-4B.bin Qwen/Qwen3-4B
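
The resulting .bin file is a self-contained checkpoint. In the llama2.c lineage, such files are loaded by memory-mapping rather than reading into allocated buffers, so weights are paged in from disk on demand; here is a sketch of that POSIX pattern (illustrative; the actual loader lives in the qwen3.c source):

#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a checkpoint file read-only; returns NULL on failure. */
void *map_checkpoint(const char *path, size_t *size_out) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) { perror("open"); return NULL; }
    struct stat st;
    if (fstat(fd, &st) == -1) { perror("fstat"); close(fd); return NULL; }
    void *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after close */
    if (data == MAP_FAILED) { perror("mmap"); return NULL; }
    *size_out = (size_t) st.st_size;
    return data;
}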

Step 3: run and enjoy

./runq Qwen3-4B.bin

Fun things you can try asking:

Tell me a surprising fact about an animal of your choice.

Write a short story for a 5 year old girl, featuring Sobieski the dog and Pepe the cat.

Write a C program which sorts a list using the bubble sort algorithm.

Write a poem about a little boy who builds a rocket to fly to the moon. In Japanese, please.

Translate into English: 我希望您喜欢使用 qwen3.c 学习 LLM。

Step 4: experiment with reasoning mode

qwen3.c also supports reasoning/thinking, if the model supports it. Enable thinking with the -r 1 command-line parameter:

./runq Qwen3-4B.bin -r 1

Then try:

Solve the quadratic equation x^2 - 5x + 6 = 0.

What is 19673261 * 1842.64?
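
With reasoning enabled, Qwen3-style models emit their chain of thought, wrapped in <think>...</think> tags, before giving the final answer. For the quadratic prompt above, the output will look something like this (illustrative, not verbatim):

<think>
The equation x^2 - 5x + 6 = 0 factors as (x - 2)(x - 3) = 0, so x = 2 or x = 3.
</think>

The solutions are x = 2 and x = 3.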

Step 5: explore other models

Try, for example, DeepSeek-R1-0528-Qwen3-8B:

python export.py DeepSeek-R1-0528-Qwen3-8B.bin deepseek-ai/DeepSeek-R1-0528-Qwen3-8B

Then:

./runq DeepSeek-R1-0528-Qwen3-8B.bin

Advanced options

qwen3.c lets you configure inference via the command line, including setting a system prompt, the temperature, sampling parameters and so forth. To show the available settings, run runq without any command-line parameters:

./runq
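
For example, to enable reasoning and lower the sampling temperature in one run (the -t flag here is an assumption carried over from llama2.c's CLI; the usage text printed by the bare ./runq invocation is authoritative):

./runq Qwen3-4B.bin -r 1 -t 0.6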

License

MIT
