This is a Rust implementation of karpathy/nanochat, built on candle and focused on clarity and parity with the reference while keeping the code minimal.
Warning: experimental; focused on inference for now.
- Native Rust
- Hugging Face integration
- Centralized model loader resilient to tensor name changes (see the sketch after this list)
- Minimal surface area to keep cognitive load low (not production-grade; simplicity is favored over raw performance)
- Compatible with tiktoken `.pkl` tokenizer configs
- Production-ready, unified tokenizer encoding/decoding (no `tiktoken` dependency at runtime)
- Removed the embedded Python interpreter from the engine
- Performance and ergonomics improvements in the generation logic
- More emphasis on post-training
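To illustrate the loader bullet above, here is a minimal sketch of the idea (not the crate's actual code; the tensor names are hypothetical): look each tensor up under several candidate names so renamed checkpoints still load.

```rust
use std::collections::HashMap;

/// Illustrative only: return the first tensor found under any candidate name,
/// so checkpoints that rename tensors still resolve.
fn get_any<'a, T>(tensors: &'a HashMap<String, T>, candidates: &[&str]) -> Option<&'a T> {
    candidates.iter().find_map(|name| tensors.get(*name))
}

fn main() {
    let mut tensors: HashMap<String, Vec<f32>> = HashMap::new();
    tensors.insert("wte.weight".to_string(), vec![0.0; 4]);
    // Hypothetical aliases: either the old or the new name resolves.
    assert!(get_any(&tensors, &["transformer.wte.weight", "wte.weight"]).is_some());
}
```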
A 32-layer model (d32), trained on a budget of about $1,000, is available on Hugging Face: https://huggingface.co/Antigma/nanochat-d32. A smaller 20-layer version (d20), used for benchmarks and testing, lives in the same Hugging Face repo.
Run on an Apple GPU:

```bash
cargo run --release --features metal -- -p "write 100 words"
```

Or with CUDA:

```bash
cargo run --release --features cuda -- -p "write 100 words"
```
We also provide a ChatGPT-like web UI. To launch it, run:

```bash
cargo run --bin chat_web
```
After starting the server, visit the URL displayed in the terminal (by default, http://localhost:8000/, or use your host's public IP/port if running on a remote machine).
You can now interact with your local LLM just like ChatGPT: try asking creative questions, writing stories, or exploring model behavior.
Note: the web UI assets (`ui.html`, `logo.svg`, etc.) live inside the `reference/nanochat` submodule. Make sure you initialize submodules before running `chat_web`:

```bash
git submodule update --init --recursive
```
The web server loads a model from Hugging Face by default (`hf:Antigma/nanochat-d32`). If you already have the weights locally, point the server at them with `--source /path/to/model_dir` to avoid unnecessary downloads.
```bash
cargo run --release --bin chat_web -- \
  [--num-workers N] \
  [--source hf:<repo_id>|/path/to/model_dir] \
  [--temperature FLOAT] [--top-k INT] [--max-tokens INT] [--seed INT] \
  [--host HOST] [--port PORT] \
  [--ui-path /path/to/ui.html] [--logo-path /path/to/logo.svg]
```
- `--num-workers`: number of model replicas to load (e.g., one per GPU)
- `--source`: `hf:<repo_id>` or a local directory with model files
- `--temperature`, `--top-k`, `--max-tokens`, `--seed`: sampling defaults
- `--host`, `--port`: bind address (defaults to `0.0.0.0:8000`)
- `--ui-path`, `--logo-path`: override embedded/fallback UI assets
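For example, to serve two replicas of the d32 model with explicit sampling defaults (values are illustrative):

```bash
cargo run --release --bin chat_web -- \
  --num-workers 2 \
  --source hf:Antigma/nanochat-d32 \
  --temperature 0.8 --top-k 50 --max-tokens 256 \
  --host 0.0.0.0 --port 8000
```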
- Health: `GET /health`
- Stats: `GET /stats`
- Chat completions (SSE stream via POST): `POST /chat/completions` with `Content-Type: application/json` and a body like:

```json
{
  "messages": [
    { "role": "user", "content": "Hello!" }
  ],
  "temperature": 0.8,
  "top_k": 50,
  "max_tokens": 256
}
```
Example:

```bash
curl -N -H "Content-Type: application/json" \
  -X POST \
  --data '{"messages":[{"role":"user","content":"Write a haiku about Rust."}]}' \
  http://localhost:8000/chat/completions
```
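If you would rather consume the stream from Rust, here is a minimal client sketch (not part of this repo). It assumes the `reqwest` crate with its `json` and `stream` features, plus `tokio` and `futures-util`, and simply prints the raw SSE bytes as they arrive:

```rust
use futures_util::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Request body mirroring the JSON schema shown above.
    let body = serde_json::json!({
        "messages": [{ "role": "user", "content": "Write a haiku about Rust." }],
        "temperature": 0.8,
        "top_k": 50,
        "max_tokens": 256
    });
    let resp = reqwest::Client::new()
        .post("http://localhost:8000/chat/completions")
        .json(&body)
        .send()
        .await?;
    // Print each SSE chunk ("data: ..." lines) as it streams back.
    let mut stream = resp.bytes_stream();
    while let Some(chunk) = stream.next().await {
        print!("{}", String::from_utf8_lossy(&chunk?));
    }
    Ok(())
}
```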
Server-side constraints:
- Roles: `user`, `assistant`, `system`
- Max messages/request: 500
- Max chars/message: 8,000
- Max total conversation chars: 32,000
- `temperature` ∈ [0.0, 2.0], `top_k` ∈ [1, 200], `max_tokens` ∈ [1, 4096]
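A client can keep requests inside these bounds up front. A tiny, hypothetical helper (not part of this repo):

```rust
/// Hypothetical helper: clamp sampling parameters to the server's documented bounds.
fn clamp_params(temperature: f64, top_k: u32, max_tokens: u32) -> (f64, u32, u32) {
    (
        temperature.clamp(0.0, 2.0),
        top_k.clamp(1, 200),
        max_tokens.clamp(1, 4096),
    )
}
```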
Direct link: demo1.mp4
Build:

```bash
cargo build --release
```

Run tests (validates parity against reference fixtures):

```bash
cargo test -q
```

Run benchmarks (Criterion):

```bash
cargo bench --features metal
```

The benchmarks use the d20 version of the model. Reports are written under `target/criterion/**/report/index.html`.
The reference implementation lives under `reference/nanochat` (tracked from karpathy/nanochat). Parity is tested via auto-generated fixtures under `fixtures/`. The numerical tolerance for logits parity is 1e-4.
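Concretely, the parity check amounts to an assertion along these lines (illustrative, not the repo's actual test code):

```rust
/// Assert two logit slices agree within the documented 1e-4 tolerance.
fn assert_logits_parity(ours: &[f32], reference: &[f32]) {
    assert_eq!(ours.len(), reference.len(), "shape mismatch");
    let max_abs_diff = ours
        .iter()
        .zip(reference)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0f32, f32::max);
    assert!(max_abs_diff <= 1e-4, "logits diverge: max |diff| = {max_abs_diff}");
}
```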
If you cloned with submodules, you can update them with:

```bash
git submodule update --init --remote --recursive
```
Regenerate the fixtures with:

```bash
./gen-fixtures.sh
```

This uses the Python reference via uv and writes JSON fixtures into `fixtures/`.
Tokenizer tests expect a tiktoken pickle at `reference/tokenizer.pkl` (a small one is provided in the repo).
- Slow or failing downloads: pre-download from HF and pass `--source /path/to/model_dir`
- UI not shown: ensure `reference/nanochat/ui.html` exists or pass `--ui-path`
- CUDA errors: verify the driver/runtime and rebuild with `--features cuda`
- Apple GPU: prefer `--features metal` (default on macOS targets)
- SFT and RL (requires a backward pass)
- (maybe) bring back the embedded Python interpreter (or another context-free language such as Lua)
- An additional tensor backend besides Candle; we hit many Candle kernel issues while building this
- Pretraining (low priority; full training in Rust likely has limited utility)
Licensed under MIT or Apache-2.0, at your option.
- Andrej Karpathy for the original nanochat
- Hugging Face Candle team for the lightweight Rust tensor/NN stack