Axion uses a `config.toml` file for configuration, with support for environment variable overrides.
- Copy the example configuration:

  ```bash
  cp config.example.toml config.toml
  ```

- Edit `config.toml` to customize:

  ```bash
  nano config.toml  # or your preferred editor
  ```

- Run Axion:

  ```bash
  cargo run --release
  ```
```toml
[server]
host = "0.0.0.0" # Listen on all interfaces
port = 3000      # Server port

[model]
# The model to serve
name = "meta-llama/Llama-3.2-3B-Instruct"

# Timeout for MAX to start (in seconds)
max_startup_timeout = 120
```

Supported Models:
- Any HuggingFace model compatible with MAX
- Llama, Mistral, Qwen, Gemma, GLM4, Granite, Olmo families
- Custom/fine-tuned models
```toml
[cache]
enabled = true     # Enable request caching
max_entries = 1000 # Maximum cached responses
```

How caching works:
- Non-streaming requests are cached based on model + messages + temperature
- Instant responses for repeated queries
- LRU eviction when cache is full
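The caching behavior above can be sketched as a small LRU map keyed on model + messages + temperature. This is an illustrative Python sketch of the policy, not Axion's actual (Rust) implementation; `ResponseCache` and its methods are hypothetical names:

```python
from collections import OrderedDict

# Illustrative sketch of the caching policy described above:
# key = (model, messages, temperature), LRU eviction at max_entries.
class ResponseCache:
    def __init__(self, max_entries=1000):
        self.max_entries = max_entries
        self.entries = OrderedDict()

    def _key(self, model, messages, temperature):
        # Messages must be hashable, so freeze each {role, content} dict.
        frozen = tuple((m["role"], m["content"]) for m in messages)
        return (model, frozen, temperature)

    def get(self, model, messages, temperature):
        key = self._key(model, messages, temperature)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as recently used
            return self.entries[key]
        return None

    def put(self, model, messages, temperature, response):
        key = self._key(model, messages, temperature)
        self.entries[key] = response
        self.entries.move_to_end(key)
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict least recently used
```

Because the sampling temperature is part of the key, the same prompt at a different temperature is a cache miss, which matches the "model + messages + temperature" rule above.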
```toml
[batching]
enabled = true     # Enable continuous batching
max_batch_size = 8 # Process up to 8 requests together
max_wait_ms = 50   # Wait up to 50ms before processing
```

Benefits:
- Improved throughput for concurrent requests
- Better GPU utilization
- Lower per-request latency
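The interplay between `max_batch_size` and `max_wait_ms` can be sketched as follows. `collect_batch` is a hypothetical illustration of the policy (fill a batch, but never stall longer than the wait budget), not Axion's actual scheduler:

```python
import queue
import time

# Illustrative sketch: gather up to max_batch_size requests, but never
# wait more than max_wait_ms after the first request arrives.
def collect_batch(requests: queue.Queue, max_batch_size=8, max_wait_ms=50):
    batch = [requests.get()]  # block until at least one request arrives
    deadline = time.monotonic() + max_wait_ms / 1000.0
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # wait budget exhausted; process what we have
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived in time
    return batch
```

Under load the batch fills instantly (throughput wins); under light load a lone request waits at most `max_wait_ms` (latency stays bounded), which is why lowering `max_wait_ms` favors latency.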
```toml
[streaming]
default = true # Stream responses by default
```

Behavior:
- `true`: Requests without an explicit `"stream"` field will stream
- `false`: Requests need `"stream": true` to enable streaming
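A minimal sketch of this resolution rule, assuming JSON request bodies parsed into a dict (the `should_stream` helper is hypothetical, for illustration only): an explicit `"stream"` field in the request always wins over the configured default.

```python
# Illustrative sketch: an explicit "stream" field overrides the
# [streaming] default from config.toml.
def should_stream(request_body: dict, config_default: bool) -> bool:
    return bool(request_body.get("stream", config_default))
```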
```toml
[logging]
level = "info" # Global log level
modules = "axion=info,tower_http=info" # Per-module levels
```

Log Levels:
- `trace`: Very verbose debugging
- `debug`: Detailed debugging information
- `info`: General information (recommended)
- `warn`: Warning messages only
- `error`: Error messages only
Environment variables take precedence over `config.toml`:
| Environment Variable | Overrides | Example |
|---|---|---|
| `MODEL_NAME` | `model.name` | `MODEL_NAME="mistralai/Mistral-7B-Instruct-v0.2"` |
| `PORT` | `server.port` | `PORT=8080` |
| `RUST_LOG` | `logging.modules` | `RUST_LOG=debug` |
Example:

```bash
# Override model from command line
MODEL_NAME="unsloth/gemma-3-270m-it" cargo run --release

# Override port
PORT=8080 cargo run --release

# Override logging
RUST_LOG=debug cargo run --release
```

Configuration is resolved in this order:
- Environment variables (highest priority)
- config.toml file
- Built-in defaults (lowest priority)
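The three-layer precedence above can be sketched as a simple lookup chain (the `resolve` helper is hypothetical, for illustration only): try the environment, fall back to the value parsed from `config.toml`, then to the built-in default.

```python
import os

# Illustrative sketch of the precedence chain described above:
# environment variable > config.toml value > built-in default.
def resolve(env_var: str, file_value, default):
    env_value = os.environ.get(env_var)
    if env_value is not None:
        return env_value
    if file_value is not None:
        return file_value
    return default
```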
For example, a configuration with debug logging enabled:

```toml
[server]
port = 3000

[model]
name = "meta-llama/Llama-3.2-3B-Instruct"

[logging]
level = "debug"
modules = "axion=debug,tower_http=debug"
```

A configuration serving on port 80 with a larger cache and batch size:

```toml
[server]
host = "0.0.0.0"
port = 80

[model]
name = "mistralai/Mistral-7B-Instruct-v0.2"
max_startup_timeout = 180

[cache]
enabled = true
max_entries = 5000

[batching]
enabled = true
max_batch_size = 16
max_wait_ms = 30

[logging]
level = "info"
modules = "axion=info,tower_http=warn"
```

A minimal configuration with caching, batching, and streaming-by-default disabled:

```toml
[model]
name = "unsloth/gemma-3-270m-it"

[cache]
enabled = false

[batching]
enabled = false

[streaming]
default = false
```

If `config.toml` is missing:
- Axion will use built-in defaults
- You'll see: "Configuration loaded from: defaults"
If the file fails to parse:
- Check for missing quotes, brackets, or commas
- Use a TOML validator online

If the model fails to start:
- Increase `max_startup_timeout` for large models
- Check that the model name is correct
- Verify you have access to the model on HuggingFace
- Start with defaults: Copy `config.example.toml` to `config.toml`
- Use environment variables for testing: Override settings without editing the file
- Enable debug logging: Set `RUST_LOG=debug` to troubleshoot issues
- Adjust timeouts for large models: Increase `max_startup_timeout` if needed
- Tune batching for your workload: Higher batch sizes for throughput, lower for latency
See also:
- `GETTING_STARTED.md` - General usage guide
- `README.md` - Project overview
- `config.example.toml` - Example configuration with comments