
Axion Configuration Guide

Axion is configured through a config.toml file, with support for environment variable overrides.

Quick Start

  1. Copy the example configuration:

    cp config.example.toml config.toml
  2. Edit config.toml to customize:

    nano config.toml  # or your preferred editor
  3. Run Axion:

    cargo run --release

Configuration Options

Server Configuration

[server]
host = "0.0.0.0"  # Listen on all interfaces
port = 3000       # Server port

Model Configuration

[model]
# The model to serve
name = "meta-llama/Llama-3.2-3B-Instruct"

# Timeout for MAX to start (in seconds)
max_startup_timeout = 120

Supported Models:

  • Any HuggingFace model compatible with MAX
  • Llama, Mistral, Qwen, Gemma, GLM4, Granite, Olmo families
  • Custom/fine-tuned models

Cache Configuration

[cache]
enabled = true      # Enable request caching
max_entries = 1000  # Maximum cached responses

How caching works:

  • Non-streaming requests are cached based on model + messages + temperature
  • Instant responses for repeated queries
  • LRU eviction when cache is full
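The cache key described above can be sketched as a hash over the request fields Axion caches on. A minimal sketch, assuming a hypothetical `cache_key` helper (not Axion's actual implementation):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical sketch: derive a cache key from the fields the guide
// says are used for caching (model + messages + temperature).
// The temperature is hashed via its bit pattern, since f64 is not Hash.
fn cache_key(model: &str, messages: &[(&str, &str)], temperature: f64) -> u64 {
    let mut h = DefaultHasher::new();
    model.hash(&mut h);
    for (role, content) in messages {
        role.hash(&mut h);
        content.hash(&mut h);
    }
    temperature.to_bits().hash(&mut h);
    h.finish()
}

fn main() {
    let msgs = [("user", "Hello!")];
    let k1 = cache_key("meta-llama/Llama-3.2-3B-Instruct", &msgs, 0.7);
    let k2 = cache_key("meta-llama/Llama-3.2-3B-Instruct", &msgs, 0.7);
    let k3 = cache_key("meta-llama/Llama-3.2-3B-Instruct", &msgs, 0.8);
    // Identical requests map to the same key; changing temperature does not.
    println!("{} {}", k1 == k2, k1 == k3);
}
```

Because temperature is part of the key, the same prompt sent at a different temperature is a cache miss, which is why only deterministic-enough repeats benefit.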

Batching Configuration

[batching]
enabled = true        # Enable continuous batching
max_batch_size = 8    # Process up to 8 requests together
max_wait_ms = 50      # Wait up to 50ms before processing

Benefits:

  • Improved throughput for concurrent requests
  • Better GPU utilization
  • Lower per-request latency
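The two batching knobs combine into a single flush condition: a batch is processed as soon as it is full, or as soon as the oldest queued request has waited max_wait_ms. A minimal sketch of that decision rule (a hypothetical helper, not Axion's actual code):

```rust
// Hypothetical sketch of the flush rule implied by the batching config:
// process the batch when it reaches max_batch_size, or when the oldest
// request has been waiting max_wait_ms, whichever comes first.
fn should_flush(queued: usize, oldest_wait_ms: u64, max_batch_size: usize, max_wait_ms: u64) -> bool {
    queued > 0 && (queued >= max_batch_size || oldest_wait_ms >= max_wait_ms)
}

fn main() {
    // With the defaults shown above (max_batch_size = 8, max_wait_ms = 50):
    assert!(should_flush(8, 10, 8, 50));   // batch full -> flush immediately
    assert!(should_flush(3, 50, 8, 50));   // deadline hit -> flush a partial batch
    assert!(!should_flush(3, 10, 8, 50));  // keep waiting for more requests
    assert!(!should_flush(0, 999, 8, 50)); // nothing queued, nothing to do
    println!("ok");
}
```

This is why raising max_batch_size favors throughput (bigger batches, better GPU utilization) while lowering max_wait_ms favors latency (requests wait less before being processed).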

Streaming Configuration

[streaming]
default = true  # Stream responses by default

Behavior:

  • true: requests without an explicit "stream" field will stream
  • false: requests must set "stream": true to enable streaming
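The rule above amounts to falling back to the configured default whenever the request omits the field. A one-line sketch (hypothetical helper, not Axion's actual code):

```rust
// Hypothetical sketch: resolve whether a request streams, given the
// optional "stream" field from the request body and the configured default.
fn effective_stream(request_stream: Option<bool>, config_default: bool) -> bool {
    request_stream.unwrap_or(config_default)
}

fn main() {
    assert!(effective_stream(None, true));         // default = true, field omitted
    assert!(!effective_stream(None, false));       // default = false, field omitted
    assert!(effective_stream(Some(true), false));  // explicit "stream": true wins
    assert!(!effective_stream(Some(false), true)); // explicit "stream": false wins
    println!("ok");
}
```

In other words, an explicit "stream" value in the request always wins; the config only decides the behavior when the field is absent.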

Logging Configuration

[logging]
level = "info"  # Global log level
modules = "axion=info,tower_http=info"  # Per-module levels

Log Levels:

  • trace: Very verbose debugging
  • debug: Detailed debugging information
  • info: General information (recommended)
  • warn: Warning messages only
  • error: Error messages only

Environment Variable Overrides

Environment variables take precedence over config.toml:

Environment Variable  Overrides        Example
MODEL_NAME            model.name       MODEL_NAME="mistralai/Mistral-7B-Instruct-v0.2"
PORT                  server.port      PORT=8080
RUST_LOG              logging.modules  RUST_LOG=debug

Example:

# Override model from command line
MODEL_NAME="unsloth/gemma-3-270m-it" cargo run --release

# Override port
PORT=8080 cargo run --release

# Override logging
RUST_LOG=debug cargo run --release

Configuration Priority

  1. Environment variables (highest priority)
  2. config.toml file
  3. Built-in defaults (lowest priority)
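That priority order can be expressed as an Option chain: take the environment variable if set and valid, otherwise the config.toml value, otherwise the built-in default. A sketch of the three-layer lookup (hypothetical, not Axion's actual loader):

```rust
// Hypothetical sketch of the three-layer lookup for a numeric setting
// such as server.port: environment variable, then config.toml, then default.
fn resolve(env_value: Option<&str>, file_value: Option<u16>, default: u16) -> u16 {
    env_value
        .and_then(|v| v.parse().ok()) // unset or unparsable env values are skipped
        .or(file_value)
        .unwrap_or(default)
}

fn main() {
    assert_eq!(resolve(Some("8080"), Some(3000), 3000), 8080); // env var wins
    assert_eq!(resolve(None, Some(3000), 8000), 3000);         // config.toml next
    assert_eq!(resolve(None, None, 8000), 8000);               // built-in default last
    println!("ok");
}
```

A practical consequence: deleting config.toml never breaks startup, since every setting still resolves to its built-in default.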

Examples

Development Setup

[server]
port = 3000

[model]
name = "meta-llama/Llama-3.2-3B-Instruct"

[logging]
level = "debug"
modules = "axion=debug,tower_http=debug"

Production Setup

[server]
host = "0.0.0.0"
port = 80

[model]
name = "mistralai/Mistral-7B-Instruct-v0.2"
max_startup_timeout = 180

[cache]
enabled = true
max_entries = 5000

[batching]
enabled = true
max_batch_size = 16
max_wait_ms = 30

[logging]
level = "info"
modules = "axion=info,tower_http=warn"

Lightweight Setup

[model]
name = "unsloth/gemma-3-270m-it"

[cache]
enabled = false

[batching]
enabled = false

[streaming]
default = false

Troubleshooting

Config file not found

  • Axion will use built-in defaults
  • You'll see: "Configuration loaded from: defaults"

Invalid config syntax

  • Check for missing quotes, brackets, or commas
  • Use a TOML validator online

Model won't load

  • Increase max_startup_timeout for large models
  • Check that the model name is correct
  • Verify you have access to the model on HuggingFace

Tips

  1. Start with defaults: Copy config.example.toml to config.toml
  2. Use environment variables for testing: Override settings without editing the file
  3. Enable debug logging: Set RUST_LOG=debug to troubleshoot issues
  4. Adjust timeouts for large models: Increase max_startup_timeout if needed
  5. Tune batching for your workload: Higher batch sizes for throughput, lower for latency

See Also

  • GETTING_STARTED.md - General usage guide
  • README.md - Project overview
  • config.example.toml - Example configuration with comments