🦀 RustGPT: Advanced LLM Implementation in Pure Rust


A complete Large Language Model implementation in pure Rust with advanced architectures including Transformers, TRM (Transformer-Recurrent Mixture), Diffusion models, Mamba, and RG-LRU. Built from scratch using only ndarray for matrix operations.

🚀 What This Is

RustGPT is an educational and experimental platform demonstrating modern LLM architectures:

  • Multiple Architecture Support: Transformers, TRM, Diffusion models, Mamba, RG-LRU
  • Advanced Features: Speculative sampling, Mixture of Experts, Adaptive residuals
  • Comprehensive Training: Pre-training + instruction tuning pipelines
  • Robust Error Handling: Proper Result types, no panic!() calls
  • Production-grade Serialization: Versioned model persistence with integrity checks
  • Extensive Testing: 183+ unit tests with property-based testing

🏗️ Current Architecture

The project now supports multiple advanced architectures:

1. Transformer Architecture

Input → Tokenization → Embeddings → Transformer Blocks → Output Projection → Predictions

2. TRM (Transformer-Recurrent Mixture)

Hybrid architecture combining transformer attention with recurrent components for improved efficiency.

3. Diffusion Models

Denoising diffusion probabilistic models for text generation with progressive refinement.

4. Mamba

State-space models with selective scan mechanisms for linear-time sequence processing.

5. RG-LRU (Real-Gated Linear Recurrent Units)

Trainable temporal-mixing layers with diagonal, stable recurrence for efficient sequence processing.

6. MoH-RG-LRU (Multi-head RG-LRU with Mixture-of-Heads)

Combines multiple RG-LRU heads with learned gating for improved capacity and efficiency.

Key Components

  • Polynomial Attention: Multi-head attention with polynomial logit transformations
  • Richards GLU: Advanced gating mechanisms with Richards curve activation (see the sketch after this list)
  • Adaptive Residuals: Dynamic residual scaling for stable training
  • Mixture of Experts: Sparse expert routing for improved capacity
  • Speculative Sampling: Accelerated decoding with draft-verify mechanisms
  • Modular Transformer Components: AttentionContext, FeedforwardProcessor, NormalizationLayer, and ResidualConnection for flexible architecture composition
  • Temporal Mixing: Supports both attention and RG-LRU as temporal mixing mechanisms
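
The Richards curve is a generalized logistic function. Below is a minimal, self-contained sketch of how a GLU-style gate can be built on it; the richards and richards_glu helpers and the constants they use are illustrative, not the crate's actual API.

/// Generalized logistic (Richards) curve:
/// y(x) = a + (k - a) / (c + q * exp(-b * x))^(1 / nu)
fn richards(x: f32, a: f32, k: f32, b: f32, nu: f32, q: f32, c: f32) -> f32 {
    a + (k - a) / (c + q * (-b * x).exp()).powf(1.0 / nu)
}

/// GLU-style gating: element-wise product of a value path and a gate path,
/// where the gate is squashed by the Richards curve instead of a plain sigmoid.
fn richards_glu(value: &[f32], gate: &[f32]) -> Vec<f32> {
    value
        .iter()
        .zip(gate)
        // Illustrative constants; a = 0, k = b = q = c = 1 and nu = 1 would
        // recover the standard logistic sigmoid.
        .map(|(v, g)| v * richards(*g, 0.0, 1.0, 1.0, 1.5, 1.0, 1.0))
        .collect()
}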

🔍 Project Structure

src/
├── main.rs                  # 🎯 Training pipeline and CLI
├── llm.rs                   # 🧠 Core LLM implementation
├── lib.rs                   # 📚 Library exports and constants
├── attention/               # 👀 Advanced attention mechanisms
├── layers/                  # 🏗️ Layer implementations
│   ├── transformer/         # Transformer blocks
│   ├── recurrence/          # Recurrent components
│   ├── ssm/                 # State-space models (Mamba, RG-LRU)
│   ├── diffusion/           # Diffusion model components
│   └── components/          # Shared components
├── mixtures/                # 🧪 Mixture of Experts
├── decoding/                # 🎰 Decoding strategies
├── encoding/                # 📝 Tokenization and vocabulary
├── richards/                # 📈 Richards curve utilities
├── eprop/                   # 🔄 Training and optimization
└── ... (20+ modules)

tests/
├── attention_parallel.rs   # Attention mechanism tests
├── model_persistence_roundtrip.rs # Serialization tests
├── transformer_block_stability.rs # Stability tests
└── ... (183+ unit tests)

🧪 Training Pipeline

The model supports a sophisticated training process:

1. Pre-training Phase

  • Learns basic language patterns and world knowledge
  • Uses factual statements and general text data
  • Configurable epochs and learning rates

2. Instruction Tuning Phase

  • Fine-tunes for conversational AI capabilities
  • Uses question-answer pairs and dialogue data
  • Lower learning rate for refinement

3. Advanced Features

  • Speculative Sampling: --speculative flag enables draft-verify decoding
  • Diffusion Training: --diffusion flag enables diffusion-based training
  • Mixture of Experts: Configurable expert routing strategies
  • Adaptive Windowing: Dynamic attention window adaptation

🚀 Quick Start

# Clone and run
git clone https://github.com/tekaratzas/RustGPT.git
cd RustGPT
cargo run --release

# Basic training (default transformer)
cargo run --release

# With speculative sampling (transformer mode)
cargo run --release -- --speculative --speculative-mode transformer

# With speculative sampling (diffusion mode)
cargo run --release -- --speculative --speculative-mode diffusion

# With Mamba architecture
cargo run --release -- --architecture mamba

# With RG-LRU architecture
cargo run --release -- --architecture rg-lru

# With deterministic training (fixed seed)
cargo run --release -- --seed 42

# Continue training from saved model
cargo run --release -- --continue-from models/rustgpt.bin

🎮 Interactive Mode

After training, test the model interactively:

# Run with interactive flag
cargo run --release -- --interactive

# Example conversation
Enter prompt: How do mountains form?
Model: Mountains form through tectonic forces or volcanism over geological time

Enter prompt: What causes rain?
Model: Rain occurs when water vapor condenses into droplets that become too heavy to remain airborne

# Interactive mode with specific architecture
cargo run --release -- --architecture mamba --interactive

💾 Model Persistence

Versioned Serialization with Integrity Checks

use llm::LLM;

// Save with versioning, checksums, and metadata
let llm = LLM::default();
llm.save_versioned("model.rgpt", Some("Trained RustGPT model".to_string()))?;

// Load with automatic validation
let loaded_llm = LLM::load_versioned("model.rgpt")?;
// ✅ Validates SHA256 checksum
// ✅ Checks version compatibility  
// ✅ Includes comprehensive metadata

// Save different architectures
let mamba_llm = LLM::new_mamba(vocab.clone(), config);
mamba_llm.save_versioned("mamba_model.rgpt", Some("Mamba architecture".to_string()))?;

let rg_lru_llm = LLM::new_rg_lru(vocab.clone(), config);
rg_lru_llm.save_versioned("rg_lru_model.rgpt", Some("RG-LRU architecture".to_string()))?;

Format Options

  • Binary (.bin, .rgpt): Compact, fast I/O, production-ready
  • JSON (.json): Human-readable, debuggable
  • MessagePack: Efficient binary format with schema support

🧮 Technical Implementation

Current Configuration

  • Vocabulary Size: Dynamic (up to 50,000 tokens)
  • Embedding Dimension: 128 (configurable)
  • Hidden Dimension: 256 (configurable)
  • Max Sequence Length: 256 tokens
  • Architecture Options: Transformer, TRM, Diffusion, Mamba, RG-LRU, MoH-RG-LRU
  • Normalization: Richards-based Dynamic Tanh Normalization
  • Positional Encoding: CoPE (Context-aware Positional Encoding)
  • Activation: Richards GLU and SwiGLU
  • Temporal Mixing: Attention or RG-LRU (configurable per transformer block)
  • Speculative Sampling: Transformer and Diffusion modes with configurable gamma and tau

Training Details

  • Optimizer: Adam with gradient clipping
  • Learning Rates: Configurable per phase
  • Loss Function: Cross-entropy with label smoothing (a sketch follows this list)
  • Regularization: L2 regularization, gradient norm monitoring
  • Batch Processing: Gradient accumulation for large batches
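
For reference, below is a minimal sketch of cross-entropy with label smoothing over a single predicted distribution; the smoothed_cross_entropy helper is illustrative, not the crate's training code.

/// Cross-entropy with label smoothing.
/// `probs` is the predicted distribution over the vocabulary, `target` is the
/// index of the correct token, and `eps` is the smoothing factor.
fn smoothed_cross_entropy(probs: &[f32], target: usize, eps: f32) -> f32 {
    let v = probs.len() as f32;
    probs
        .iter()
        .enumerate()
        .map(|(i, &p)| {
            // Smoothed target: most mass on the true token, with eps spread
            // uniformly over the whole vocabulary.
            let q = if i == target { 1.0 - eps + eps / v } else { eps / v };
            -q * p.max(1e-12).ln()
        })
        .sum()
}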

Advanced Features

Speculative Sampling

  • Draft Model: Fast approximation model
  • Verification Model: Full model for validation
  • Gamma Parameter: Controls speculation aggressiveness
  • Tau Parameter: Controls acceptance threshold
  • Transformer Support: New speculative sampling implementation for transformer models
  • Diffusion Support: Existing speculative sampling for diffusion models
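
Below is a minimal sketch of the standard draft-verify acceptance rule that gamma-length drafts are checked against; the helper names are illustrative and the crate's implementation may differ in detail.

/// One accept/reject step for a single drafted token. `p_target` and `p_draft`
/// are the verification and draft model probabilities for that token, and `u`
/// is a uniform random sample in [0, 1). Standard rule: accept with
/// probability min(1, p_target / p_draft).
fn accept_drafted_token(p_target: f32, p_draft: f32, u: f32) -> bool {
    u < (p_target / p_draft.max(1e-12)).min(1.0)
}

/// Walk a batch of gamma drafted tokens and return how many are kept before
/// the first rejection; the rejected position is then resampled from the
/// verification model in the full algorithm.
fn accepted_prefix_len(p_target: &[f32], p_draft: &[f32], uniforms: &[f32]) -> usize {
    let gamma = p_target.len().min(p_draft.len()).min(uniforms.len());
    let mut accepted = 0;
    for i in 0..gamma {
        if !accept_drafted_token(p_target[i], p_draft[i], uniforms[i]) {
            break;
        }
        accepted += 1;
    }
    accepted
}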

Mamba Architecture

  • Selective SSM: State-space models with input-dependent parameters
  • Causal Convolution: Depthwise convolution for sequence processing
  • Selective Scan: Efficient sequence processing with selective state updates
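
Below is a scalar-state sketch of the selective-scan idea: the decay and input scale are recomputed from each token before a linear-time recurrent update. The gating used here is an illustrative placeholder, not the crate's parameterization.

/// Minimal selective scan over a 1-dimensional state.
fn selective_scan(xs: &[f32]) -> Vec<f32> {
    let mut h = 0.0_f32;
    xs.iter()
        .map(|&x| {
            // Input-dependent discretization: decay and input scale both depend
            // on the current token, unlike a fixed linear RNN.
            let delta = softplus(x);      // step size > 0
            let a = (-delta).exp();       // decay in (0, 1]
            let b = 1.0 - a;              // complementary input scale
            h = a * h + b * x;            // selective state update
            h                             // y_t = C * h_t with C = 1 here
        })
        .collect()
}

fn softplus(x: f32) -> f32 {
    (1.0 + x.exp()).ln()
}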

RG-LRU Architecture

  • Real-Gated Recurrence: Trainable temporal mixing with gated updates
  • Diagonal Recurrence: Stable recurrence with diagonal parameterization
  • Multi-head Support: MoH-RG-LRU combines multiple heads with learned gating
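
Below is a per-channel sketch of a real-gated linear recurrent update of this kind. In the full layer the recurrence and input gates come from separate learned projections, so the helpers here are illustrative only.

/// One channel of a gated, diagonal, stable recurrence.
fn rg_lru_channel(xs: &[f32], base_decay: f32) -> Vec<f32> {
    let mut h = 0.0_f32;
    xs.iter()
        .map(|&x| {
            let r = sigmoid(x);                         // recurrence gate in (0, 1)
            let i = sigmoid(x);                         // input gate in (0, 1)
            let a = base_decay.powf(8.0 * r);           // gated diagonal decay
            h = a * h + (1.0 - a * a).sqrt() * (i * x); // bounded state update
            h
        })
        .collect()
}

fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}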

Diffusion Models

  • Karras Schedule: Noise scheduling for diffusion
  • SNR Weighting: Signal-to-noise ratio based training
  • Latent Diffusion: Efficient latent space processing
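
Below is a sketch of the Karras noise schedule (Karras et al., 2022), which spaces the sigma levels used during denoising; the function name is illustrative.

/// sigma_i = (sigma_max^(1/rho) + i/(n-1) * (sigma_min^(1/rho) - sigma_max^(1/rho)))^rho
/// Larger rho concentrates steps near sigma_min, where denoising matters most.
fn karras_sigmas(n: usize, sigma_min: f32, sigma_max: f32, rho: f32) -> Vec<f32> {
    let min_inv = sigma_min.powf(1.0 / rho);
    let max_inv = sigma_max.powf(1.0 / rho);
    let steps = n.saturating_sub(1).max(1) as f32;
    (0..n)
        .map(|i| {
            let t = i as f32 / steps; // 0.0 at sigma_max, 1.0 at sigma_min
            (max_inv + t * (min_inv - max_inv)).powf(rho)
        })
        .collect()
}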

Mixture of Experts

  • Expert Routing: Top-k gating with load balancing
  • Adaptive Depth: Dynamic layer selection
  • Threshold Prediction: Learned routing thresholds
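
Below is a minimal sketch of top-k gating: keep the k highest-scoring experts for a token and renormalize their weights. It assumes non-negative scores (e.g. post-softmax) and omits the auxiliary load-balancing loss.

/// Returns (expert index, routing weight) pairs for the k selected experts.
fn top_k_routing(gate_scores: &[f32], k: usize) -> Vec<(usize, f32)> {
    // Rank experts by score, highest first.
    let mut ranked: Vec<(usize, f32)> = gate_scores.iter().copied().enumerate().collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    ranked.truncate(k);

    // Renormalize the retained scores into routing weights.
    let total: f32 = ranked.iter().map(|(_, s)| *s).sum();
    ranked
        .into_iter()
        .map(|(idx, s)| (idx, if total > 0.0 { s / total } else { 0.0 }))
        .collect()
}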

🔧 Development & Testing

Running Tests

# Run all tests (183+ unit tests)
cargo test --lib

# Run integration tests
cargo test --test transformer_block_stability
cargo test --test model_persistence_roundtrip

# Run attention tests
cargo test --test attention_parallel

# Run with clippy for code quality
cargo clippy --tests -- -D warnings

# Build optimized version
cargo build --release

# Run with verbose output
cargo test -- --nocapture

# Test specific architectures
cargo test --lib -- --test-threads=1  # For deterministic test ordering

Test Coverage

  • 183+ Unit Tests: Core functionality validation
  • Property-Based Tests: Mathematical invariants using proptest
  • Edge Case Testing: Boundary conditions and error handling
  • Stability Tests: Gradient boundedness and numerical stability
  • Integration Tests: End-to-end workflow validation
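
As a representative (hypothetical) example of a property-based test written with proptest, the snippet below checks that a softmax stays a valid probability distribution for arbitrary finite inputs.

use proptest::prelude::*;

/// Hypothetical function under test: a numerically stable softmax.
fn softmax(xs: &[f32]) -> Vec<f32> {
    let max = xs.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = xs.iter().map(|x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

proptest! {
    // Invariant: outputs are non-negative and sum to ~1 for any finite input.
    #[test]
    fn softmax_is_a_distribution(xs in proptest::collection::vec(-10.0f32..10.0, 1..64)) {
        let probs = softmax(&xs);
        prop_assert!(probs.iter().all(|&p| p >= 0.0));
        let total: f32 = probs.iter().sum();
        prop_assert!((total - 1.0).abs() < 1e-3);
    }
}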

Observability

Structured logging is provided via the tracing crate:

# Set log level
RUST_LOG=debug cargo run
RUST_LOG=info cargo run   # Default
RUST_LOG=warn cargo run   # Warnings only
RUST_LOG=error cargo run   # Errors only

Example training output:

INFO  llm::training: Starting pre-training phase
INFO  llm::training: Epoch 1/100 - loss: 2.3456, grad_norm: 0.1234
INFO  llm::training: Epoch 2/100 - loss: 2.1234, grad_norm: 0.0987
INFO  llm::training: Transitioning to instruction tuning phase
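
Output like the above is produced once a tracing subscriber is installed at startup. A minimal sketch, assuming the tracing-subscriber crate with its env-filter feature is available:

use tracing_subscriber::EnvFilter;

fn init_logging() {
    // Respect RUST_LOG (e.g. RUST_LOG=debug), defaulting to `info` when unset.
    tracing_subscriber::fmt()
        .with_env_filter(
            EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("info")),
        )
        .init();

    tracing::info!("logging initialized");
}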

📊 Dependencies

Minimal dependency footprint:

  • ndarray - N-dimensional arrays for matrix operations
  • rand + rand_distr - Random number generation
  • serde + serde_json - Serialization
  • tracing - Structured logging
  • rayon - Parallel processing
  • sha2 - Cryptographic hashing for integrity checks

No PyTorch, TensorFlow, or Candle - pure Rust implementation!

🤝 Contributing

RustGPT welcomes contributions for learning and experimentation!

Current Architecture Options

  • Transformer: Standard transformer blocks
  • TRM: Transformer-Recurrent Mixture
  • Diffusion: Denoising diffusion models
  • Mamba: State-space models with selective scan
  • RG-LRU: Real-Gated Linear Recurrent Units

Areas for Contribution

  • 🚀 Beginner: Documentation, examples, test cases
  • 🔥 Intermediate: New layer types, decoding strategies
  • ⚡ Advanced: Architecture improvements, training optimizations

Getting Started

# Fork the repository
# Create a feature branch
git checkout -b feature/new-architecture

# Make changes and add tests
# Run the test suite
cargo test

# Submit a pull request

Code Quality Standards

  • Follow Rust conventions (cargo fmt)
  • Comprehensive test coverage for new features
  • Proper error handling (no panic!() calls)
  • Documentation updates for new functionality

📈 Project Status

Current Capabilities

  • Multiple Architectures: Transformer, TRM, Diffusion, Mamba, RG-LRU, MoH-RG-LRU
  • Advanced Training: Speculative sampling (Transformer & Diffusion), MoE, adaptive residuals
  • Robust Serialization: Versioned persistence with integrity checks
  • Comprehensive Testing: 183+ unit tests, property-based testing
  • Production Error Handling: Proper Result types throughout
  • Configurable Pipeline: CLI-driven training with multiple options
  • Modular Components: AttentionContext, FeedforwardProcessor, NormalizationLayer, ResidualConnection
  • Temporal Mixing: Configurable attention or RG-LRU per transformer block

Recent Improvements

  • Latest: Added modular transformer components for flexible architecture composition
  • Latest: Implemented speculative sampling for transformer models
  • Latest: Added Mamba and RG-LRU state-space model implementations
  • Sprint 5.2: Systematic error handling (eliminated all panic!() calls)
  • Sprint 5.1: Code quality improvements (removed placeholder comments)
  • Sprint 4.3: Serialization integrity (SHA256 checksums, versioning)
  • Sprint 4.2: Training reliability (divergence detection, observability)

Roadmap

  • Next Sprint: Convert remaining unwrap() calls in hot paths to proper error handling
  • Future: Beam search, advanced positional encodings, mixed-precision training
  • Long-term: Multi-modal capabilities, larger scale training, architecture auto-selection

📚 Learning Resources

RustGPT demonstrates modern LLM concepts:

  • Architecture Design: Multiple neural network architectures
  • Training Techniques: Speculative sampling, diffusion models
  • Optimization: Mixture of Experts, adaptive residuals
  • Error Handling: Production-grade Rust error management
  • Testing: Comprehensive test strategies for ML systems

Perfect for understanding how state-of-the-art LLMs work under the hood!


No external ML frameworks - just pure Rust, linear algebra, and careful engineering!
