```
  .oooooo.                 .
 d8P'  `Y8b              .o8
888      888  .ooooo. .o888oo   .oooo.   ooo. .oo.   .ooooo.
888      888 d88' `"Y8   888   `P  )88b  `888P"Y88b d88' `88b
888      888 888         888    .oP"888   888   888  888ooo888
`88b    d88' 888   .o8   888 . d8(  888   888   888  888    .o
 `Y8bood8P'  `Y8bod8P'   "888" `Y888""8o o888o o888o `Y8bod8P'
```
## Features

- **1000x faster than Python SB3** - pure Rust eliminates GIL and dynamic-typing overhead
- **Cross-platform SIMD** - ARM NEON (Apple Silicon) + AVX2/AVX-512 (x86_64)
- **Metal & CUDA GPU support** - native acceleration for Apple M-series and NVIDIA GPUs
- **10 RL algorithms** - PPO, A2C, SAC, TD3, DDPG, DQN, PPG, REDQ, CQL, IQN
- **Advanced experience replay** - HER, N-step, PER with segment trees, memory-mapped buffers
- **Transformer architectures** - Decision Transformer, multi-head attention, RMSNorm
- **Distributed training** - multi-worker gradient aggregation over gRPC
- **Mixed precision** - FP16/BF16 with automatic gradient scaling
- **Gym compatibility** - Python Gymnasium integration via PyO3
- **Rich observability** - TensorBoard, Weights & Biases, hierarchical profiling
## Benchmarks

All benchmarks were performed on an Apple M4 Max, comparing Octane against Python Stable-Baselines3 (SB3).
### Training Speed

| Steps | Octane | SB3 (Python) | Speedup |
|---|---|---|---|
| 500K | ~0.6s | ~600s | 1000x |
| 5M | ~5.6s | ~6000s | 1071x |
### Throughput

| Metric | Octane | SB3 (Python) |
|---|---|---|
| FPS | 800,000 - 1,800,000 | ~833 |
### VecEnv Scaling

| Environments | Time per Step | Throughput |
|---|---|---|
| 1 | 907ns | 1.1M steps/s |
| 8 | 27Β΅s | 296K steps/s |
| 32 | 64Β΅s | 500K steps/s |
| 128 | 166Β΅s | 771K steps/s |
| 512 | 500Β΅s | 1.02M steps/s |
| 1024 | 792Β΅s | 1.29M steps/s |
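Throughput here is simply the environment count divided by the per-step latency; e.g., the 1024-environment row gives 1024 / 792 µs ≈ 1.29M steps/s.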
### Core Operations

| Operation | Size | Time |
|---|---|---|
| PPO Loss | 64 batch | 1.87Β΅s |
| PPO Loss | 1024 batch | 4.74Β΅s |
| MLP Forward | 32x64 | 22.9Β΅s |
| MLP Forward | 512x512 | 5.2ms |
| Advantage Norm | 16384 | 31.2Β΅s |
### Metal GPU Acceleration

Up to 30x speedup with the Metal GPU on Apple M-series chips.
| Operation | CPU | Metal GPU | Speedup |
|---|---|---|---|
| MatMul 128x128 | 157Β΅s | 5.2Β΅s | 30x |
| MatMul 512x512 | 1.1ms | 107Β΅s | 10x |
| MatMul 1024x1024 | 6.6ms | 912Β΅s | 7.2x |
| MatMul 2048x2048 | 47.6ms | 6.4ms | 7.4x |
| Policy Inference (batch 512) | 1.03ms | 130Β΅s | 7.9x |
| Policy Inference (batch 2048) | 3.2ms | 534Β΅s | 6.0x |
| Policy Inference (batch 4096) | 6.1ms | 1.1ms | 5.5x |
| MLP Forward (256x128) | 471Β΅s | 142Β΅s | 3.3x |
| Softmax 256x1024 | 882Β΅s | 386Β΅s | 2.3x |
| Softmax 512x2048 | 3.5ms | 1.7ms | 2.0x |
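To use the GPU, select a Metal device instead of the CPU device shown in the quick start below. A minimal sketch, assuming a `Device::metal()` constructor analogous to `Device::cpu()` (the exact name is an assumption; check the crate docs):

```rust
use octane_rs::prelude::*;

// Prefer the Metal GPU when the `metal` feature is enabled, falling back to CPU.
// `Device::metal()` is assumed by analogy with `Device::cpu()`.
fn pick_device() -> Device {
    #[cfg(feature = "metal")]
    {
        if let Ok(gpu) = Device::metal() {
            return gpu;
        }
    }
    Device::cpu()
}
```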
## Installation

Add Octane to your project:

```bash
cargo add octane-rs
```

Basic usage example:

```rust
use octane_rs::prelude::*;

fn main() -> octane_rs::Result<()> {
    // Select device (CPU, Metal, or CUDA)
    let device = Device::cpu();

    // Create and vectorize environment
    let env = TradingEnv::default();
    let vec_env = VecEnv::new(vec![env; 128])?;

    // Configure PPO algorithm
    let config = PPOConfig::default()
        .learning_rate(3e-4)
        .n_steps(2048)
        .batch_size(64)
        .gamma(0.99);

    // Create agent and train
    let mut agent = PPOAgent::new(config, &device)?;

    // Train with checkpointing
    let checkpoint_mgr = CheckpointManager::new("checkpoints/")?;
    for step in 0..1_000_000 {
        let metrics = agent.train_step(&vec_env)?;
        if step % 10_000 == 0 {
            checkpoint_mgr.save(&agent, step, metrics.mean_reward)?;
            println!("Step {} | Reward: {:.2}", step, metrics.mean_reward);
        }
    }
    Ok(())
}
```

Feature flags go in `Cargo.toml`:

```toml
[dependencies]
# Default (CPU only)
octane-rs = "0.1"
# Apple Silicon GPU (Metal)
octane-rs = { version = "0.1", features = ["metal"] }
# ARM NEON SIMD (Apple Silicon)
octane-rs = { version = "0.1", features = ["simd"] }
# x86_64 SIMD optimizations
octane-rs = { version = "0.1", features = ["avx2"] } # AVX2
octane-rs = { version = "0.1", features = ["avx512"] } # AVX-512
# NVIDIA GPU (CUDA)
octane-rs = { version = "0.1", features = ["cuda"] }
# Python Gym/Gymnasium compatibility
octane-rs = { version = "0.1", features = ["gym"] }
# Weights & Biases logging
octane-rs = { version = "0.1", features = ["wandb"] }
# Distributed training
octane-rs = { version = "0.1", features = ["distributed"] }
# Mixed precision (FP16/BF16)
octane-rs = { version = "0.1", features = ["half"] }
# Full (all features)
octane-rs = { version = "0.1", features = ["full"] }
```

Or build from source:

```bash
git clone https://github.com/lubluniky/octane-rs
cd octane-rs
# CPU only
cargo build --release
# Apple Silicon with Metal + SIMD
cargo build --release --features metal,simd
# x86_64 with AVX-512
cargo build --release --features avx512
# Full build
cargo build --release --features full
```

## Algorithms

| Algorithm | Type | Action Space | Best For |
|---|---|---|---|
| PPO | On-policy | Discrete/Continuous | General purpose, stable training |
| A2C | On-policy | Discrete/Continuous | Fast environments, simple tasks |
| PPG | On-policy | Discrete/Continuous | Sample efficiency + stability |
| SAC | Off-policy | Continuous | Maximum entropy, sample efficient |
| TD3 | Off-policy | Continuous | Continuous control, robotics |
| DDPG | Off-policy | Continuous | Deterministic policies |
| DQN | Off-policy | Discrete | Games, discrete action spaces |
| REDQ | Off-policy | Continuous | High UTD ratio (20), ensemble Q |
| CQL | Off-policy | Continuous | Offline RL, conservative Q |
| IQN | Off-policy | Discrete | Distributional RL, risk-sensitive |
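Off-policy agents follow the same config-builder pattern as the PPO quick start. A minimal SAC sketch, assuming `SACConfig`/`SACAgent` mirror the `PPOConfig`/`PPOAgent` API above (names inferred from the algorithm list, not verified signatures):

```rust
use octane_rs::prelude::*;

fn main() -> octane_rs::Result<()> {
    let device = Device::cpu();

    // Continuous-control task; SAC is off-policy and learns from replayed experience.
    let env = TradingEnv::default();
    let vec_env = VecEnv::new(vec![env; 16])?;

    // Assumed builder, mirroring PPOConfig from the quick start.
    let config = SACConfig::default()
        .learning_rate(3e-4)
        .gamma(0.99);

    let mut agent = SACAgent::new(config, &device)?;
    for _ in 0..100_000 {
        agent.train_step(&vec_env)?;
    }
    Ok(())
}
```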
### PPG (Phasic Policy Gradient)

```rust
let config = PPGConfig::default()
    .policy_epochs(32)
    .aux_epochs(6)
    .beta_clone(1.0);
```

### REDQ (Randomized Ensemble Double Q)

```rust
let config = REDQConfig::default()
    .n_critics(10)              // 10 Q-networks
    .utd_ratio(20)              // 20 gradient updates per env step
    .in_target_minimization(2);
```

### CQL (Conservative Q-Learning)

```rust
let config = CQLConfig::default()
    .cql_alpha(5.0)             // Conservative penalty weight
    .with_lagrange(true)        // Auto-tune alpha
    .target_action_gap(10.0);
```

### IQN (Implicit Quantile Networks)

```rust
let config = IQNConfig::default()
    .n_quantiles(64)
    .risk_measure(RiskMeasure::CVaR { alpha: 0.25 });
```

## Experience Replay

| Buffer | Use Case | Features |
|---|---|---|
| RolloutBuffer | On-policy (PPO, A2C) | GAE computation, SIMD optimized |
| ReplayBuffer | Off-policy (SAC, TD3) | Uniform sampling, configurable |
| PrioritizedReplayBuffer | DQN, Rainbow | Segment Tree O(log n), importance sampling |
| HERBuffer | Goal-conditioned | Final/Future/Episode/Random strategies |
| NStepBuffer | TD3, DQN | N-step returns, configurable n |
| MmapBuffer | Large-scale | Memory-mapped, 100M+ transitions |
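The prioritized buffer samples transitions in proportion to TD error via a segment tree. A minimal sketch; the constructor and method names (`new(capacity, alpha)`, `sample(batch_size, beta)`, `update_priorities`) are assumptions about the API, not confirmed signatures:

```rust
use octane_rs::buffer::PrioritizedReplayBuffer;

// alpha controls how strongly priorities skew sampling (0.0 = uniform).
let mut buffer = PrioritizedReplayBuffer::new(1_000_000, 0.6);

// beta anneals the importance-sampling correction toward 1.0 over training.
let (batch, weights, indices) = buffer.sample(256, 0.4)?;

// Feed the learner's fresh TD errors back as new priorities.
buffer.update_priorities(&indices, &td_errors)?;
```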
### HER (Hindsight Experience Replay)

```rust
use octane_rs::buffer::{HERBuffer, HERConfig, GoalSelectionStrategy};

let config = HERConfig::default()
    .strategy(GoalSelectionStrategy::Future { k: 4 })
    .reward_fn(|achieved, desired| {
        if (achieved - desired).norm() < 0.05 { 0.0 } else { -1.0 }
    });
let her_buffer = HERBuffer::new(100_000, config);
```

### Memory-Mapped Buffer

```rust
use octane_rs::buffer::MmapReplayBuffer;
// Store 100M transitions on disk, memory-efficient
let buffer = MmapReplayBuffer::new(
    "experience.mmap",
    100_000_000,
    obs_shape,
    action_shape,
)?;
```

## Network Architectures

| Network | Description | Use Case |
|---|---|---|
| MLP | Multi-layer perceptron | Standard RL |
| LSTM | Long short-term memory | Sequence modeling |
| GRU | Gated recurrent unit | Efficient RNNs |
| ActorCritic | Combined policy-value | PPO, A2C |
| TransformerEncoder | Self-attention layers | Decision Transformer |
| AttentionActorCritic | Attention-based AC | Complex observations |
| DecisionTransformer | Transformer for RL | Offline RL, sequence modeling |
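A sketch of building the standard actor-critic network from the table; the `ActorCritic::new` signature (observation dim, action dim, hidden sizes, device) is an assumption for illustration, not a confirmed API:

```rust
use octane_rs::networks::ActorCritic;

// Hypothetical constructor: obs_dim, action_dim, hidden layer sizes, device.
let net = ActorCritic::new(64, 4, &[256, 256], &device)?;
```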
### Normalization Layers

```rust
use octane_rs::networks::{LayerNorm, RMSNorm, BatchNorm};
// RMSNorm (faster, no mean computation)
let norm = RMSNorm::new(hidden_size, eps, &device)?;
// LayerNorm with learnable affine
let norm = LayerNorm::new(hidden_size, eps, true, &device)?;
```

### Weight Initialization

```rust
use octane_rs::networks::init::{orthogonal_init, xavier_uniform, kaiming_normal};
// Orthogonal initialization (recommended for RL)
let weight = orthogonal_init((in_features, out_features), gain, &device)?;
// Xavier for tanh activations
let weight = xavier_uniform((in_features, out_features), &device)?;
// Kaiming for ReLU
let weight = kaiming_normal((in_features, out_features), &device)?;
```

## Environments

### Gymnasium Integration

```rust
use octane_rs::envs::GymEnv;
// Connect to Python Gymnasium
let env = GymEnv::new("Humanoid-v4")?;
let obs = env.reset()?;
let (next_obs, reward, terminated, truncated, info) = env.step(&action)?;
```
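The reset/step API supports an ordinary episode loop; a sketch, where `agent.predict` stands in for whatever inference method your agent exposes (assumed name):

```rust
let mut obs = env.reset()?;
loop {
    let action = agent.predict(&obs)?; // assumed inference method
    let (next_obs, _reward, terminated, truncated, _info) = env.step(&action)?;
    if terminated || truncated {
        break;
    }
    obs = next_obs;
}
```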
### Wrappers

```rust
use octane_rs::envs::wrappers::*;

let env = TradingEnv::new(config);
let env = FrameStack::new(env, 4); // Stack last 4 frames
let env = NormalizeObservation::new(env); // Running mean/std normalization
let env = NormalizeReward::new(env, 0.99); // Reward normalization
let env = ClipAction::new(env, -1.0, 1.0); // Clip continuous actions
let env = TimeLimit::new(env, 1000);            // Episode time limit
```

### Multi-Agent (CTDE)

```rust
use octane_rs::envs::{MultiAgentEnv, CTDEWrapper};
// Centralized Training, Decentralized Execution
let env = MultiAgentTradingEnv::new(n_agents);
let ctde = CTDEWrapper::new(env);
// Global state for critic, local observations for actors
let (global_state, local_obs) = ctde.get_states()?;
```

## SIMD Acceleration

| Platform | Instruction Set | Operations |
|---|---|---|
| Apple Silicon | ARM NEON | GAE, Gaussian, Softmax, Gather |
| x86_64 | AVX2 | GAE, TD-error, Log-prob, Softmax |
| x86_64 | AVX-512 | All AVX2 + wider vectors |
```rust
use octane_rs::simd;
// Vectorized GAE computation (4-8x speedup)
simd::compute_gae_simd(&rewards, &values, &dones, gamma, lambda, &mut advantages);
// SIMD TD-error for off-policy
simd::compute_td_error_simd(&rewards, &next_q, &current_q, gamma, &dones, &mut td_errors);
// Vectorized Gaussian log-probability
simd::gaussian_log_prob_simd(&actions, &means, &log_stds, &mut log_probs);
// SIMD softmax
simd::softmax_simd(&logits, &mut probs);
```
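For reference, the recurrence that `compute_gae_simd` vectorizes is the standard GAE backward pass; a scalar sketch of the math (not the library's implementation):

```rust
/// Scalar GAE: delta_t = r_t + gamma * V_{t+1} * (1 - done_t) - V_t,
/// then A_t = delta_t + gamma * lambda * (1 - done_t) * A_{t+1}.
fn compute_gae_scalar(
    rewards: &[f32],
    values: &[f32],
    dones: &[f32],
    gamma: f32,
    lambda: f32,
    advantages: &mut [f32],
) {
    let mut gae = 0.0f32;
    for t in (0..rewards.len()).rev() {
        let next_value = if t + 1 < values.len() { values[t + 1] } else { 0.0 };
        let not_done = 1.0 - dones[t];
        let delta = rewards[t] + gamma * next_value * not_done - values[t];
        gae = delta + gamma * lambda * not_done * gae;
        advantages[t] = gae;
    }
}
```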
## Distributed Training

```rust
use octane_rs::distributed::{DistributedConfig, DistributedCoordinator, SyncMode, WorkerPool};

let config = DistributedConfig::default()
    .n_workers(8)
    .sync_mode(SyncMode::Synchronous)
    .gradient_compression(true);

let coordinator = DistributedCoordinator::new(config)?;
let worker_pool = WorkerPool::new(8)?;

// Distributed PPO training
coordinator.train_distributed(&mut agent, &worker_pool, total_steps)?;
```

### Gradient Aggregation

```rust
use octane_rs::distributed::{GradientAggregator, ReduceOp};
let aggregator = GradientAggregator::new(n_workers);
aggregator.all_reduce(&mut gradients, ReduceOp::Mean)?;
```

## Mixed Precision

```rust
use octane_rs::core::{Precision, GradScaler, AutocastContext};
// Configure mixed precision
let scaler = GradScaler::new(
    Precision::FP16,
    65536.0, // initial scale
    2.0,     // growth factor
    0.5,     // backoff factor
);

// Autocast context for automatic precision
let autocast = AutocastContext::new(Precision::BF16);
autocast.run(|| {
    let loss = model.forward(&batch)?;
    scaler.scale(&loss).backward()?;
    scaler.step(&mut optimizer)?;
    scaler.update();
    Ok(())
})?;
```
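Loss scaling exists because FP16 gradients underflow: the scaler multiplies the loss by a large factor (65536 here) before backprop, divides the gradients back before the optimizer step, shrinks the factor by the backoff factor whenever overflow is detected, and grows it again during stable stretches.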
## Checkpointing

```rust
use octane_rs::checkpoint::{BestMetric, CheckpointManager, TrainingResumer};

let checkpoint_mgr = CheckpointManager::new("checkpoints/")
    .keep_last(5)
    .save_best(true, BestMetric::MeanReward);

// Save checkpoint
checkpoint_mgr.save(&agent, step, metrics)?;

// Resume training
let resumer = TrainingResumer::new("checkpoints/")?;
let (agent, start_step) = resumer.resume_or_new(|| PPOAgent::new(config, &device))?;
```

## Hyperparameter Tuning

```rust
use octane_rs::tuning::{HyperparameterSpace, OptimizationDirection, RandomSearch, Study};
let space = HyperparameterSpace::new()
    .add_float("learning_rate", 1e-5, 1e-3, true) // log scale
    .add_float("gamma", 0.95, 0.999, false)
    .add_int("n_steps", 128, 4096)
    .add_categorical("activation", &["relu", "tanh", "gelu"]);

let study = Study::new("ppo_tuning", OptimizationDirection::Maximize);
let search = RandomSearch::new(space, 100); // 100 trials

search.optimize(&study, |trial| {
    let lr = trial.suggest_float("learning_rate")?;
    let config = PPOConfig::default().learning_rate(lr);
    let reward = train_and_evaluate(config)?;
    Ok(reward)
})?;
```

## Logging

### TensorBoard

```rust
use octane_rs::logging::TensorBoardWriter;
let writer = TensorBoardWriter::new("runs/experiment_1")?;
writer.add_scalar("reward/mean", mean_reward, step)?;
writer.add_histogram("policy/actions", &actions, step)?;
```

### Weights & Biases

```rust
use octane_rs::logging::{WandbLogger, WandbConfig};
let config = WandbConfig::new("project_name")
    .entity("team")
    .tags(&["ppo", "trading"]);

let logger = WandbLogger::new(config)?;
logger.log(step, &metrics)?;
```

## Profiling

```rust
use octane_rs::profiling::{Profiler, ProfileScope, global_profiler};
// Hierarchical profiling
{
    let _scope = ProfileScope::new("train_step");
    {
        let _forward = ProfileScope::new("forward_pass");
        // ... forward pass
    }
    {
        let _backward = ProfileScope::new("backward_pass");
        // ... backward pass
    }
}

// Print report
global_profiler().print_report();
```

## Project Structure

```
octane/
├── src/
│   ├── core/                   # Device, precision, error handling
│   │   ├── device.rs           # CPU/Metal/CUDA abstraction
│   │   ├── precision.rs        # FP16/BF16, GradScaler
│   │   └── error.rs            # OctaneError enum
│   ├── envs/                   # Environments
│   │   ├── traits.rs           # Environment trait
│   │   ├── vec_env.rs          # Parallel VecEnv
│   │   ├── gym.rs              # Python Gym wrapper
│   │   ├── multiagent.rs       # Multi-agent support
│   │   └── wrappers.rs         # FrameStack, Normalize, etc.
│   ├── networks/               # Neural architectures
│   │   ├── mlp.rs              # MLP, ActorCritic
│   │   ├── recurrent.rs        # LSTM, GRU
│   │   ├── transformer.rs      # TransformerEncoder, DecisionTransformer
│   │   ├── attention.rs        # Multi-head attention
│   │   ├── normalization.rs    # LayerNorm, RMSNorm, BatchNorm
│   │   └── init.rs             # Weight initialization
│   ├── distributions/          # Action distributions
│   │   └── mod.rs              # Categorical, Gaussian, Squashed
│   ├── buffer/                 # Experience storage
│   │   ├── rollout.rs          # On-policy buffer
│   │   ├── replay.rs           # Off-policy buffer
│   │   ├── her.rs              # Hindsight Experience Replay
│   │   ├── nstep.rs            # N-step returns
│   │   ├── mmap.rs             # Memory-mapped buffer
│   │   └── segment_tree.rs     # PER with SumTree/MinTree
│   ├── algorithms/             # RL algorithms
│   │   ├── ppo.rs, a2c.rs, ppg.rs    # On-policy
│   │   ├── sac.rs, td3.rs, ddpg.rs   # Off-policy continuous
│   │   ├── dqn.rs, iqn.rs            # Off-policy discrete
│   │   ├── cql.rs, redq.rs           # Advanced off-policy
│   │   └── traits.rs                 # Agent trait
│   ├── simd/                   # SIMD optimizations
│   │   ├── neon.rs             # ARM NEON (Apple Silicon)
│   │   ├── x86.rs              # AVX2/AVX-512
│   │   ├── td_error.rs         # SIMD TD computation
│   │   └── log_prob.rs         # SIMD log probability
│   ├── distributed/            # Distributed training
│   │   └── mod.rs              # WorkerPool, GradientAggregator
│   ├── checkpoint/             # Model persistence
│   │   └── mod.rs              # CheckpointManager, TrainingResumer
│   ├── logging/                # Observability
│   │   ├── metrics.rs          # MetricLogger trait
│   │   ├── tensorboard.rs      # TensorBoard writer
│   │   └── wandb.rs            # W&B integration
│   ├── profiling/              # Performance profiling
│   │   └── mod.rs              # Profiler, ProfileScope
│   ├── tuning/                 # Hyperparameter optimization
│   │   └── mod.rs              # Study, RandomSearch, GridSearch
│   ├── tui/                    # Terminal UI
│   │   └── mod.rs              # Training visualization
│   │
│   │   # ─── TRADING-SPECIFIC MODULES ───
│   │
│   ├── trading/                # Advanced trading environments
│   │   ├── env.rs              # Order book, slippage, commissions
│   │   ├── multi_asset.rs      # Portfolio of N assets
│   │   ├── multi_timeframe.rs  # M1/M5/H1/D1 support
│   │   └── regime.rs           # HMM regime detection, GARCH
│   ├── risk/                   # Risk management
│   │   ├── constraints.rs      # Safe RL, action masking
│   │   ├── rewards.rs          # Sharpe/Sortino/Calmar shaping
│   │   ├── position_sizing.rs  # Kelly criterion, ATR
│   │   └── drawdown.rs         # Max DD limits, recovery mode
│   ├── metrics/                # Trading analytics
│   │   ├── trading.rs          # VaR, CVaR, Sharpe, Win Rate
│   │   ├── journal.rs          # Trade logging, attribution
│   │   └── attribution.rs      # P&L breakdown
│   ├── backtesting/            # Validation
│   │   ├── walk_forward.rs     # Walk-forward optimization
│   │   ├── monte_carlo.rs      # Stress testing, bootstrap
│   │   └── cross_validation.rs # Purged K-Fold, embargo
│   ├── live/                   # Live trading
│   │   ├── paper.rs            # Paper trading engine
│   │   ├── exchanges/          # Binance, Bybit connectors
│   │   ├── execution.rs        # TWAP, VWAP, Iceberg
│   │   └── monitor.rs          # Real-time P&L, alerts
│   └── strategies/             # Advanced RL
│       ├── ensemble.rs         # Multi-agent voting
│       ├── hierarchical.rs     # Two-level RL
│       ├── meta.rs             # MAML adaptation
│       └── imitation.rs        # Behavioral cloning
└── benches/                    # Criterion benchmarks
```
## Trading Infrastructure

Octane includes comprehensive trading-specific infrastructure:

### Advanced Trading Environment

```rust
use octane_rs::trading::{AdvancedTradingEnv, AdvancedTradingConfig, SlippageModel};
let config = AdvancedTradingConfig::default()
    .slippage_model(SlippageModel::AlmgrenChriss {
        temporary_impact: 0.1,
        permanent_impact: 0.01,
    })
    .enable_partial_fills(true)
    .latency_ms(50);

let env = AdvancedTradingEnv::new(config, market_data)?;
```

### Risk Management

```rust
use octane_rs::risk::{
    DrawdownController, DrawdownConfig, PositionSizer, PositionSizingConfig, SizingMethod,
};
// Drawdown control with recovery mode
let controller = DrawdownController::new(
    DrawdownConfig::default()
        .max_drawdown(0.15)         // 15% max drawdown
        .recovery_threshold(0.10)   // Enter recovery at 10%
        .recovery_risk_factor(0.5), // Halve risk in recovery
);

// Kelly criterion position sizing
let sizer = PositionSizer::new(PositionSizingConfig::default()
    .method(SizingMethod::HalfKelly));
```
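For intuition, the Kelly fraction for a bet with win probability p and win/loss ratio b is f* = p - (1 - p)/b; half-Kelly stakes half of that, trading growth for lower drawdown. A standalone sketch of the math, not the library's sizer:

```rust
/// Kelly fraction for win probability `p` and payoff ratio `b` (avg win / avg loss).
fn kelly_fraction(p: f64, b: f64) -> f64 {
    (p - (1.0 - p) / b).max(0.0) // never size below zero
}

fn main() {
    // 55% win rate, winners 1.5x the size of losers:
    let full = kelly_fraction(0.55, 1.5); // 0.55 - 0.45/1.5 = 0.25
    let half = 0.5 * full;                // HalfKelly => risk 12.5% of capital
    println!("full Kelly: {full:.3}, half Kelly: {half:.3}");
}
```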
### Backtesting & Validation

```rust
use octane_rs::backtesting::{WalkForwardOptimizer, MonteCarloSimulator, CrossValidator};

// Walk-forward optimization
let wfo = WalkForwardOptimizer::new(WalkForwardConfig::default()
    .train_size(252)  // 1 year train
    .test_size(63)    // 3 months test
    .step_size(21));  // Monthly rolling

// Monte Carlo stress testing
let mc = MonteCarloSimulator::new(MonteCarloConfig::default()
    .n_simulations(10_000)
    .stress_scenarios(vec![
        StressScenario::FlashCrash,
        StressScenario::VolatilitySpike,
    ]));

// Purged cross-validation (prevents lookahead bias)
let cv = CrossValidator::new(CVConfig::default()
    .method(CVMethod::PurgedKFold { n_splits: 5, purge_gap: 5, embargo: 10 }));
```

### Live & Paper Trading

```rust
use octane_rs::live::{PaperTradingEngine, ExecutionEngine, ExecutionAlgorithm};
// Paper trading with realistic simulation
let paper = PaperTradingEngine::new(PaperTradingConfig::default()
    .slippage_model(SlippageModel::VolumeWeighted)
    .initial_balance(100_000.0));

// Smart execution
let executor = ExecutionEngine::new(ExecutionConfig::default()
    .algorithm(ExecutionAlgorithm::TWAP { duration_secs: 300 }));
```
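TWAP slices a parent order into equal child orders spread evenly over the window (300 s above); a standalone illustration of the idea, not the `ExecutionEngine` internals:

```rust
/// Split `total_qty` into `n_slices` equal child orders over `duration_secs`.
/// Returns (offset_secs, qty) pairs for the scheduler.
fn twap_schedule(total_qty: f64, duration_secs: u64, n_slices: u64) -> Vec<(u64, f64)> {
    let interval = duration_secs / n_slices;
    let child_qty = total_qty / n_slices as f64;
    (0..n_slices).map(|i| (i * interval, child_qty)).collect()
}
```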
### Advanced Strategies

```rust
use octane_rs::strategies::{EnsembleAgent, HierarchicalAgent, AdaptiveAgent};

// Ensemble of agents with voting
let ensemble = EnsembleAgent::new(EnsembleConfig::default()
    .voting_strategy(VotingStrategy::Boosting)
    .weight_adaptation(WeightAdaptation::UCB1));

// Hierarchical RL (timing + execution)
let hierarchical = HierarchicalAgent::new(HierarchicalConfig::default()
    .high_level_interval(100) // Decide every 100 steps
    .options(vec![TradingOption::Hold, TradingOption::AggressiveLong, /* ... */]));

// Meta-learning for regime adaptation
let adaptive = AdaptiveAgent::new(MetaLearningConfig::default()
    .strategy(AdaptationStrategy::RegimeAware)
    .adaptation_steps(10));
```

## Comparison

| Feature | Octane | Stable-Baselines3 | RLlib |
|---|---|---|---|
| Language | Rust | Python | Python |
| Throughput | 1.8M FPS | 833 FPS | ~2K FPS |
| SIMD | NEON + AVX2/512 | NumPy | NumPy |
| GPU | Metal + CUDA | CUDA | CUDA |
| Distributed | Native gRPC | Ray | Ray |
| Algorithms | 10 | 7 | 15+ |
| Memory-mapped | Yes | No | Yes |
| Mixed Precision | FP16/BF16 | No | FP16 |
## License

Octane is licensed under the GNU General Public License v2.0.
*Built with Rust for maximum performance · ~57,000 lines of high-performance RL code*