FinRL_Crypto implements a comprehensive suite of Deep Reinforcement Learning (DRL) agents specifically optimized for cryptocurrency trading. The agent architecture follows the ElegantRL framework, providing both continuous and discrete action spaces for various trading strategies.
AgentBase - The foundational class that provides:
- Environment exploration (single and vectorized)
- Trajectory conversion and replay buffer management
- Network optimization utilities
- Model persistence (save/load)
- Prioritized Experience Replay (PER) support
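To make these responsibilities concrete, here is a minimal, hypothetical skeleton of an agent in this style; the method names and the Gymnasium-style `env.reset()`/`env.step()` signatures are assumptions for illustration, not FinRL_Crypto's exact API:

```python
import torch

class MinimalAgentSketch:
    # Hypothetical skeleton mirroring the responsibilities listed above;
    # method names and signatures are illustrative, not the repo's API.
    def __init__(self, actor: torch.nn.Module, cwd: str = "./checkpoints"):
        self.actor = actor
        self.cwd = cwd

    def explore_env(self, env, horizon: int):
        """Roll out `horizon` steps and return transitions for a buffer.
        Assumes the Gymnasium API (reset/step return tuples)."""
        state, _ = env.reset()
        trajectory = []
        for _ in range(horizon):
            with torch.no_grad():
                action = self.actor(torch.as_tensor(state, dtype=torch.float32))
            next_state, reward, terminated, truncated, _ = env.step(action.numpy())
            trajectory.append((state, action, reward, terminated))
            state = env.reset()[0] if (terminated or truncated) else next_state
        return trajectory

    def save_or_load(self, if_save: bool):
        """Persist or restore the actor weights."""
        path = f"{self.cwd}/actor.pth"
        if if_save:
            torch.save(self.actor.state_dict(), path)
        else:
            self.actor.load_state_dict(torch.load(path))
```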
AgentPPO - Proximal Policy Optimization, an on-policy policy gradient algorithm
- AgentDiscretePPO: For discrete action spaces (buy/sell/hold)
- AgentSharePPO: Shared parameter version for pixel-level states
- Features: GAE (Generalized Advantage Estimation, sketched below) and ratio clipping of the surrogate objective
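GAE trades bias for variance by exponentially weighting multi-step TD errors with a factor λ. A minimal sketch of the computation (function name and tensor layout are assumptions):

```python
import torch

def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95):
    # Generalized Advantage Estimation over one trajectory.
    # rewards, dones: float tensors of length T; values: length T + 1
    # (a bootstrap value for the final state is appended).
    T = rewards.shape[0]
    advantages = torch.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    returns = advantages + values[:-1]
    return advantages, returns
```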
AgentA2C - Synchronous advantage actor-critic
- AgentDiscreteA2C: Discrete action variant
- AgentShareA2C: Shared parameter implementation
- Built on PPO foundation with simplified updates
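The "simplified updates" amount to dropping PPO's ratio clipping and optimizing the advantage-weighted log-probability directly. A hedged sketch of the two loss terms:

```python
import torch
import torch.nn.functional as F

def a2c_losses(log_prob, advantage, value, target_return):
    # A2C's simplified update: no PPO-style ratio clipping, just an
    # advantage-weighted log-probability term plus value regression.
    policy_loss = -(log_prob * advantage.detach()).mean()
    value_loss = F.mse_loss(value, target_return)
    return policy_loss, value_loss
```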
Q-network variants available in net.py:
- QNet: Standard Deep Q-Network
- QNetDuel: Dueling DQN (separate state-value and advantage streams)
- QNetTwin: Double DQN (reduces overestimation)
- QNetTwinDuel: Dueling Double DQN (D3QN)
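As an illustration of the dueling idea behind QNetDuel and QNetTwinDuel, here is a minimal PyTorch module (class and layer names are hypothetical) that recombines a state-value head and an advantage head into Q-values:

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    # Illustrative dueling architecture: a shared trunk splits into a
    # state-value head and an advantage head, recombined as
    # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    def __init__(self, state_dim, action_dim, net_dim=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, net_dim), nn.ReLU())
        self.value_head = nn.Linear(net_dim, 1)
        self.adv_head = nn.Linear(net_dim, action_dim)

    def forward(self, state):
        h = self.trunk(state)
        value = self.value_head(h)
        adv = self.adv_head(h)
        return value + adv - adv.mean(dim=1, keepdim=True)
```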
AgentDDPG - Continuous control with deterministic policies
- Suitable for continuous trading actions (position sizing)
AgentTD3 - Improved DDPG with:
- Twin critics for stability
- Delayed policy updates
- Target policy smoothing
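These three tricks combine in the critic target. A sketch of the target computation (function and argument names are assumptions, not the repo's API):

```python
import torch

@torch.no_grad()
def td3_target(actor_target, critic1_target, critic2_target,
               next_state, reward, not_done,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5):
    # Target policy smoothing: perturb the target action with clipped
    # noise, then take the minimum of the twin target critics to curb
    # value overestimation.
    next_action = actor_target(next_state)
    noise = (torch.randn_like(next_action) * policy_noise).clamp(-noise_clip, noise_clip)
    next_action = (next_action + noise).clamp(-1.0, 1.0)
    q_min = torch.min(critic1_target(next_state, next_action),
                      critic2_target(next_state, next_action))
    return reward + gamma * not_done * q_min
```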
AgentSAC - Soft Actor-Critic, a maximum-entropy off-policy algorithm
- AgentModSAC: Modified version with TTUR (Two Time-scale Update Rule)
- AgentShareSAC: Shared parameter implementation
- Features: Entropy regularization, automatic temperature tuning
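Automatic temperature tuning learns the entropy coefficient α so that the policy's entropy tracks a target, commonly -action_dim. A minimal sketch (variable names are illustrative):

```python
import torch

# SAC-style automatic temperature tuning: alpha is learned so the
# policy's entropy tracks a target entropy of -action_dim.
action_dim = 4
target_entropy = -float(action_dim)
log_alpha = torch.zeros(1, requires_grad=True)
alpha_optim = torch.optim.Adam([log_alpha], lr=3e-4)

def update_alpha(log_prob):
    # log_prob: log pi(a|s) for a batch of freshly sampled actions.
    alpha_loss = -(log_alpha * (log_prob + target_entropy).detach()).mean()
    alpha_optim.zero_grad()
    alpha_loss.backward()
    alpha_optim.step()
    return log_alpha.exp().item()
```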
Actor networks in net.py:
- Actor: Standard deterministic policy network
- ActorSAC: Stochastic policy with reparameterization
- ActorPPO: Policy network with log-probability computation
- ActorDiscretePPO: Discrete action policy for crypto trading
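As a hedged illustration of the reparameterized, tanh-squashed sampling used by SAC-style actors such as ActorSAC (class name and layer sizes here are hypothetical):

```python
import torch
import torch.nn as nn

class StochasticActor(nn.Module):
    # Sketch of a stochastic actor: outputs a Gaussian, samples with the
    # reparameterization trick, squashes with tanh, and corrects the
    # log-probability for the squashing. Illustrative only.
    def __init__(self, state_dim, action_dim, net_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, net_dim), nn.ReLU())
        self.mu = nn.Linear(net_dim, action_dim)
        self.log_std = nn.Linear(net_dim, action_dim)

    def forward(self, state):
        h = self.net(state)
        mu, log_std = self.mu(h), self.log_std(h).clamp(-20, 2)
        dist = torch.distributions.Normal(mu, log_std.exp())
        raw = dist.rsample()                      # reparameterized sample
        action = torch.tanh(raw)                  # squash to [-1, 1]
        log_prob = dist.log_prob(raw) - torch.log(1 - action.pow(2) + 1e-6)
        return action, log_prob.sum(dim=-1, keepdim=True)
```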
Critic networks:
- Critic: Standard Q-value network
- CriticPPO: Value function for PPO
- CriticTwin: Double Q-networks for stability
- CriticREDQ: Randomized Ensemble Double Q-learning
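A minimal sketch of the twin-critic idea behind CriticTwin (class name is hypothetical); training then takes the minimum of the two outputs to curb overestimation:

```python
import torch
import torch.nn as nn

class TwinCritic(nn.Module):
    # Two independent Q-networks over the same (state, action) input.
    def __init__(self, state_dim, action_dim, net_dim=128):
        super().__init__()
        def q_net():
            return nn.Sequential(
                nn.Linear(state_dim + action_dim, net_dim), nn.ReLU(),
                nn.Linear(net_dim, 1))
        self.q1, self.q2 = q_net(), q_net()

    def forward(self, state, action):
        sa = torch.cat([state, action], dim=1)
        return self.q1(sa), self.q2(sa)
```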
Shared actor-critic networks:
- SharePPO: Combined actor-critic for image-based states
- ShareSPG: Stochastic policy gradient with shared parameters
Supported action spaces (mapping sketch below):
- Continuous: Precise position sizing (0-100% portfolio allocation)
- Discrete: Buy/Sell/Hold decisions
- Multi-action: Simultaneous trading across multiple cryptocurrencies
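How raw agent outputs become trades depends on the environment; the mappings below are hypothetical conventions (action encodings, trade fraction) chosen purely for illustration:

```python
import numpy as np

# Hypothetical mappings from agent outputs to trades.
def discrete_to_trade(action_idx, holdings, cash, price, fraction=0.1):
    if action_idx == 2:                      # buy with a fraction of cash
        return (cash * fraction) / price
    if action_idx == 0:                      # sell a fraction of holdings
        return -holdings * fraction
    return 0.0                               # hold

def continuous_to_allocation(action):
    # Map a tanh-squashed action in [-1, 1] to a 0-100% target allocation.
    return float(np.clip((action + 1.0) / 2.0, 0.0, 1.0))
```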
Key training features:
- Prioritized Experience Replay (PER; see the sketch after this list)
- Generalized Advantage Estimation (GAE)
- Vectorized environment support
- GPU acceleration
- Gradient clipping and normalization
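A minimal proportional-prioritization sketch of PER, without the sum-tree used by production buffers; class and parameter names are assumptions, not the repo's replay buffer:

```python
import numpy as np

class SimplePER:
    # Proportional PER sketch: sample with probability p_i^alpha / sum p^alpha
    # and correct with importance weights (N * p_i)^(-beta).
    def __init__(self, capacity, alpha=0.6, beta=0.4):
        self.capacity, self.alpha, self.beta = capacity, alpha, beta
        self.data, self.prios = [], []

    def add(self, transition, td_error=1.0):
        if len(self.data) >= self.capacity:
            self.data.pop(0); self.prios.pop(0)
        self.data.append(transition)
        self.prios.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size):
        p = np.asarray(self.prios)
        p = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=p)
        weights = (len(self.data) * p[idx]) ** (-self.beta)
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights
```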
Trading applications:
- Portfolio optimization
- Risk management through entropy regularization
- Multi-asset trading strategies
- High-frequency trading support
Usage recommendations:
- Continuous Trading: Use SAC, TD3, or DDPG for position sizing
- Discrete Decisions: Use PPO or A2C for buy/sell/hold signals
- Multi-asset: Use vectorized environments with shared networks
- Research: Experiment with different network architectures in net.py
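The recommendations above can be summarized in a simple lookup; the class names come from this document, while import paths are omitted because they depend on the repository layout:

```python
# Hypothetical lookup pairing each trading setup with the agent classes
# recommended above.
AGENT_CHOICES = {
    "continuous_position_sizing": ("AgentModSAC", "AgentTD3", "AgentDDPG"),
    "discrete_signals": ("AgentDiscretePPO", "AgentDiscreteA2C"),
    "multi_asset": ("AgentSharePPO", "AgentShareSAC"),
}
```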
All agents support standard hyperparameters:
- net_dim: Network hidden layer dimensions
- state_dim: Environment state space dimensionality
- action_dim: Action space dimensionality
- learning_rate: Optimizer learning rate
- gamma: Discount factor
- gpu_id: CUDA device selection
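A hedged example of wiring these hyperparameters together; the values are placeholders, not the repo's defaults:

```python
# Illustrative hyperparameter set using the fields listed above.
hyperparams = dict(
    net_dim=2 ** 7,      # hidden layer width
    state_dim=64,        # environment observation size
    action_dim=3,        # e.g. buy/sell/hold
    learning_rate=3e-4,
    gamma=0.99,          # discount factor
    gpu_id=0,            # CUDA device index
)
```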
The agents integrate seamlessly with the ElegantRL framework, vectorized trading environments, and GPU-accelerated training described above.