A unified PyTorch library implementing 200+ state-of-the-art algorithms across Deep Learning, Reinforcement Learning, Computer Vision, and NLP
Nexus is a comprehensive deep learning library designed for researchers and practitioners who want to:
- Implement cutting-edge research with minimal boilerplate code
- Mix and match components across different domains (e.g., use attention mechanisms from NLP in RL)
- Benchmark algorithms with standardized implementations
- Learn from extensive documentation covering theory, math, and practical implementation
- 200+ Algorithms implemented from recent papers (2018-2025)
- 30,000+ Lines of comprehensive documentation
- Modular Components that can be combined in novel ways
- Production-Ready code with proper testing and error handling
|
🎮 Reinforcement Learning
🧠 Attention Mechanisms
🌊 State Space Models
|
👁️ Computer Vision
💬 NLP & LLMs
🎨 Generative Models
🔧 Training Infrastructure
|
- Efficient Attention: FlashAttention, PagedAttention, MLA (93% KV cache reduction)
- Inference Optimization: Speculative decoding, continuous batching, KV cache quantization
- Memory Efficiency: Gradient checkpointing, activation offloading, mixed precision training
- Distributed Training: FSDP2, ZeRO++, context parallelism for long sequences
Every algorithm includes comprehensive documentation with:
- ✅ Theoretical background - Why it works
- ✅ Mathematical formulation - Complete equations with LaTeX
- ✅ Implementation details - Architecture and hyperparameters
- ✅ Code walkthrough - 3-5 working examples
- ✅ Optimization tricks - 6-8 practical tips
- ✅ Experiments & results - Benchmarks and ablations
- ✅ Common pitfalls - 8-12 debugging solutions
- ✅ References - Papers, implementations, tutorials
pip install nexus-deep-learninggit clone https://github.com/yourusername/nexus.git
cd nexus
pip install -e .# For computer vision
pip install nexus-deep-learning[cv]
# For reinforcement learning
pip install nexus-deep-learning[rl]
# For all features
pip install nexus-deep-learning[all]- Python 3.8+
- PyTorch 2.0+
- CUDA 11.8+ (for GPU acceleration)
from nexus.models.cv import VisionTransformer
from nexus.training import Trainer
# Create model
model = VisionTransformer(config={
"image_size": 224,
"patch_size": 16,
"num_classes": 1000,
"embed_dim": 768,
"num_layers": 12,
"num_heads": 12,
})
# Train
trainer = Trainer(
model=model,
dataset="imagenet",
batch_size=128,
num_epochs=100,
mixed_precision=True,
)
trainer.fit()from nexus.models.rl.policy_gradient import SAC
import gym
# Create environment and agent
env = gym.make("HalfCheetah-v4")
agent = SAC(config={
"state_dim": env.observation_space.shape[0],
"action_dim": env.action_space.shape[0],
"hidden_dim": 256,
"learning_rate": 3e-4,
"gamma": 0.99,
"tau": 0.005,
"alpha": 0.2, # Entropy temperature
})
# Training loop
for episode in range(1000):
state = env.reset()
done = False
while not done:
action = agent.select_action(state)
next_state, reward, done, _ = env.step(action)
agent.store_transition(state, action, reward, next_state, done)
# Update agent
if len(agent.replay_buffer) > agent.batch_size:
metrics = agent.update()
state = next_statefrom nexus.components.attention import FlashAttention3
import torch
# Create attention layer
attention = FlashAttention3(
dim=512,
num_heads=8,
dropout=0.1,
use_fp8=True, # H100 optimization
)
# Forward pass
x = torch.randn(2, 1024, 512).cuda() # [batch, seq_len, dim]
output = attention(x) # 2x faster than FlashAttention-2from nexus.models.rl.alignment import DPO
from transformers import AutoModel
# Load base model
base_model = AutoModel.from_pretrained("meta-llama/Llama-2-7b-hf")
# Create DPO trainer
dpo = DPO(
model=base_model,
beta=0.1, # KL penalty coefficient
learning_rate=1e-6,
)
# Train on preference data
for batch in preference_dataloader:
chosen = batch["chosen"]
rejected = batch["rejected"]
metrics = dpo.update(chosen, rejected)
print(f"Loss: {metrics['loss']:.4f}, Accuracy: {metrics['accuracy']:.2%}")from nexus.models.nlp.rag import SelfRAG
from nexus.models.nlp.retriever import DenseRetriever
# Create retriever and generator
retriever = DenseRetriever(
index_path="wikipedia_embeddings",
top_k=5,
)
self_rag = SelfRAG(
model="meta-llama/Llama-2-7b-hf",
retriever=retriever,
reflection_tokens=["[Retrieval]", "[Relevant]", "[Supported]"],
)
# Generate with self-reflection
query = "What is the capital of France?"
response = self_rag.generate(
query,
max_length=256,
use_reflection=True,
)
print(response)from nexus.components.ssm import Mamba
import torch
# Create Mamba block
mamba = Mamba(
d_model=512,
d_state=16,
d_conv=4,
expand=2,
)
# Forward pass
x = torch.randn(2, 1024, 512) # [batch, seq_len, dim]
output = mamba(x) # O(n) complexity, not O(n²)Comprehensive documentation is available in the docs/ directory:
-
Reinforcement Learning - 50+ RL algorithms
- Value-based methods (DQN, Rainbow, C51)
- Policy gradient (PPO, SAC, TD3)
- Offline RL (IQL, CQL, ReBRAC)
- LLM Alignment (DPO, GRPO, RLVR)
-
Attention Mechanisms - 16+ attention variants
-
State Space Models - Mamba, RWKV, S4, RetNet
-
Hybrid Architectures - Griffin, Jamba, Based
-
Positional Encodings - RoPE, ALiBi, NTK, LongRoPE
-
Architecture Components - MoE, normalization, activations
-
Inference Optimizations - Speculative decoding, KV cache
-
Computer Vision - Detection, segmentation, NeRF, ViTs
-
Generative Models - Diffusion, flow matching, audio/video
-
NLP & LLMs - RAG, PEFT, quantization, reasoning
-
Training Infrastructure - Optimizers, schedules, distributed
-
Self-Supervised Learning - MAE, DINOv2, I-JEPA, VICReg
-
Multimodal Models - LLaVA, Qwen2-VL, NVLM
-
Graph Neural Networks - GPS, Exphormer, GATv2
-
World Models - DreamerV3, Genie, I-JEPA
-
Continual Learning - EVCL, prompt-based CL
-
Autonomous Driving - UniAD, VAD, DriveTransformer
-
Imitation Learning - GAIL, DAgger, AIRL
-
Test-Time Compute - TTT layers, compute-optimal scaling
See RESEARCH_TODO.md for a complete list of 200+ implemented papers with links to arXiv.
nexus/
├── nexus/ # Main library code
│ ├── core/ # Base classes and utilities
│ │ ├── base.py # NexusModule base class
│ │ └── config.py # Configuration management
│ ├── models/ # Model implementations
│ │ ├── rl/ # Reinforcement Learning
│ │ │ ├── value_based/ # DQN, Rainbow, C51, QR-DQN
│ │ │ ├── policy_gradient/ # PPO, SAC, TD3, TRPO
│ │ │ ├── offline/ # IQL, CQL, ReBRAC, IDQL
│ │ │ ├── alignment/ # DPO, GRPO, KTO, SimPO
│ │ │ ├── multi_agent/ # MAPPO, QMIX, MADDPG
│ │ │ ├── model_based/ # DreamerV3, TD-MPC2
│ │ │ ├── exploration/ # ICM, RND, Go-Explore
│ │ │ ├── sequence/ # Decision Transformer
│ │ │ ├── reward_models/ # PRM, ORM, Generative RM
│ │ │ └── planning/ # MCTS, AlphaZero
│ │ ├── cv/ # Computer Vision
│ │ │ ├── detection/ # DETR, RT-DETR, YOLO-World
│ │ │ ├── segmentation/ # SAM, SAM 2, MedSAM
│ │ │ └── nerf/ # NeRF, Gaussian Splatting
│ │ ├── nlp/ # NLP & LLMs
│ │ │ ├── reasoning/ # CoT, ToT, GoT, ReAct
│ │ │ ├── rag/ # Self-RAG, CRAG, GraphRAG
│ │ │ └── structured/ # Grammar-constrained decoding
│ │ ├── generative/ # Generative Models
│ │ │ ├── diffusion/ # DiT, SD3, FLUX
│ │ │ └── audio_video/ # VALLE, Voicebox
│ │ └── compression/ # Model Compression
│ │ ├── peft/ # LoRA, QLoRA, DoRA, GaLore
│ │ ├── quantization/ # GPTQ, AWQ, QuIP#
│ │ ├── pruning/ # SparseGPT, Wanda, SliceGPT
│ │ └── distillation/ # Knowledge distillation
│ ├── components/ # Reusable Components
│ │ ├── attention/ # Attention mechanisms
│ │ ├── ssm/ # State space models
│ │ ├── moe/ # Mixture of experts
│ │ ├── normalization/ # LayerNorm, RMSNorm
│ │ └── activation/ # GELU, SwiGLU, etc.
│ ├── training/ # Training Infrastructure
│ │ ├── optimizers/ # Sophia, Prodigy, SOAP, Muon
│ │ ├── schedules/ # WSD, Cosine Restarts
│ │ ├── mixed_precision/ # FP8, MXFP8, FP4
│ │ └── distributed/ # FSDP2, ZeRO++
│ └── utils/ # Utilities
│ ├── inference/ # Inference optimizations
│ ├── data/ # Data pipelines
│ └── metrics/ # Evaluation metrics
├── configs/ # Configuration files
├── docs/ # Comprehensive documentation
├── tests/ # Unit tests
├── examples/ # Usage examples
├── .claude/ # Claude Code skills
│ ├── add-module.md # Skill for adding modules
│ ├── add-docs.md # Skill for documentation
│ └── QUICK_REFERENCE.md # Quick reference guide
├── RESEARCH_TODO.md # Implemented papers list
└── README.md # This file
Complete examples are available in the examples/ directory:
examples/rl/train_sac.py- SAC on continuous control tasksexamples/rl/train_ppo.py- PPO on Atari and MuJoCoexamples/rl/offline_rl_d4rl.py- Offline RL on D4RL benchmarksexamples/rl/alignment_dpo.py- LLM alignment with DPO
examples/cv/train_vit.py- Vision Transformer on ImageNetexamples/cv/object_detection.py- DETR for object detectionexamples/cv/segment_anything.py- SAM for zero-shot segmentationexamples/cv/gaussian_splatting.py- 3D reconstruction
examples/nlp/self_rag.py- Self-reflective RAGexamples/nlp/lora_finetuning.py- LoRA fine-tuningexamples/nlp/quantization_gptq.py- Model quantizationexamples/nlp/structured_generation.py- JSON schema generation
examples/generative/train_dit.py- Diffusion Transformer trainingexamples/generative/flow_matching.py- Flow matching for generation
Nexus provides skills for quickly adding new algorithms:
- Add Implementation: Use
/add-moduleskill or follow .claude/add-module.md - Add Documentation: Use
/add-docsskill or follow .claude/add-docs.md - See Quick Reference: .claude/QUICK_REFERENCE.md
All models extend NexusModule:
from nexus.core.base import NexusModule
import torch
class MyAlgorithm(NexusModule):
"""
My Algorithm Implementation
Paper: Title (Year)
Link: https://arxiv.org/abs/XXXX.XXXXX
"""
def __init__(self, config: dict):
super().__init__(config)
# Initialize components
def forward(self, x: torch.Tensor) -> torch.Tensor:
# Forward pass
pass
def compute_loss(self, batch: dict) -> torch.Tensor:
# Loss computation
pass
def update(self, batch: dict) -> dict:
# Training step
loss = self.compute_loss(batch)
self.optimizer.zero_grad()
loss.backward()
self.optimizer.step()
return {'loss': loss.item()}Run tests with pytest:
# Run all tests
pytest
# Run specific test file
pytest tests/test_sac.py
# Run with coverage
pytest --cov=nexus --cov-report=htmlPerformance benchmarks are included in documentation for each algorithm. Key highlights:
| Algorithm | Task | Performance | Reference |
|---|---|---|---|
| SAC | HalfCheetah-v4 | 15,000+ reward | docs |
| PPO | Atari (26 games) | 199% human | docs |
| DPO | MT-Bench | 7.09 score | docs |
| FlashAttention-3 | H100 | 2x speedup | docs |
| Mamba-2 | Language modeling | 2-8x faster | docs |
| SAM 2 | Video segmentation | 93.0 J&F | docs |
We welcome contributions! Please see our contributing guidelines:
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Implement your changes following existing patterns
- Add tests and documentation
- Commit changes (
git commit -m 'Add amazing feature') - Push to branch (
git push origin feature/amazing-feature) - Open a Pull Request
- Code Style: Follow PEP 8 and use type hints
- Documentation: Add comprehensive docs following the 10-section template
- Tests: Include unit tests with >80% coverage
- Commit Messages: Use clear, descriptive messages
This project is licensed under the MIT License - see the LICENSE file for details.
Nexus builds upon the incredible work of the deep learning research community. We acknowledge:
- PyTorch Team - For the foundational framework
- Research Authors - For the 200+ papers implemented here
- Open Source Community - For reference implementations and feedback
This library implements algorithms from leading conferences:
- NeurIPS, ICML, ICLR (Machine Learning)
- CVPR, ICCV, ECCV (Computer Vision)
- ACL, EMNLP, NAACL (NLP)
- CoRL, RSS (Robotics)
See RESEARCH_TODO.md for the complete list with citations.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: docs/README.md
If you find Nexus useful, please consider starring the repository!
- 200+ Algorithms from papers (2018-2025)
- 30,000+ Lines of documentation
- 17,000+ Lines of implementation code
- 100+ Test Cases with >80% coverage
- 20 Research Domains covered
Built with ❤️ by the research community