We are not bound to any fixed time duration; we go by vibes.
- CUDA:
- Programming Massively Parallel Processors
- CUDA Core Compute Libraries (CCCL): Thrust, CUB, libcudacxx
- Multi-GPU programming, NCCL
- CUTLASS & CuTe
- FlashAttention (1 & 2); the blockwise online-softmax core behind it is sketched after this list
- Distributed Data Parallelism (gradient all-reduce sketch after this list)
- Tensor Parallelism (column-parallel linear sketch after this list)
- Pipeline Parallelism (micro-batch schedule sketch after this list)
- Context Parallelism
- Fully Sharded Data Parallelism
- DeepSpeed ZeRO (stages 1, 2, and 3)
- Sequence Parallelism: Long Sequence Training from System Perspective
- Blockwise Parallel Transformer for Large Context Models
- Ring Attention with Blockwise Transformers for Near-Infinite Context Length
- Efficient Memory Management for Large Language Model Serving with PagedAttention (block-table KV-cache sketch after this list)
- GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
- PipeDream: Fast and Efficient Pipeline Parallel DNN Training
- Zero Bubble Pipeline Parallelism
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
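
FlashAttention, the Blockwise Parallel Transformer, and Ring Attention all lean on the same identity: attention can be computed over key/value blocks one at a time while carrying a running max and normalizer, so the full score matrix is never materialized. A minimal single-process sketch of that recurrence; shapes, block size, and names are illustrative assumptions, not taken from any of the papers' code.

```python
# Blockwise (online-softmax) attention over K/V chunks, the recurrence behind
# FlashAttention and Ring Attention. Shapes and block size are illustrative.
import torch

torch.manual_seed(0)
seq_len, block, d = 64, 16, 8
q = torch.randn(seq_len, d)
k = torch.randn(seq_len, d)
v = torch.randn(seq_len, d)
scale = d ** -0.5

# Running statistics per query row: max score m, normalizer l, unnormalized output acc.
m = torch.full((seq_len, 1), float("-inf"))
l = torch.zeros(seq_len, 1)
acc = torch.zeros(seq_len, d)

for start in range(0, seq_len, block):
    k_blk, v_blk = k[start:start + block], v[start:start + block]
    s = (q @ k_blk.T) * scale                 # scores against this K block only
    m_new = torch.maximum(m, s.max(dim=1, keepdim=True).values)
    # Rescale previous statistics to the new running max, then fold in this block.
    correction = torch.exp(m - m_new)
    p = torch.exp(s - m_new)
    l = l * correction + p.sum(dim=1, keepdim=True)
    acc = acc * correction + p @ v_blk
    m = m_new

out_blockwise = acc / l
out_ref = torch.softmax((q @ k.T) * scale, dim=1) @ v
print(torch.allclose(out_blockwise, out_ref, atol=1e-5))  # True: blockwise == full attention
```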
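
For distributed data parallelism, the essential mechanic is that every rank holds a full model replica, computes gradients on its own slice of data, and all-reduces them before the optimizer step. A minimal sketch using torch.distributed; the tiny model, the random data, and the gloo backend are assumptions for illustration, and the script expects a torchrun-style multi-process launch.

```python
# Minimal data-parallel gradient synchronization with torch.distributed.
# Launch with e.g.: torchrun --nproc_per_node=2 ddp_sketch.py
import torch
import torch.distributed as dist
import torch.nn as nn

def main():
    dist.init_process_group(backend="gloo")   # use "nccl" on GPUs
    rank, world_size = dist.get_rank(), dist.get_world_size()

    torch.manual_seed(0)                      # same seed -> identical initial replica on every rank
    model = nn.Linear(16, 4)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    # Each rank works on a different slice of the data (the "data parallel" part).
    x = torch.randn(8, 16) + rank
    y = torch.randn(8, 4)

    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()

    # The core of DDP: average gradients so every replica takes the same step.
    for p in model.parameters():
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
        p.grad /= world_size

    opt.step()
    if rank == 0:
        print(f"rank 0 loss: {loss.item():.4f}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```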
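
Megatron-style tensor parallelism shards a linear layer's weight across devices; with a column split, each shard produces a slice of the output, and the slices are concatenated (or handed, still sharded, to the following row-parallel layer). The sketch below simulates the shards inside one process just to show the numerical equivalence; the sizes are arbitrary.

```python
# Single-process simulation of a Megatron-style column-parallel linear layer.
# Each "rank" owns a column slice of W; concatenating the partial outputs
# reproduces the full layer's output. Illustrative only.
import torch

torch.manual_seed(0)
world_size = 4
d_in, d_out = 16, 32

x = torch.randn(8, d_in)          # activations, replicated on every rank
W = torch.randn(d_in, d_out)      # full weight of the layer y = x @ W

# Shard W along its output (column) dimension, one shard per rank.
shards = torch.chunk(W, world_size, dim=1)

# Each rank computes its slice of the output independently...
partial_outputs = [x @ W_shard for W_shard in shards]

# ...and an all-gather along the feature dimension rebuilds the full output.
y_tp = torch.cat(partial_outputs, dim=1)
y_ref = x @ W

print(torch.allclose(y_tp, y_ref, atol=1e-6))  # True: sharded == unsharded
```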
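
GPipe-style pipeline parallelism splits the model into consecutive stages and streams micro-batches through them so the stages overlap work. A toy schedule simulation (stage and micro-batch counts are made up) that also makes the fill/drain bubble visible, the inefficiency that 1F1B and zero-bubble schedules attack.

```python
# Toy simulation of a GPipe-style forward schedule: micro-batches enter the
# pipeline one tick apart, so stages overlap instead of idling on one big batch.
num_stages = 4        # pipeline depth (model split into 4 consecutive chunks)
num_microbatches = 8  # the global batch is split into 8 micro-batches

for tick in range(num_stages + num_microbatches - 1):
    active = []
    for stage in range(num_stages):
        mb = tick - stage                 # micro-batch this stage works on now
        if 0 <= mb < num_microbatches:
            active.append(f"stage{stage}:mb{mb}")
    print(f"t={tick:2d}  " + "  ".join(active))
# The first and last ticks, where few stages are busy, are the "pipeline bubble".
```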
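
PagedAttention treats the KV cache like paged virtual memory: fixed-size blocks drawn from a shared pool, with a per-sequence block table mapping logical token positions to physical blocks. A toy single-sequence sketch; block size, pool size, and class names are invented for illustration and are not vLLM's API.

```python
# Toy paged KV cache: a shared pool of fixed-size blocks plus a per-sequence
# block table, in the spirit of PagedAttention. Sizes and names are made up.
import torch

BLOCK_SIZE, NUM_BLOCKS, HEAD_DIM = 4, 16, 8
k_pool = torch.zeros(NUM_BLOCKS, BLOCK_SIZE, HEAD_DIM)   # shared physical K blocks
v_pool = torch.zeros(NUM_BLOCKS, BLOCK_SIZE, HEAD_DIM)   # shared physical V blocks
free_blocks = list(range(NUM_BLOCKS))                    # simple free list

class Sequence:
    def __init__(self):
        self.block_table = []   # logical block index -> physical block id
        self.length = 0         # number of tokens cached so far

    def append_kv(self, k, v):
        """Write one token's key/value into the paged cache."""
        if self.length % BLOCK_SIZE == 0:                # current block full (or none yet)
            self.block_table.append(free_blocks.pop())   # allocate a new physical block
        blk = self.block_table[-1]
        slot = self.length % BLOCK_SIZE
        k_pool[blk, slot] = k
        v_pool[blk, slot] = v
        self.length += 1

    def gather_kv(self):
        """Reassemble contiguous K/V for attention from the scattered blocks."""
        k = torch.cat([k_pool[b] for b in self.block_table])[: self.length]
        v = torch.cat([v_pool[b] for b in self.block_table])[: self.length]
        return k, v

seq = Sequence()
for _ in range(10):                                      # cache 10 tokens
    seq.append_kv(torch.randn(HEAD_DIM), torch.randn(HEAD_DIM))
print(seq.block_table, seq.gather_kv()[0].shape)         # 3 blocks used, K shape (10, 8)
```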