
Released by @aws-tianquaw on 03 Dec 15:37 · commit fa19961

Release Notes - v2.0.0

Announcing major updates to Amazon SageMaker HyperPod recipes (v2.0.0)

Key Features

Expanded Model Support

  • Llama: 3, 3.1, 3.2, 3.3 (1B to 70B)
  • DeepSeek: R1 Distilled Llama (8B, 70B), R1 Distilled Qwen (1.5B to 32B)
  • GPT-OSS: 20B, 120B
  • Qwen: 2.5 (0.5B to 72B), 3 (0.6B to 32B)

Expanded Training Techniques

| Technique | Available Methods |
|---|---|
| Supervised Fine-Tuning (SFT) | LoRA, Full Fine-Tuning |
| Direct Preference Optimization (DPO) | LoRA, Full Fine-Tuning |
| Reinforcement Learning from AI Feedback (RLAIF) | LoRA, Full Fine-Tuning |
| Reinforcement Learning with Verifiable Rewards (RLVR) | LoRA, Full Fine-Tuning |

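To illustrate why LoRA appears alongside full fine-tuning in every technique above, here is a minimal, framework-free sketch of the low-rank update idea. This is an illustration only, not the recipes' implementation: LoRA freezes the base weight matrix W and trains only a low-rank pair (A, B), applying W' = W + (alpha / r) · B · A.

```python
# Minimal LoRA sketch (illustration only; not the HyperPod recipes implementation).
# LoRA freezes the base weight W and learns a low-rank update B @ A,
# so the effective weight is W + (alpha / r) * B @ A with far fewer
# trainable parameters than full fine-tuning.

def matmul(a, b):
    """Multiply two matrices given as lists of lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def lora_merge(w, a, b, alpha, r):
    """Return the effective weight W + (alpha / r) * (B @ A)."""
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

# Toy example: a 4x4 frozen weight with a rank-1 adapter.
d_out, d_in, r, alpha = 4, 4, 1, 2
W = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]  # identity
B = [[1.0] for _ in range(d_out)]     # d_out x r
A = [[0.5] * d_in for _ in range(r)]  # r x d_in

W_eff = lora_merge(W, A, B, alpha, r)

full_params = d_out * d_in          # trained by full fine-tuning
lora_params = d_out * r + r * d_in  # trained by LoRA
print(full_params, lora_params)     # 16 vs 8 here; the gap widens as d grows
```

For a real 70B-parameter model, the adapter is a tiny fraction of the full weight count, which is why LoRA variants of SFT, DPO, RLAIF, and RLVR fit on much smaller accelerator footprints.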
Support for new training frameworks and techniques

  • LLMFT: Advanced fine-tuning support for SFT and DPO with LoRA
  • VERL: Reinforcement learning support using the GRPO algorithm for RLVR and RLAIF
  • Checkpointless training: Memory-efficient training for large models
  • Elastic training: Dynamic resource scaling capabilities
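The distinguishing idea behind RLVR is that rewards come from a programmatic check against ground truth rather than a learned reward model. The sketch below is hypothetical (it is not the VERL or HyperPod recipes API): a verifiable reward for math-style prompts, plus the group-mean normalization that GRPO applies to rewards within a group of completions.

```python
# Hypothetical verifiable-reward function for RLVR (illustration only;
# not the VERL or HyperPod recipes API). The reward is computed by
# checking the model's final answer against a known ground truth,
# so no learned reward model is needed.
import re

def extract_final_answer(completion: str):
    """Pull the last number out of a model completion, if any."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return matches[-1] if matches else None

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the extracted answer matches the ground truth."""
    return 1.0 if extract_final_answer(completion) == ground_truth else 0.0

def group_advantages(rewards):
    """GRPO-style advantages: each reward minus the group mean."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

# Score a group of completions to the same prompt (ground truth: "42").
completions = ["The answer is 42.", "I think it is 41.", "So we get 42"]
rewards = [verifiable_reward(c, "42") for c in completions]
print(rewards)  # [1.0, 0.0, 1.0]
print(group_advantages(rewards))
```

Because the reward is a deterministic check, it scales cheaply across the large completion groups GRPO samples per prompt.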

Infrastructure Support

  • NVIDIA H100, A100, and A10G accelerators
  • Built-in logging support (TensorBoard, MLflow)
  • Choice of training infrastructure across SageMaker training jobs, SageMaker HyperPod with Amazon EKS, and Slurm

Documentation

  • Refer to the `README.md` for detailed usage instructions and examples
  • Refer to `recipes_collection` for the updated recipe collection
  • Refer to `launcher_scripts` for launcher script examples

Contributing

We welcome contributions to enhance the capabilities of sagemaker-hyperpod-recipes. Please refer to our contributing guidelines for more information.

Thank you for choosing sagemaker-hyperpod-recipes for your model training!