A comprehensive repository showcasing production-ready Generative AI workflows on Amazon SageMaker. This collection provides end-to-end implementations spanning the complete ML lifecycle, from foundational concepts to enterprise-scale deployments, covering model training, fine-tuning, inference optimization, MLOps automation, distributed training, RAG systems, intelligent agents, and real-world industry applications.
New to Generative AI on SageMaker? Start here:
- Getting Started Guide - Essential setup, foundational concepts, and first steps
This repository supports a comprehensive range of foundation models with various training methodologies. The table below shows model compatibility with different fine-tuning techniques, training frameworks, and deployment options.
Model (Size) | Use Case / Strategy | Notebook | Service | Frameworks & Libraries |
---|---|---|---|---|
Qwen 3 0.6B | Function Calling, Agentic AI (FSDP, SFT, QLoRA) | Notebook | SageMaker AI Training Jobs | Transformers, Accelerate, SageMaker Model Trainer, MLflow, Weights & Biases |
Qwen 3 0.6B | Function Calling, Agentic AI (LoRA, DPO) | Notebook | SageMaker AI Training Jobs | Transformers, Accelerate, SageMaker Model Trainer, MLflow, Weights & Biases |
Arcee-Lite | Reasoning (FSDP, QLoRA, GRPO) | Notebook | SageMaker AI Training Jobs | Transformers, Accelerate, SageMaker Model Trainer, MLflow, Weights & Biases |
Qwen 3 0.6B | Reasoning (FSDP, SFT, LoRA) | Notebook | SageMaker AI Training Jobs | Ray, Grafana, Prometheus, Transformers, SageMaker Model Trainer |
Qwen 3 0.6B | Reasoning (FSDP, SFT, LoRA) | Notebook | SageMaker AI Training Jobs | Heterogeneous Cluster, Ray, Grafana, Prometheus, SageMaker Estimator |
DeepSeek-R1-Distill-Llama-8B | Reasoning (SFT, QLoRA) | Notebook | SageMaker AI Training Jobs | Transformers, Accelerate, SageMaker Model Trainer, MLflow |
GTE-Base-En-V1.5 | Embeddings | Notebook | SageMaker AI Training Jobs | Sentence Transformers, Accelerate, SageMaker Estimator |
Qwen 2 0.5B Instruct | Summarization (GRPO) | Notebook | SageMaker AI Training Jobs | Accelerate, Datasets, SageMaker, Transformers, TRL, Weights & Biases |
Gemma 3 4B-It | Conversations, Reasoning (LoRA) | Notebook | SageMaker AI Training Jobs | Torch, TorchVision, TorchAudio, Unsloth, Psutil |
Qwen 2 7B | Reasoning (GRPO) | Notebook | SageMaker AI Training Jobs | Verl, Torch, vLLM, FlashAttention |
Qwen 3 8B | Conversations (Spectrum) | Notebook | SageMaker AI Training Jobs | Transformers, Accelerate, SageMaker Model Trainer, MLflow, Weights & Biases |
Meta LLaMA 3.2 3B | Function Calling, Agentic AI (SFT, LoRA, DPO) | Notebook | SageMaker AI Training Jobs | Accelerate, Datasets, SageMaker, Transformers, TRL, Weights & Biases |
Qwen 2.5 0.5B Instruct | Reasoning (GRPO) | Notebook | SageMaker AI Training Jobs | Accelerate, Datasets, SageMaker, Transformers, TRL |
LLaMA 3 8B Instruct | Reasoning, Conversation (SFT, LoRA, QLoRA, KD) | Notebook | SageMaker AI Training Jobs | Accelerate, Datasets, SageMaker, Transformers, TRL, TorchRun, Weights & Biases |
LLaMA 3 / LLaMA 2 / Mistral | Text Generation (FSDP) | Notebook | SageMaker HyperPod (Slurm/EKS) | PyTorch, SMHP Training Operator |
GPT on NeMo | Text Generation (Spectrum) | Notebook | SageMaker HyperPod (Slurm/EKS) | NVIDIA NeMo |
SmolLM 1.7B on Picotron | Text Generation (FSDP) | Notebook | SageMaker HyperPod (Slurm/EKS) | Hugging Face Picotron |
LLaMA 3.1 on TorchTitan | Text Generation (FSDP, Spectrum) | Notebook | SageMaker HyperPod (Slurm/EKS) | PyTorch, TorchTitan |
Qwen 2.5 72B w/ HF TRL | Preference Alignment, Reasoning (GRPO) | Notebook | SageMaker HyperPod (Slurm/EKS) | PyTorch, Hugging Face TRL |
Qwen 2.5 VL | Multimodality (SFT, QLoRA) | Notebook | SageMaker Training Jobs | SWIFT |
Meta LLaMA 3 8B RLHF | Preference Alignment (FSDP, DPO, QLoRA) | Notebook | SageMaker Training Jobs | Hugging Face TRL |
GPT-OSS 20B | Reasoning (Accelerate, SFT, MXFP4, vLLM) | Notebook | SageMaker Training Jobs | Hugging Face Trainer, MXFP4 |
GPT-OSS 20B | Reasoning (SMDDP, SFT, MXFP4) | Notebook | SageMaker HyperPod (EKS) | HyperPod Recipes |
GPT-OSS 20B | Reasoning (SMDDP, SFT, MXFP4) | Notebook | SageMaker Training Jobs | HyperPod Recipes |
LLaMA 3.1 8B Instruct | Reasoning (FSDP, SFT, QLoRA) | Notebook | SageMaker Training Jobs | Transformers, TRL, BitsAndBytes, Accelerate, MLflow, PEFT |
Mistral 7B v0.3 Instruct | Reasoning (DDP, SFT, QLoRA) | Notebook | SageMaker Training Jobs | Transformers, TRL, BitsAndBytes, Accelerate, MLflow, PEFT |
Mistral 7B v0.3 Instruct | Reasoning (FSDP, SFT, QLoRA) | Notebook | SageMaker Training Jobs | Transformers, TRL, BitsAndBytes, Accelerate, MLflow, PEFT |
- SageMaker HyperPod - High-performance computing clusters for large-scale training
- SageMaker Training Jobs - Standard managed training infrastructure
Complete production workflows covering the entire ML lifecycle with enterprise-grade practices
- Model Customization - Advanced fine-tuning techniques including instruction tuning, parameter-efficient methods (LoRA, QLoRA), and domain adaptation strategies (a minimal LoRA sketch follows this list)
- Inference - Production deployment patterns, real-time and batch inference, auto-scaling configurations, and multi-model endpoints
- MLOps - Automated CI/CD pipelines using SageMaker Pipelines with integrated preprocessing, training, evaluation, model registration, and batch transform operations
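To make the parameter-efficient methods mentioned in the Model Customization entry concrete, here is a minimal, illustrative LoRA sketch using Hugging Face PEFT. The base model and hyperparameters are placeholders, not the repository's exact configuration.

```python
# Minimal LoRA sketch (illustrative hyperparameters, not the repo's exact config).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # any causal LM from the table above
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

lora_config = LoraConfig(
    r=16,                                  # adapter rank
    lora_alpha=32,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # only a small fraction of weights are trainable
```

The wrapped model can then be passed to a standard Hugging Face or TRL trainer; QLoRA follows the same pattern with a 4-bit quantized base model.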
Scalable training implementations for Large Language Models with advanced parallelization strategies
- SageMaker Unified Studio - Native distributed training capabilities with seamless cluster management and resource optimization
- FSDP (Fully Sharded Data Parallel) - Memory-efficient training using Hugging Face FSDP integration for models exceeding single-GPU memory limits
- Reinforcement Learning from Human Feedback - DPO (Direct Preference Optimization) and GRPO implementations using TRL, Unsloth, and veRL frameworks (a minimal DPO sketch follows this list)
- Efficient Fine-tuning - Unsloth-powered instruction fine-tuning with 2x-5x speed improvements and reduced memory consumption
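As a companion to the RLHF entry above, here is a hedged sketch of preference alignment with TRL's DPOTrainer. The base model, dataset, and hyperparameters are illustrative, and argument names can vary slightly across TRL versions.

```python
# Hedged DPO sketch: assumes a preference dataset with prompt/chosen/rejected pairs.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"             # illustrative base model
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Any preference dataset with prompt/chosen/rejected pairs works here.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

training_args = DPOConfig(
    output_dir="dpo-output",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    beta=0.1,                                        # strength of the implicit KL penalty
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,                      # called `tokenizer` in older TRL releases
)
trainer.train()
```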
Knowledge-enhanced AI systems with advanced embedding and retrieval techniques
- VoyageAI Embedding RAG - Production-ready RAG implementation featuring VoyageAI's state-of-the-art embeddings, Claude 3 integration, vector database optimization, and semantic search capabilities for enterprise knowledge bases
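To illustrate the retrieval half of this RAG pattern, here is a small sketch that embeds documents and a query with the VoyageAI client and ranks them by cosine similarity. The model name, documents, and environment setup (a `VOYAGE_API_KEY`) are illustrative; the retrieved passage would then be passed to the generator (e.g. Claude 3) as context.

```python
# Minimal retrieval sketch (illustrative, not the full notebook).
import numpy as np
import voyageai

vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment

documents = [
    "SageMaker Training Jobs run managed, containerized training.",
    "SageMaker HyperPod provides resilient clusters for large-scale training.",
    "SageMaker endpoints serve real-time inference behind an HTTPS API.",
]
doc_embeddings = np.array(vo.embed(documents, model="voyage-2", input_type="document").embeddings)

query = "How do I serve a model for real-time predictions?"
query_embedding = np.array(vo.embed([query], model="voyage-2", input_type="query").embeddings[0])

# Cosine similarity between the query and each document
scores = doc_embeddings @ query_embedding / (
    np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_embedding)
)
print(documents[int(np.argmax(scores))])  # top passage to feed to the LLM as context
```

Production systems replace the in-memory array with a vector database, but the scoring logic is the same.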
🤖 AI Agents
Intelligent multi-agent frameworks and orchestration systems
- DeepSeek CrewAI Agent - Multi-agent research and writing system using DeepSeek R1 Distilled LLaMA 70B with CrewAI orchestration for collaborative task execution
- LangGraph Model Context Protocol - Advanced agentic workflows with MCP integration for loan underwriting, featuring multi-step orchestration and role-based agent specialization
- ML Models as Agent Tools - Integration patterns for using SageMaker-deployed ML models as agent tools via MCP, including both direct implementation and Amazon Bedrock AgentCore approaches (see the MCP server sketch after this list)
- SageMaker Strands Integration - Enterprise-grade agent solutions with managed hosting and authentication
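The MCP-based items above share a common pattern: wrap a deployed SageMaker endpoint as a tool that agents can discover and call. Below is a hedged sketch using the MCP Python SDK's FastMCP server and boto3; the endpoint name and payload schema are hypothetical.

```python
# Hedged sketch of the "ML model as agent tool" pattern over MCP.
import json

import boto3
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("sagemaker-tools")
runtime = boto3.client("sagemaker-runtime")

@mcp.tool()
def predict_risk(payload: str) -> str:
    """Call a deployed SageMaker model and return its raw JSON response."""
    response = runtime.invoke_endpoint(
        EndpointName="my-risk-model-endpoint",   # hypothetical endpoint name
        ContentType="application/json",
        Body=json.dumps({"inputs": payload}),
    )
    return response["Body"].read().decode("utf-8")

if __name__ == "__main__":
    mcp.run()  # agent frameworks (e.g. LangGraph) can now discover and call this tool over MCP
```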
🎯 Use Cases
Real-world applications and industry-specific solutions
- RAG & Chatbots - Conversational AI with knowledge retrieval using FLAN-T5-XL and Falcon-7B models, featuring document processing and context-aware responses
- Text Summarization - Document and content summarization using AI21, Falcon-7B, and FLAN-T5-XL models with LangChain integration
- Text Summarization to Image - Multi-modal content generation pipeline combining text summarization with image generation capabilities
- Text-to-SQL - Natural language database querying using Code Llama with LangChain SQL query generation, complete with demo database and web interface
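The Text-to-SQL use case boils down to putting the database schema and the user's question into a prompt and letting a code-capable model (such as Code Llama) emit SQL. A minimal sketch follows; the endpoint name, schema, and payload format are illustrative and depend on how the model is hosted.

```python
# Illustrative Text-to-SQL sketch (not the notebook's exact code).
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

# Toy schema and question; real applications pull the schema from the database.
schema = "CREATE TABLE orders (id INTEGER, customer TEXT, total REAL, created_at TEXT);"
question = "What is the total revenue per customer?"
prompt = (
    f"Given the SQLite schema:\n{schema}\n"
    f"Write a single SQL query that answers: {question}\nSQL:"
)

response = runtime.invoke_endpoint(
    EndpointName="code-llama-endpoint",   # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps({"inputs": prompt, "parameters": {"max_new_tokens": 128}}),
)
raw = json.loads(response["Body"].read())
print(raw)  # response shape depends on the serving container; review the generated SQL before executing it
```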
Performance and efficiency improvements for production deployments
- Post-Training Quantization - Model compression techniques using GPTQ and AWQ quantization methods, typically reducing memory footprint by 50-75% with minimal accuracy loss, with automated SageMaker Training Job implementation
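As a rough illustration of the GPTQ path, here is a hedged sketch modeled on llm-compressor's published examples; the model, calibration dataset, and exact import paths and argument names are illustrative and may differ by library version.

```python
# Hedged GPTQ sketch (import paths and arguments vary across llm-compressor versions).
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"   # illustrative model choice
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

recipe = GPTQModifier(
    targets="Linear",        # quantize linear layers...
    scheme="W4A16",          # ...to 4-bit weights with 16-bit activations
    ignore=["lm_head"],      # keep the output head in higher precision
)

oneshot(
    model=model,
    dataset="open_platypus",            # calibration dataset name is illustrative
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

model.save_pretrained("llama3-8b-w4a16-gptq", save_compressed=True)
tokenizer.save_pretrained("llama3-8b-w4a16-gptq")  # quantized weights ready for vLLM serving
```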
Comprehensive benchmarking and performance analysis frameworks
- DeepSeek R1 Distilled - Performance evaluation and benchmarking tools for the DeepSeek R1 Distilled model series, including accuracy metrics, latency analysis, and cost optimization studies
📦 Archive
Legacy examples and deprecated implementations for reference and migration guidance
- Complete ML Lifecycle: From data preprocessing and model training to production deployment and monitoring
- Multiple Training Strategies: Single-node, multi-node distributed, and reinforcement learning approaches with automatic scaling
- Production-Ready MLOps: Automated CI/CD pipelines with SageMaker Pipelines, model registry, and deployment automation
- Advanced AI Patterns: RAG systems, multi-agent orchestration, and multi-modal applications
- Performance Optimization: Model quantization, distributed training, inference acceleration, and cost optimization
- Industry Use Cases: Financial services, healthcare, retail, and manufacturing applications
- Enterprise Security: IAM integration, VPC support, encryption at rest and in transit
- Amazon SageMaker - Managed ML platform with training, inference, and MLOps capabilities
- Amazon Bedrock - Managed foundation model service with enterprise security
- AWS Lambda & API Gateway - Serverless inference and API management
- Hugging Face Transformers - State-of-the-art model implementations and fine-tuning utilities
- PyTorch & TensorFlow - Deep learning frameworks with distributed training support
- FSDP & DeepSpeed - Memory-efficient distributed training frameworks
- Ray - Distributed computing framework for ML workloads
- LangGraph & LangChain - Agent frameworks and workflow orchestration
- CrewAI - Multi-agent system coordination and task delegation
- Model Context Protocol (MCP) - Standardized tool integration for AI agents
- Unsloth - 2x-5x faster fine-tuning with reduced memory usage
- TRL (Transformer Reinforcement Learning) - RLHF and preference optimization
- llm-compressor - Post-training quantization with GPTQ and AWQ
- vLLM - High-throughput inference serving
- AWS Account with SageMaker access and appropriate service limits
- IAM roles with SageMaker, S3, and related service permissions
- VPC configuration for secure deployments (optional but recommended)
- Python 3.8+ with virtual environment management
- Jupyter Lab/Notebook for interactive development
- AWS CLI configured with appropriate credentials
- Git for version control and collaboration
- Intermediate understanding of machine learning concepts
- Familiarity with Python programming and data science libraries
- Basic knowledge of AWS services and cloud computing
- Understanding of transformer architectures and LLMs (recommended)
# Clone the repository
git clone <repository-url>
cd generative-ai-sagemaker
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install core dependencies
pip install -r requirements.txt
# Configure AWS credentials
aws configure
# Verify SageMaker access
aws sagemaker list-training-jobs --max-items 1
- Start with Getting Started Guide
- Explore basic Inference examples
- Try simple Use Cases like text summarization
- Dive into Model Customization
- Explore Distributed Training techniques
- Implement RAG systems for knowledge-enhanced applications
- Master MLOps pipelines
- Build Multi-agent systems
- Optimize with Quantization techniques
Run a simple inference example to validate your setup:
# Example: Deploy a pre-trained model for text generation
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # IAM role with SageMaker permissions

model = HuggingFaceModel(
    transformers_version="4.28",
    pytorch_version="2.0",
    py_version="py310",
    role=role,
    model_data="s3://path-to-model",  # replace with your model artifact
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",
)
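Once the endpoint is in service, you can invoke it and clean up afterwards; the payload below assumes the Hugging Face inference container's text-generation format.

```python
# Invoke the deployed endpoint (payload shape assumes the Hugging Face container's
# text-generation interface; adjust for your model).
response = predictor.predict({
    "inputs": "Summarize: Amazon SageMaker is a managed machine learning service...",
    "parameters": {"max_new_tokens": 64},
})
print(response)

# Delete the endpoint when finished to avoid ongoing charges.
# predictor.delete_endpoint()
```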
Data Preparation → Model Fine-tuning → Evaluation → Deployment → Monitoring
        ↓                  ↓                ↓             ↓           ↓
   S3 Storage      SageMaker Training  Model Registry  Endpoint   CloudWatch
Document Ingestion → Embedding Generation → Vector Storage → Query Processing → Response Generation
         ↓                    ↓                   ↓                 ↓                   ↓
  Text Processing     SageMaker Endpoint      Vector DB       Retrieval Logic      LLM Inference
Task Definition → Agent Orchestration → Tool Execution → Result Aggregation → Final Output
       ↓                   ↓                   ↓                  ↓                  ↓
   LangGraph        CrewAI Framework       MCP Servers    Agent Coordination  Structured Response
We welcome contributions from the community! Please see our Contributing Guidelines for details on:
- Reporting Issues: Bug reports, feature requests, and documentation improvements
- Code Contributions: Pull requests, code reviews, and testing procedures
- Standards: Code formatting, documentation requirements, and best practices
- Community Guidelines: Code of conduct and collaboration expectations
- New use case implementations
- Performance optimizations
- Documentation improvements
- Testing and validation
- Integration with new AWS services
Security is our top priority. For security issue notifications and responsible disclosure, please see CONTRIBUTING.
- Use IAM roles with least privilege access
- Enable encryption at rest and in transit
- Implement VPC endpoints for secure communication
- Regular security audits and compliance checks
This library is licensed under the MIT-0 License. See the LICENSE file for details.
- GitHub Issues: Bug reports and feature requests
- GitHub Discussions: Community Q&A and knowledge sharing
- Documentation: Comprehensive guides and API references
- SageMaker Documentation: Official AWS Documentation
- AWS Support: Professional support plans available
- AWS Training: Certification and learning paths
- Model Hub: Pre-trained models and configurations
- Best Practices: Performance optimization and cost management
- Case Studies: Real-world implementation examples
- Latest Updates: DeepSeek R1 integration, enhanced MCP support, improved quantization techniques
- Coming Soon: Multi-modal agents, advanced RAG patterns, cost optimization tools
- Community Highlights: Featured implementations and success stories
Ready to build the future of AI? Start exploring the examples and building your next Generative AI application on Amazon SageMaker! 🚀
This repository is actively maintained and regularly updated with the latest AWS services, model architectures, and best practices. Star ⭐ the repository to stay updated with new releases and features.