A comprehensive repository showcasing production-ready Generative AI workflows on Amazon SageMaker. This collection provides end-to-end implementations spanning the complete ML lifecycle, from foundational concepts to enterprise-scale deployments, covering model training, fine-tuning, inference optimization, MLOps automation, distributed training, RAG systems, intelligent agents, and real-world industry applications.
New to Generative AI on SageMaker? Start here:
- Getting Started Guide - Essential setup, foundational concepts, and first steps
This repository supports a comprehensive range of foundation models with various training methodologies. The table below shows model compatibility with different fine-tuning techniques, training frameworks, and deployment options.
Model (Size) | Use Case / Strategy | Notebook | Service | Frameworks & Libraries |
---|---|---|---|---|
Qwen 3 0.6B | Function Calling, Agentic AI (FSDP, SFT, QLoRA) | Notebook | SageMaker AI Training Jobs | Transformers, Accelerate, SageMaker Model Trainer, MLflow, Weights & Biases |
Qwen 3 0.6B | Function Calling, Agentic AI (LoRA, DPO) | Notebook | SageMaker AI Training Jobs | Transformers, Accelerate, SageMaker Model Trainer, MLflow, Weights & Biases |
Arcee-Lite | Reasoning (FSDP, QLoRA, GRPO) | Notebook | SageMaker AI Training Jobs | Transformers, Accelerate, SageMaker Model Trainer, MLflow, Weights & Biases |
Qwen 3 0.6B | Reasoning (FSDP, SFT, LoRA) | Notebook | SageMaker AI Training Jobs | Ray, Grafana, Prometheus, Transformers, SageMaker Model Trainer |
Qwen 3 0.6B | Reasoning (FSDP, SFT, LoRA) | Notebook | SageMaker AI Training Jobs | Heterogeneous Cluster, Ray, Grafana, Prometheus, SageMaker Estimator |
DeepSeek-R1-Distill-Llama-8B | Reasoning (SFT, QLoRA) | Notebook | SageMaker AI Training Jobs | Transformers, Accelerate, SageMaker Model Trainer, MLflow |
GTE-Base-En-V1.5 | Embeddings | Notebook | SageMaker AI Training Jobs | Sentence Transformers, Accelerate, SageMaker Estimator |
Qwen 2 0.5B Instruct | Summarization (GRPO) | Notebook | SageMaker AI Training Jobs | Accelerate, Datasets, SageMaker, Transformers, TRL, Weights & Biases |
Gemma 3 4B-It | Conversations, Reasoning (LoRA) | Notebook | SageMaker AI Training Jobs | Torch, TorchVision, TorchAudio, Unsloth, Psutil |
Qwen 2 7B | Reasoning (GRPO) | Notebook | SageMaker AI Training Jobs | Verl, Torch, vLLM, FlashAttention |
Qwen 3 8B | Conversations (Spectrum) | Notebook | SageMaker AI Training Jobs | Transformers, Accelerate, SageMaker Model Trainer, MLflow, Weights & Biases |
Meta LLaMA 3.2 3B | Function Calling, Agentic AI (SFT, LoRA, DPO) | Notebook | SageMaker AI Training Jobs | Accelerate, Datasets, SageMaker, Transformers, TRL, Weights & Biases |
Qwen 2.5 0.5B Instruct | Reasoning (GRPO) | Notebook | SageMaker AI Training Jobs | Accelerate, Datasets, SageMaker, Transformers, TRL |
LLaMA 3 8B Instruct | Reasoning, Conversation (SFT, LoRA, QLoRA, KD) | Notebook | SageMaker AI Training Jobs | Accelerate, Datasets, SageMaker, Transformers, TRL, TorchRun, Weights & Biases |
LLaMA 3 / LLaMA 2 / Mistral | Text Generation (FSDP) | Notebook | SageMaker HyperPod (Slurm/EKS) | PyTorch, SMHP Training Operator |
GPT on NeMo | Text Generation (Spectrum) | Notebook | SageMaker HyperPod (Slurm/EKS) | NVIDIA NeMo |
SmolLM 1.7B on Picotron | Text Generation (FSDP) | Notebook | SageMaker HyperPod (Slurm/EKS) | Hugging Face Picotron |
LLaMA 3.1 on TorchTitan | Text Generation (FSDP, Spectrum) | Notebook | SageMaker HyperPod (Slurm/EKS) | PyTorch, TorchTitan |
Qwen 2.5 72B w/ HF TRL | Preference Alignment, Reasoning (GRPO) | Notebook | SageMaker HyperPod (Slurm/EKS) | PyTorch, Hugging Face TRL |
Qwen 2.5 VL | Multimodality (SFT, QLoRA) | Notebook | SageMaker Training Jobs | SWIFT |
Meta LLaMA 3 8B RLHF | Preference Alignment (FSDP, DPO, QLoRA) | Notebook | SageMaker Training Jobs | Hugging Face TRL |
GPT-OSS 20B | Reasoning (Accelerate, SFT, MXFP4, vLLM) | Notebook | SageMaker Training Jobs | Hugging Face Trainer, MXFP4 |
GPT-OSS 20B | Reasoning (SMDDP, SFT, MXFP4) | Notebook | SageMaker HyperPod (EKS) | HyperPod Recipes |
GPT-OSS 20B | Reasoning (SMDDP, SFT, MXFP4) | Notebook | SageMaker Training Jobs | HyperPod Recipes |
LLaMA 3.1 8B Instruct | Reasoning (FSDP, SFT, QLoRA) | Notebook | SageMaker Training Jobs | Transformers, TRL, BitsAndBytes, Accelerate, MLflow, PEFT |
Mistral 7B v0.3 Instruct | Reasoning (DDP, SFT, QLoRA) | Notebook | SageMaker Training Jobs | Transformers, TRL, BitsAndBytes, Accelerate, MLflow, PEFT |
Mistral 7B v0.3 Instruct | Reasoning (FSDP, SFT, QLoRA) | Notebook | SageMaker Training Jobs | Transformers, TRL, BitsAndBytes, Accelerate, MLflow, PEFT |
- SageMaker HyperPod - High-performance computing clusters for large-scale training
- SageMaker Training Jobs - Standard managed training infrastructure
Complete production workflows covering the entire ML lifecycle with enterprise-grade practices
- Model Customization - Advanced fine-tuning techniques including instruction tuning, parameter-efficient methods (LoRA, QLoRA), and domain adaptation strategies (a minimal LoRA sketch follows this list)
- Inference - Production deployment patterns, real-time and batch inference, auto-scaling configurations, and multi-model endpoints
- MLOps - Automated CI/CD pipelines using SageMaker Pipelines with integrated preprocessing, training, evaluation, model registration, and batch transform operations
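To make the parameter-efficient methods mentioned in the Model Customization entry concrete, here is a minimal, illustrative LoRA sketch using Hugging Face PEFT. The base model and hyperparameters are placeholders, not the repository's exact configuration.

```python
# Minimal LoRA sketch (illustrative hyperparameters, not the repo's exact config).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # any causal LM from the table above
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

lora_config = LoraConfig(
    r=16,                                  # adapter rank
    lora_alpha=32,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # only a small fraction of weights are trainable
```

The wrapped model can then be passed to a standard Hugging Face or TRL trainer; QLoRA follows the same pattern with a 4-bit quantized base model.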
Scalable training implementations for Large Language Models with advanced parallelization strategies
- SageMaker Unified Studio - Native distributed training capabilities with seamless cluster management and resource optimization
- FSDP (Fully Sharded Data Parallel) - Memory-efficient training using Hugging Face FSDP integration for models exceeding single-GPU memory limits
- Reinforcement Learning from Human Feedback - DPO (Direct Preference Optimization) and GRPO implementations using TRL, Unsloth, and veRL frameworks (a minimal DPO sketch follows this list)
- Efficient Fine-tuning - Unsloth-powered instruction fine-tuning with 2x-5x speed improvements and reduced memory consumption
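As a companion to the RLHF entry above, here is a hedged sketch of preference alignment with TRL's DPOTrainer. The base model, dataset, and hyperparameters are illustrative, and argument names can vary slightly across TRL versions.

```python
# Hedged DPO sketch: assumes a preference dataset with prompt/chosen/rejected pairs.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"             # illustrative base model
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Any preference dataset with prompt/chosen/rejected pairs works here.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

training_args = DPOConfig(
    output_dir="dpo-output",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    beta=0.1,                                        # strength of the implicit KL penalty
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,                      # called `tokenizer` in older TRL releases
)
trainer.train()
```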
Knowledge-enhanced AI systems with advanced embedding and retrieval techniques
- VoyageAI Embedding RAG - Production-ready RAG implementation featuring VoyageAI's state-of-the-art embeddings, Claude 3 integration, vector database optimization, and semantic search capabilities for enterprise knowledge bases
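To illustrate the retrieval half of this RAG pattern, here is a small sketch that embeds documents and a query with the VoyageAI client and ranks them by cosine similarity. The model name, documents, and environment setup (a `VOYAGE_API_KEY`) are illustrative; the retrieved passage would then be passed to the generator (e.g. Claude 3) as context.

```python
# Minimal retrieval sketch (illustrative, not the full notebook).
import numpy as np
import voyageai

vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment

documents = [
    "SageMaker Training Jobs run managed, containerized training.",
    "SageMaker HyperPod provides resilient clusters for large-scale training.",
    "SageMaker endpoints serve real-time inference behind an HTTPS API.",
]
doc_embeddings = np.array(vo.embed(documents, model="voyage-2", input_type="document").embeddings)

query = "How do I serve a model for real-time predictions?"
query_embedding = np.array(vo.embed([query], model="voyage-2", input_type="query").embeddings[0])

# Cosine similarity between the query and each document
scores = doc_embeddings @ query_embedding / (
    np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_embedding)
)
print(documents[int(np.argmax(scores))])  # top passage to feed to the LLM as context
```

Production systems replace the in-memory array with a vector database, but the scoring logic is the same.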
🤖 AI Agents
Intelligent multi-agent frameworks and orchestration systems
- DeepSeek CrewAI Agent - Multi-agent research and writing system using DeepSeek R1 Distilled LLaMA 70B with CrewAI orchestration for collaborative task execution
- LangGraph Model Context Protocol - Advanced agentic workflows with MCP integration for loan underwriting, featuring multi-step orchestration and role-based agent specialization
- ML Models as Agent Tools - Integration patterns for using SageMaker-deployed ML models as agent tools via MCP, including both direct implementation and Amazon Bedrock AgentCore approaches (see the MCP server sketch after this list)
- SageMaker Strands Integration - Enterprise-grade agent solutions with managed hosting and authentication
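The MCP-based items above share a common pattern: wrap a deployed SageMaker endpoint as a tool that agents can discover and call. Below is a hedged sketch using the MCP Python SDK's FastMCP server and boto3; the endpoint name and payload schema are hypothetical.

```python
# Hedged sketch of the "ML model as agent tool" pattern over MCP.
import json

import boto3
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("sagemaker-tools")
runtime = boto3.client("sagemaker-runtime")

@mcp.tool()
def predict_risk(payload: str) -> str:
    """Call a deployed SageMaker model and return its raw JSON response."""
    response = runtime.invoke_endpoint(
        EndpointName="my-risk-model-endpoint",   # hypothetical endpoint name
        ContentType="application/json",
        Body=json.dumps({"inputs": payload}),
    )
    return response["Body"].read().decode("utf-8")

if __name__ == "__main__":
    mcp.run()  # agent frameworks (e.g. LangGraph) can now discover and call this tool over MCP
```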
🎯 Use Cases
Real-world applications and industry-specific solutions
- RAG & Chatbots - Conversational AI with knowledge retrieval using FLAN-T5-XL and Falcon-7B models, featuring document processing and context-aware responses
- Text Summarization - Document and content summarization using AI21, Falcon-7B, and FLAN-T5-XL models with LangChain integration
- Text Summarization to Image - Multi-modal content generation pipeline combining text summarization with image generation capabilities
- Text-to-SQL - Natural language database querying using Code Llama with LangChain SQL query generation, complete with demo database and web interface
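The Text-to-SQL use case boils down to putting the database schema and the user's question into a prompt and letting a code-capable model (such as Code Llama) emit SQL. A minimal sketch follows; the endpoint name, schema, and payload format are illustrative and depend on how the model is hosted.

```python
# Illustrative Text-to-SQL sketch (not the notebook's exact code).
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

# Toy schema and question; real applications pull the schema from the database.
schema = "CREATE TABLE orders (id INTEGER, customer TEXT, total REAL, created_at TEXT);"
question = "What is the total revenue per customer?"
prompt = (
    f"Given the SQLite schema:\n{schema}\n"
    f"Write a single SQL query that answers: {question}\nSQL:"
)

response = runtime.invoke_endpoint(
    EndpointName="code-llama-endpoint",   # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps({"inputs": prompt, "parameters": {"max_new_tokens": 128}}),
)
raw = json.loads(response["Body"].read())
print(raw)  # response shape depends on the serving container; review the generated SQL before executing it
```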
Performance and efficiency improvements for production deployments
- Post-Training Quantization - Model compression techniques using GPTQ and AWQ quantization methods, typically reducing memory footprint by 50-75% with minimal accuracy loss, with automated SageMaker Training Job implementation
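As a rough illustration of the GPTQ path, here is a hedged sketch modeled on llm-compressor's published examples; the model, calibration dataset, and exact import paths and argument names are illustrative and may differ by library version.

```python
# Hedged GPTQ sketch (import paths and arguments vary across llm-compressor versions).
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"   # illustrative model choice
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

recipe = GPTQModifier(
    targets="Linear",        # quantize linear layers...
    scheme="W4A16",          # ...to 4-bit weights with 16-bit activations
    ignore=["lm_head"],      # keep the output head in higher precision
)

oneshot(
    model=model,
    dataset="open_platypus",            # calibration dataset name is illustrative
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

model.save_pretrained("llama3-8b-w4a16-gptq", save_compressed=True)
tokenizer.save_pretrained("llama3-8b-w4a16-gptq")  # quantized weights ready for vLLM serving
```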
Comprehensive benchmarking and performance analysis frameworks
- DeepSeek R1 Distilled - Performance evaluation and benchmarking tools for the DeepSeek R1 Distilled model series, including accuracy metrics, latency analysis, and cost optimization studies
📦 Archive
Legacy examples and deprecated implementations for reference and migration guidance
- Complete ML Lifecycle: From data preprocessing and model training to production deployment and monitoring
- Multiple Training Strategies: Single-node, multi-node distributed, and reinforcement learning approaches with automatic scaling
- Production-Ready MLOps: Automated CI/CD pipelines with SageMaker Pipelines, model registry, and deployment automation
- Advanced AI Patterns: RAG systems, multi-agent orchestration, and multi-modal applications
- Performance Optimization: Model quantization, distributed training, inference acceleration, and cost optimization
- Industry Use Cases: Financial services, healthcare, retail, and manufacturing applications
- Enterprise Security: IAM integration, VPC support, encryption at rest and in transit
- Amazon SageMaker - Managed ML platform with training, inference, and MLOps capabilities
- Amazon Bedrock - Managed foundation model service with enterprise security
- AWS Lambda & API Gateway - Serverless inference and API management
- Hugging Face Transformers - State-of-the-art model implementations and fine-tuning utilities
- PyTorch & TensorFlow - Deep learning frameworks with distributed training support
- FSDP & DeepSpeed - Memory-efficient distributed training frameworks
- Ray - Distributed computing framework for ML workloads
- LangGraph & LangChain - Agent frameworks and workflow orchestration
- CrewAI - Multi-agent system coordination and task delegation
- Model Context Protocol (MCP) - Standardized tool integration for AI agents
- Unsloth - 2x-5x faster fine-tuning with reduced memory usage
- TRL (Transformer Reinforcement Learning) - RLHF and preference optimization
- llm-compressor - Post-training quantization with GPTQ and AWQ
- vLLM - High-throughput inference serving
- AWS Account with SageMaker access and appropriate service limits
- IAM roles with SageMaker, S3, and related service permissions
- VPC configuration for secure deployments (optional but recommended)
- Python 3.8+ with virtual environment management
- Jupyter Lab/Notebook for interactive development
- AWS CLI configured with appropriate credentials
- Git for version control and collaboration
- Intermediate understanding of machine learning concepts
- Familiarity with Python programming and data science libraries
- Basic knowledge of AWS services and cloud computing
- Understanding of transformer architectures and LLMs (recommended)
# Clone the repository
git clone <repository-url>
cd generative-ai-sagemaker
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install core dependencies
pip install -r requirements.txt
# Configure AWS credentials
aws configure
# Verify SageMaker access
aws sagemaker list-training-jobs --max-items 1
- Start with Getting Started Guide
- Explore basic Inference examples
- Try simple Use Cases like text summarization
- Dive into Model Customization
- Explore Distributed Training techniques
- Implement RAG systems for knowledge-enhanced applications
- Master MLOps pipelines
- Build Multi-agent systems
- Optimize with Quantization techniques
Run a simple inference example to validate your setup:
# Example: Deploy a pre-trained model for text generation
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # IAM role with SageMaker permissions

model = HuggingFaceModel(
    transformers_version="4.28",
    pytorch_version="2.0",
    py_version="py310",
    role=role,
    model_data="s3://path-to-model",  # replace with your model artifact
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",
)
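Once the endpoint is in service, you can invoke it and clean up afterwards; the payload below assumes the Hugging Face inference container's text-generation format.

```python
# Invoke the deployed endpoint (payload shape assumes the Hugging Face container's
# text-generation interface; adjust for your model).
response = predictor.predict({
    "inputs": "Summarize: Amazon SageMaker is a managed machine learning service...",
    "parameters": {"max_new_tokens": 64},
})
print(response)

# Delete the endpoint when finished to avoid ongoing charges.
# predictor.delete_endpoint()
```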
Data Preparation → Model Fine-tuning → Evaluation → Deployment → Monitoring
        ↓                  ↓                ↓             ↓           ↓
   S3 Storage      SageMaker Training  Model Registry  Endpoint   CloudWatch
Document Ingestion → Embedding Generation → Vector Storage → Query Processing → Response Generation
         ↓                    ↓                   ↓                 ↓                   ↓
  Text Processing     SageMaker Endpoint      Vector DB       Retrieval Logic      LLM Inference
Task Definition → Agent Orchestration → Tool Execution → Result Aggregation → Final Output
       ↓                   ↓                   ↓                  ↓                  ↓
   LangGraph        CrewAI Framework       MCP Servers    Agent Coordination  Structured Response
We welcome contributions from the community! Please see our Contributing Guidelines for details on:
- Reporting Issues: Bug reports, feature requests, and documentation improvements
- Code Contributions: Pull requests, code reviews, and testing procedures
- Standards: Code formatting, documentation requirements, and best practices
- Community Guidelines: Code of conduct and collaboration expectations
- New use case implementations
- Performance optimizations
- Documentation improvements
- Testing and validation
- Integration with new AWS services
Security is our top priority. For security issue notifications and responsible disclosure, please see CONTRIBUTING.
- Use IAM roles with least privilege access
- Enable encryption at rest and in transit
- Implement VPC endpoints for secure communication
- Regular security audits and compliance checks
This library is licensed under the MIT-0 License. See the LICENSE file for details.
- GitHub Issues: Bug reports and feature requests
- GitHub Discussions: Community Q&A and knowledge sharing
- Documentation: Comprehensive guides and API references
- SageMaker Documentation: Official AWS Documentation
- AWS Support: Professional support plans available
- AWS Training: Certification and learning paths
- Model Hub: Pre-trained models and configurations
- Best Practices: Performance optimization and cost management
- Case Studies: Real-world implementation examples
- Latest Updates: DeepSeek R1 integration, enhanced MCP support, improved quantization techniques
- Coming Soon: Multi-modal agents, advanced RAG patterns, cost optimization tools
- Community Highlights: Featured implementations and success stories
Ready to build the future of AI? Start exploring the examples and building your next Generative AI application on Amazon SageMaker! 🚀
This repository is actively maintained and regularly updated with the latest AWS services, model architectures, and best practices. Star ⭐ the repository to stay updated with new releases and features.