This hands-on workshop teaches you how to build production-grade generative AI applications with a focus on cost optimization, performance enhancement, and operational excellence.
Target Audience: AI/ML Developers, Software Engineers working with agentic systems, DevOps Engineers deploying GenAI workloads
This workshop focuses on optimizing three key metrics for production GenAI applications:
| Objective | Definition | Why It Matters |
|---|---|---|
| Accuracy | The quality and correctness of model outputs relative to expected results | Ensures your application delivers value to users and meets business requirements |
| Cost | Total expenditure on model inference, including input tokens, output tokens, and cache operations | Controls operational expenses and enables sustainable scaling |
| Latency | Time elapsed from request initiation to response completion | Impacts user experience and application responsiveness |
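These three metrics can be tracked per request. A minimal sketch of a per-request metrics record (the per-1K-token prices are hypothetical placeholders, not Bedrock's actual rates):

```python
from dataclasses import dataclass

# Hypothetical per-1K-token prices for illustration only;
# look up the actual rates for your model on the Bedrock pricing page.
INPUT_PRICE_PER_1K = 0.003
OUTPUT_PRICE_PER_1K = 0.015

@dataclass
class RequestMetrics:
    input_tokens: int
    output_tokens: int
    latency_s: float

    @property
    def cost_usd(self) -> float:
        # Cost = input tokens at the input rate + output tokens at the output rate.
        return (self.input_tokens / 1000) * INPUT_PRICE_PER_1K + (
            self.output_tokens / 1000
        ) * OUTPUT_PRICE_PER_1K

m = RequestMetrics(input_tokens=2000, output_tokens=500, latency_s=1.2)
print(f"cost=${m.cost_usd:.4f}, latency={m.latency_s}s")  # cost=$0.0135, latency=1.2s
```

Accuracy has no closed-form formula like cost and latency; it comes from the evaluation pass covered later in the workshop.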
This workshop is organized into progressive parts:
Estimated time: 1.5 hours
Build a solid understanding of tokens, pricing, and optimization strategies.
| Topic | Duration | Description |
|---|---|---|
| Prompts 101 | 30 min | Tokens, pricing, TPM/RPM, terminology |
| Optimization Strategy | 45 min | Model selection, prompt design, parameter tuning, basic caching |
| Langfuse Observability | 30 min | LLM tracing, cost tracking, prompt management with Langfuse |
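Before the Prompts 101 material, it helps to have a ballpark sense of token counts. A crude estimator (the 4-characters-per-token ratio is a rough English-text rule of thumb, not a tokenizer):

```python
def rough_token_estimate(text: str) -> int:
    # Crude heuristic: English prose averages roughly 4 characters per token.
    # For accurate counts, use the usage block returned by the API or the
    # model's real tokenizer; this is only for ballpark budgeting.
    return max(1, len(text) // 4)

prompt = "Summarize the following support ticket in two sentences."
print(rough_token_estimate(prompt))
```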
Estimated time: 3 hours
Build a production customer support agent while applying progressive optimization techniques.
| Topic | Duration | Description |
|---|---|---|
| Baseline Agent | 20 min | Build unoptimized baseline agent, establish metrics |
| Quick Wins | 20 min | Concise prompts, max_tokens, stop_sequences |
| Prompt Caching | 30 min | System prompt and tool definition caching |
| LLM Routing | 30 min | Route queries to appropriate models by complexity |
| Guardrails | 30 min | Bedrock Guardrails for topic/content filtering |
| AgentCore Gateway | 45 min | Semantic tool search, centralized tool management |
| Evaluations | 30 min | Systematic evaluation across all agent versions |
Note: Part 2 requires infrastructure deployment. See 02-developer-journey/README.md for setup instructions.
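A minimal version of the LLM routing idea from the table above: send short, simple queries to a cheaper model and reserve the stronger model for complex ones. The model IDs, keywords, and length threshold here are illustrative assumptions, not the workshop's actual routing logic:

```python
# Illustrative model IDs; substitute the models enabled in your account.
CHEAP_MODEL = "anthropic.claude-3-5-haiku-20241022-v1:0"
STRONG_MODEL = "anthropic.claude-sonnet-4-20250514-v1:0"

COMPLEX_HINTS = ("compare", "analyze", "why", "multi-step")

def route(query: str) -> str:
    # Naive complexity heuristic: long queries or "reasoning" keywords go
    # to the stronger model; everything else goes to the cheap one.
    q = query.lower()
    if len(q.split()) > 40 or any(hint in q for hint in COMPLEX_HINTS):
        return STRONG_MODEL
    return CHEAP_MODEL

print(route("How do I reset my password?"))                 # cheap model
print(route("Analyze why my last three invoices differ."))  # strong model
```

Production routers typically use a classifier or a small LLM call instead of keyword matching, but the cost lever is the same: most traffic never touches the expensive model.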
Estimated time: 2-2.5 hours
Advanced prompt engineering techniques and complex caching patterns.
| Topic | Duration | Description |
|---|---|---|
| Advanced Prompt Engineering | 60 min | CoT, Self-Critique, CoD, technique selection, etc. |
| Advanced Prompt Caching | 60 min | Multi-checkpoint patterns, cache strategies, etc. |
| TBD | TBD | TBD |
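Multi-checkpoint caching with the Converse API works by inserting `cachePoint` blocks after stable prefixes (system prompt, tool definitions) so repeated requests reuse cached segments at a reduced input-token rate. Treat the request shape below as a sketch of that pattern and confirm the details against the current Bedrock prompt caching documentation for your model:

```python
def build_cached_request(system_prompt: str, tools: list[dict], user_query: str) -> dict:
    # Each cachePoint marks the end of a stable prefix; everything before it
    # can be served from cache on subsequent requests with the same prefix.
    return {
        "system": [
            {"text": system_prompt},
            {"cachePoint": {"type": "default"}},  # checkpoint 1: system prompt
        ],
        "toolConfig": {
            # checkpoint 2: tool definitions (assumed placement; verify in the docs)
            "tools": tools + [{"cachePoint": {"type": "default"}}],
        },
        "messages": [
            {"role": "user", "content": [{"text": user_query}]},
        ],
    }

req = build_cached_request("You are a support agent.", [], "Where is my order?")
print(req["system"][1])
```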
- AWS Account with Amazon Bedrock access enabled
- Python 3.10 or higher
- Basic familiarity with Python and Jupyter notebooks
Install uv (if not already installed):
macOS/Linux:
# Option 1: Official installer (recommended)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Option 2: Using Homebrew
brew install uv
# Option 3: Using pipx
brew install pipx
pipx install uv

Windows:
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

Create and activate virtual environment:
# Create virtual environment with Python 3.11
uv venv --python 3.11
# Activate virtual environment
# macOS/Linux:
source .venv/bin/activate
# Windows:
.venv\Scripts\activate
# Install dependencies
uv pip install -r requirements.txt

Alternatively, without uv, use the standard library venv module:

# Create virtual environment
python3 -m venv .venv
# Activate virtual environment
# macOS/Linux:
source .venv/bin/activate
# Windows:
.venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

The workshop supports multiple credential methods:
If you already have AWS CLI configured, the notebooks will automatically use your credentials:
# No additional setup needed - boto3 uses ~/.aws/credentials

If you prefer to use a .env file:
# Copy the example file
cp .env.example .env
# Edit .env and uncomment the AWS credentials
# Then add your actual credentials:
# AWS_ACCESS_KEY_ID=your-actual-access-key-id
# AWS_SECRET_ACCESS_KEY=your-actual-secret-access-key
# AWS_DEFAULT_REGION=us-east-1

The notebooks will automatically load credentials using python-dotenv.

Verify your Bedrock connection:
python -c "
import boto3
client = boto3.client('bedrock-runtime', region_name='us-east-1')
print('Bedrock connection successful!')
print(f'Region: {client.meta.region_name}')
"

This project is licensed under the MIT License - see the LICENSE file for details.