feat: Add EKS capabilities integration #423
Draft
allamand wants to merge 53 commits into riv25 from feat/eks-capabilities-squash
Conversation
Force-pushed from b3b62d0 to a721a7a
- Add EKS capabilities for ArgoCD, Kro, and ACK controllers
- Add Identity Center integration for SSO
- Add multi-cluster ACK role management
- Add JupyterHub addon with SSO integration
- Add Helm chart dependencies for all charts
- Update deployment scripts and utilities
- Add comprehensive documentation and steering guides
- Remove complex pod and deployment checks, since ArgoCD runs as an EKS managed service
- Simplify to only check API availability via kubectl get applications
- Update status messages to reflect the EKS capabilities context
- Remove domain availability checks, as they are not needed for managed ArgoCD
- Create gitops/addons/bootstrap/default/addons/platform-manifests/values.yaml
- Set gpu.enabled to true as static configuration
- Separate static config from dynamic templated values
- Add gpu-nodepool.yaml template for EKS Auto Mode GPU support
- Remove chart default values.yaml (moved to GitOps addon structure)
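For illustration only, a minimal sketch of what a gpu-nodepool.yaml for EKS Auto Mode could look like; the NodeClass reference, instance types, taint, and limits below are assumptions, not the actual template added in this commit:

```yaml
# Hypothetical EKS Auto Mode GPU NodePool sketch (not the PR's exact template).
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com            # assumed built-in Auto Mode NodeClass
        kind: NodeClass
        name: default
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["g5.xlarge", "g5.2xlarge"]   # example GPU instance types
      taints:
        - key: nvidia.com/gpu
          value: "true"
          effect: NoSchedule                 # keep non-GPU pods off these nodes
  limits:
    nvidia.com/gpu: 4                        # example cap on total provisioned GPUs
```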
- Add Helm chart for pre-pulling container images
- Improves pod startup time by caching images on nodes
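The chart contents are not shown in this excerpt; a common pattern for an image prepuller is a DaemonSet whose init containers pull the target images and whose main container just pauses. A hedged sketch with placeholder image names:

```yaml
# Illustrative image-prepuller pattern (placeholder image list, not the chart's actual values).
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: image-prepuller
spec:
  selector:
    matchLabels:
      app: image-prepuller
  template:
    metadata:
      labels:
        app: image-prepuller
    spec:
      initContainers:
        - name: pull-ray                       # pulling the image onto the node is the only goal
          image: rayproject/ray:2.24.0         # example image to cache on every node
          command: ["sh", "-c", "true"]
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9     # minimal container to keep the pod Running
```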
- Add Kro ResourceGraphDefinition for Ray Serve deployments
- Enables declarative Ray cluster and service management
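A Kro ResourceGraphDefinition exposes a simple schema plus a set of templated resources. The skeleton below is a hedged sketch: the kind (RayServeApp), field names, and resource ids are illustrative assumptions, not the definition added here.

```yaml
# Minimal Kro ResourceGraphDefinition sketch (illustrative schema and resource names).
apiVersion: kro.run/v1alpha1
kind: ResourceGraphDefinition
metadata:
  name: rayserve.kro.run
spec:
  schema:
    apiVersion: v1alpha1
    kind: RayServeApp                          # the API that platform users would create
    spec:
      name: string | required=true
      image: string | default="rayproject/ray:2.24.0"
  resources:
    - id: rayService
      template:
        apiVersion: ray.io/v1
        kind: RayService
        metadata:
          name: ${schema.spec.name}            # values flow from the user-facing schema
        spec:
          rayClusterConfig:
            headGroupSpec:
              rayStartParams: {}
              template:
                spec:
                  containers:
                    - name: ray-head
                      image: ${schema.spec.image}
```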
- Add GitOps workload definitions for Ray deployments
- Supports Ray Serve ML model serving
- Replace static ray-serve.yaml with Kro-based deployment
- Add comprehensive README for Ray Serve setup
- Update catalog-info with detailed component metadata
- Enhance template with improved parameter handling
- Register ray-serve template in main catalog-info.yaml
- Add Ray Serve deployment to homepage quick actions
- Add image-prepuller addon definition to bootstrap/default
- Add platform-manifests addon definition with GPU support
- Enable addons in control-plane environment
- Move GPU config from valuesObject to values.yaml
- Minor updates to EKS RGD configuration
- Enable image_prepuller addon
- Enable platform_manifests addon
- Enhance error handling and logging in utils.sh
- Add Backstage Redis session storage implementation guide
- Add custom NodePool migration guide for EKS Auto Mode
- Add Ray Service S3 model cache implementation plan

These guides document key platform improvements for session persistence, node lifecycle management, and ML model caching.
- Add Keycloak split-brain detection ConfigMap for cluster health monitoring
- Add custom PEEKS-managed NodePools with optimized consolidation settings
  - peeks-general-purpose: 10m consolidation (vs 30s default)
  - peeks-system: 30m consolidation for critical workloads
  - 48h termination grace period for stability

These additions improve platform reliability and node lifecycle management.
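In Karpenter's v1 API the consolidation and grace-period knobs mentioned above live under spec.disruption and spec.template.spec. A hedged sketch of what the peeks-general-purpose pool might look like; the names and timings come from the commit message, everything else is assumed:

```yaml
# Sketch of a PEEKS-managed NodePool with relaxed consolidation (illustrative only).
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: peeks-general-purpose
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 10m                # wait 10 minutes before consolidating (vs 30s default)
  template:
    spec:
      terminationGracePeriod: 48h        # give workloads up to 48h to drain before forced removal
      nodeClassRef:
        group: eks.amazonaws.com         # assumed EKS Auto Mode NodeClass
        kind: NodeClass
        name: default
```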
Reduce GPU node consolidateAfter from 1h to 5m for faster resource cleanup while maintaining stability for GPU workloads. This improves cost efficiency without impacting running jobs.
Refactor RayService ResourceGraphDefinition to support both CPU and GPU workloads with conditional resource allocation:
- Add rayserviceCpu for CPU-only models (includeWhen: gpu == 0)
- Add rayserviceGpu for GPU-accelerated models (includeWhen: gpu > 0)
- Improve resource specifications and autoscaling configuration
- Add proper labels and annotations for Backstage integration
- Update skeleton manifest with model configuration support

This enables efficient resource allocation based on model requirements.
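The CPU/GPU split relies on Kro's includeWhen expressions. A minimal sketch of the idea, using the resource ids from the commit message; the templates themselves are elided and assumed:

```yaml
# Illustrative use of Kro includeWhen to pick a CPU or GPU RayService variant.
resources:
  - id: rayserviceCpu
    includeWhen:
      - ${schema.spec.gpu == 0}          # only rendered for CPU-only deployments
    template:
      apiVersion: ray.io/v1
      kind: RayService
      # ... CPU resource requests/limits ...
  - id: rayserviceGpu
    includeWhen:
      - ${schema.spec.gpu > 0}           # only rendered when at least one GPU is requested
    template:
      apiVersion: ray.io/v1
      kind: RayService
      # ... GPU resource requests/limits, nvidia.com/gpu limits ...
```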
Add comprehensive model selection and resource configuration:
- Add 5 pre-configured AI models with resource recommendations:
  - DialoGPT-medium (CPU, 1.4GB)
  - Phi-2 (CPU, 5.5GB)
  - TinyLlama (CPU, 2.2GB)
  - Mistral-7B (GPU, 14GB)
  - Llama-2-7B (GPU, 13GB)
- Add model-specific resource defaults and validation
- Add max generation length configuration
- Update default serve config to gpu-demo-serve-config.zip
- Include resource sizing guidance in template description

This simplifies model deployment by providing tested configurations and clear resource requirements for each model type.
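In a Backstage scaffolder template this kind of model picker is typically an enum parameter. A sketch under the assumption that the parameter is named modelId; the enum values simply reuse the names listed above and may not match the actual template:

```yaml
# Hypothetical Backstage scaffolder parameter for model selection (parameter name assumed).
parameters:
  - title: Model Selection
    required:
      - modelId
    properties:
      modelId:
        title: Model
        type: string
        default: DialoGPT-medium
        enum:
          - DialoGPT-medium
          - Phi-2
          - TinyLlama
          - Mistral-7B
          - Llama-2-7B
        enumNames:
          - "DialoGPT-medium (CPU, 1.4GB)"
          - "Phi-2 (CPU, 5.5GB)"
          - "TinyLlama (CPU, 2.2GB)"
          - "Mistral-7B (GPU, 14GB)"
          - "Llama-2-7B (GPU, 13GB)"
```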
Add CPU and memory resource configuration for Flux2 controllers:
- helmController: 100m CPU, 128Mi-256Mi memory
- imageReflectionController: 100m CPU, 128Mi-256Mi memory

This ensures proper resource allocation and prevents OOM issues while maintaining efficient resource usage.
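On the flux2 Helm chart these limits are usually set per controller in values. A sketch using the controller keys named in the commit; the exact key names depend on the chart version and are not confirmed here:

```yaml
# Illustrative Flux2 Helm values; key names follow the commit message and may differ per chart.
helmController:
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      memory: 256Mi
imageReflectionController:
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      memory: 256Mi
```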
- Move GPU configuration from addons.yaml to values.yaml
- Add customNodepools.enabled flag for PEEKS-managed nodepools
- Clean up redundant GPU configuration in addons.yaml

This allows clusters to opt in to custom nodepools with optimized consolidation settings instead of EKS Auto Mode defaults.
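A minimal sketch of the resulting values.yaml, assuming the two flags are top-level keys (the layout is inferred from the commit messages, not taken from the file itself):

```yaml
# Sketch of platform-manifests values.yaml (key layout assumed).
gpu:
  enabled: true              # render the GPU NodePool template
customNodepools:
  enabled: true              # opt in to PEEKS-managed NodePools instead of Auto Mode defaults
```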
Add pre-configured Ray Serve packages:
- cpu-serve-config.zip: CPU-optimized model serving
- gpu-serve-config.zip: GPU-accelerated inference
- gpu-demo-serve-config.zip: Demo configuration with GPU support
- vllm-serve-config.zip: vLLM-based high-performance serving
- vllm_serve.py: vLLM deployment implementation with async engine

These packages provide ready-to-use configurations for different Ray Serve deployment scenarios with appropriate resource allocation.
Signed-off-by: Workshop User <[email protected]>
- Use validate command instead of state list for faster checks
- Add 10s timeout to prevent hanging on locked states
- Handle timeout exit codes properly
- Skip lock check if validation times out
- Switch GPU variant to public.ecr.aws/data-on-eks/ray2.24.0-py310-vllm-gpu:v1
- Downgrade Ray version from 2.34.0 to 2.24.0 for vLLM compatibility
- Add num-cpus: 0 to head node to prevent scheduling workloads on head
- Maintain CPU variant with standard rayproject/ray:2.24.0 image

This enables GPU inference with pre-built vLLM support and proper resource isolation between head and worker nodes.
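In a RayService spec, keeping work off the head node is done through rayStartParams. A sketch of the relevant fragment, with the image and num-cpus setting taken from the commit and the surrounding fields assumed or omitted:

```yaml
# Illustrative RayService fragment: vLLM GPU image plus a head node that takes no tasks.
spec:
  rayClusterConfig:
    rayVersion: "2.24.0"
    headGroupSpec:
      rayStartParams:
        num-cpus: "0"                    # advertise zero CPUs so Ray schedules no work on the head
      template:
        spec:
          containers:
            - name: ray-head
              image: public.ecr.aws/data-on-eks/ray2.24.0-py310-vllm-gpu:v1
    workerGroupSpecs:
      - groupName: gpu-workers
        template:
          spec:
            containers:
              - name: ray-worker
                image: public.ecr.aws/data-on-eks/ray2.24.0-py310-vllm-gpu:v1
                resources:
                  limits:
                    nvidia.com/gpu: 1    # example GPU request per worker
```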
- Split into Basic Configuration and Resource Configuration pages
- Use const values for automatic resource allocation based on CPU/GPU
- Remove detailed model recommendations table (simplified description)
- CPU: 2 head CPU, 8Gi head memory, 4 worker CPU, 16Gi worker memory, 0 GPU
- GPU: 2 head CPU, 16Gi head memory, 8 worker CPU, 48Gi worker memory, 1 GPU

This matches the platform-on-eks-workshop template with proper dynamic resource configuration based on deployment type selection.
Remove verbose prerequisite instructions from template description. Keep it concise and focused on the template's purpose.
Document the production-ready approach for Ray GPU inference:
- Custom pre-built Ray+vLLM images via CodeBuild
- Automated image build pipeline (Terraform → Lambda → CodeBuild → ECR)
- Solutions for HuggingFace token issues and runtime pip failures
- Based on AWS GenAI on EKS workshop proven patterns

This guide explains the architecture and implementation for reliable GPU-accelerated model serving with Ray and vLLM.
Add Helm chart for KubeRay operator deployment:
- Chart.yaml with kuberay-operator v1.2.2 dependency
- Minimal values.yaml for configuration
- Templates for namespace creation

This enables GitOps-managed Ray operator deployment across clusters.
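A wrapper Chart.yaml for this pattern typically just pins the upstream dependency. A sketch, assuming the dependency is pulled from the public KubeRay Helm repository:

```yaml
# Sketch of a wrapper Chart.yaml pinning the upstream kuberay-operator chart.
apiVersion: v2
name: kuberay-operator
description: Wrapper chart for the KubeRay operator
version: 0.1.0
dependencies:
  - name: kuberay-operator
    version: 1.2.2
    repository: https://ray-project.github.io/kuberay-helm/
```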
Add automated Ray+vLLM custom image build infrastructure:
- Dockerfile.ray-vllm: Ray 2.49.0 with vLLM 0.6.4.post1 and CUDA support
- ray-image-build.tf: CodeBuild project for automated image builds
- model-storage.tf: S3 bucket for model caching (optional)
- trigger_codebuild.zip: Lambda function to trigger builds

This enables production-ready GPU inference with pre-built images, eliminating runtime pip install failures and HuggingFace token issues.
Add default values for platform-manifests addon:
- GPU nodepool enabled by default
- Custom nodepools enabled for optimized consolidation

This allows clusters to use PEEKS-managed nodepools with better lifecycle management instead of EKS Auto Mode's aggressive defaults.
Relocate values.yaml from bootstrap/default/addons to default/addons for consistency with addon configuration structure.
Switch from upstream kuberay-operator Helm chart to local wrapper chart. This allows customization and integration with platform-specific configurations like model prestaging and service accounts.
Add ray-worker-sa service account to both CPU and GPU Ray worker pods. This enables Pod Identity for AWS service access (S3 model caching, ECR image pulls) without using node IAM roles.
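Wiring the service account into the worker pod templates is a one-line change per worker group. A hedged fragment with surrounding fields omitted or assumed:

```yaml
# Illustrative fragment: attach ray-worker-sa to a Ray worker group for Pod Identity.
workerGroupSpecs:
  - groupName: cpu-workers
    template:
      spec:
        serviceAccountName: ray-worker-sa   # Pod Identity association grants AWS access (S3, ECR)
        containers:
          - name: ray-worker
            image: rayproject/ray:2.24.0     # example image
```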
- Modified cpu-serve-config.zip to load models from S3-mounted paths
- Added local_files_only=True to prevent HuggingFace downloads
- Added MODEL-MANAGEMENT.md with instructions for adding new models
- CPU and GPU now use consistent model loading approach
Add S3-backed persistent storage for model caching:
- Create PersistentVolume using S3 CSI driver
- Create PersistentVolumeClaim for model access
- Mount /mnt/models in both head and worker pods
- Support configurable S3 bucket via s3ModelBucket parameter
- Upgrade Ray version from 2.24.0 to 2.34.0 for CPU variant
- Add volume mounts to both CPU and GPU variants

This enables fast model loading from S3 without downloading from HuggingFace on every deployment.
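With the Mountpoint for Amazon S3 CSI driver, the bucket is exposed through a statically provisioned PersistentVolume that the Ray pods then mount at /mnt/models. A sketch with placeholder object names; the bucket and region are examples standing in for the s3ModelBucket and awsRegion parameters:

```yaml
# Illustrative S3-backed PV/PVC for model caching (names, bucket, and region are placeholders).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ray-models-pv
spec:
  capacity:
    storage: 100Gi                       # nominal; S3 does not enforce capacity
  accessModes:
    - ReadOnlyMany
  csi:
    driver: s3.csi.aws.com               # Mountpoint for Amazon S3 CSI driver
    volumeHandle: ray-models-volume      # any unique id for static provisioning
    volumeAttributes:
      bucketName: peeks-ray-models       # example bucket (s3ModelBucket parameter)
  mountOptions:
    - region us-west-2                   # example region
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ray-models-pvc
spec:
  accessModes:
    - ReadOnlyMany
  storageClassName: ""                   # bind directly to the static PV above
  volumeName: ray-models-pv
  resources:
    requests:
      storage: 100Gi
```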
Add Terraform module for S3 CSI driver deployment:
- Install Mountpoint for Amazon S3 CSI driver
- Configure IAM role for S3 access via Pod Identity
- Enable ReadOnlyMany access mode for model sharing
- Support for S3 bucket mounting in Ray workloads

This provides the infrastructure for S3-backed model storage.
Add Dockerfile and build script for custom Ray GPU images:
- Based on rayproject/ray:2.34.0-py310-gpu
- Pre-installs vLLM 0.6.4.post1 with CUDA support
- Includes transformers, accelerate, and bitsandbytes
- Build script with ECR push automation

This eliminates runtime pip installs and HuggingFace token issues.
Update template to support S3-backed model caching:
- Add s3ModelBucket parameter with default 'peeks-ray-models'
- Add awsRegion parameter for S3 CSI driver configuration
- Update skeleton manifests with S3 bucket parameters
- Simplify resource configuration with better defaults
- Update catalog-info with proper metadata

This enables users to specify custom S3 buckets for model storage.
Simplify prestaging job to focus on S3 upload:
- Remove Pod Identity validation (handled by S3 CSI driver)
- Streamline download and upload process
- Reduce resource requirements
- Improve error handling and retry logic

With the S3 CSI driver, models are mounted directly rather than downloaded at runtime, making this job optional for pre-warming.
- Update model-storage.tf S3 bucket configuration
- Add ray-serve template to Backstage catalog
- Improve Terraform utils.sh error handling
Switch from public Ray images to custom ECR images:
- Use ray-gpu-optimized image from ECR for all variants
- Add awsAccountId parameter to schema for ECR image path
- Update default modelId to use S3-mounted models path
- Apply custom image to both CPU and GPU head/worker pods

This enables use of pre-built images with vLLM and eliminates HuggingFace authentication issues.
Update README.md to match the latest version from the riv25 branch:
- Simplified single-region deployment (us-west-2)
- Updated CloudFormation template URL
- Use .yaml extension instead of .json
- Use templateBucket variable for cleaner syntax
Force-pushed from 0090bab to 63f98ad
Issue #, if available:
Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.