SGLang HiCache DynamoDB+S3+NIXL Backend (This repository is a work in progress)

An AWS-native storage backend for SGLang's HiCache system that enables multi-region KV cache persistence, using DynamoDB for metadata and S3 (accessed via NIXL) for data storage.

Overview

This implementation decouples "cache locality" from "GPU locality," allowing KV cache state to survive instance scale-down events and be shared across regions while maintaining HiCache's performance benefits (up to 6× throughput improvement, 80% TTFT reduction).

Key Features

  • Multi-Region Support: Cache data can be accessed across AWS regions with automatic read-through
  • Two-Phase Commit: Ensures readers never access partially-written or corrupted cache blocks
  • GPU-Optimized Transfers: Uses NIXL for efficient GPU ↔ S3 data movement
  • Graceful Degradation: Continues serving inference requests even during AWS service outages
  • Cost Management: Configurable rate limits and hotness-based replication to control costs

Architecture

The backend separates concerns into two planes:

  • Metadata Plane (DynamoDB): Fast key lookups, state management, access tracking
  • Data Plane (S3 + NIXL): Bulk KV cache block storage and GPU-optimized transfers
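As a sketch of what the metadata plane might track per cache block, here is a minimal record keyed by the composite (cache_key, region) pair used in the table schema below. The field names (state, s3_path, hit_count, and so on) are illustrative assumptions, not the backend's actual schema:

```python
from dataclasses import dataclass, field
from enum import Enum
import time


class CacheState(str, Enum):
    WRITING = "WRITING"      # claimed by a writer, not yet readable
    COMMITTED = "COMMITTED"  # fully uploaded, safe to read


@dataclass
class CacheMetadata:
    # Composite DynamoDB key: cache_key (partition) + region (sort)
    cache_key: str
    region: str
    state: CacheState = CacheState.WRITING
    s3_path: str = ""
    etag: str = ""
    hit_count: int = 0
    created_at: float = field(default_factory=time.time)

    def is_readable(self) -> bool:
        # Readers only ever see fully committed blocks
        return self.state is CacheState.COMMITTED
```

A record is created in the WRITING state and only flipped to COMMITTED after the S3 upload succeeds, which is what keeps partially written blocks invisible to readers.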

System Architecture Diagram

graph TB
    subgraph App["SGLang Application Layer"]
        CC[Cache Controller]
    end
    
    subgraph Backend["HiCache DynamoDB+S3+NIXL Backend"]
        Main[HiCacheDynamoS3NixlBackend]
        MS[MetadataStore]
        S3S[S3NixlStore]
        SF[SingleFlightCoordinator]
        WSM[WriteStateManager]
        CRR[CrossRegionReplicator]
        MC[MetricsCollector]
    end
    
    subgraph Region1["AWS Region: us-east-1"]
        DDB1[(DynamoDB Global Table)]
        S3_1[(S3 Bucket)]
    end
    
    subgraph Region2["AWS Region: us-west-2"]
        DDB2[(DynamoDB Replica)]
        S3_2[(S3 Bucket)]
    end
    
    subgraph Region3["AWS Region: eu-west-1"]
        DDB3[(DynamoDB Replica)]
        S3_3[(S3 Bucket)]
    end
    
    %% Main connections
    CC --> Main
    Main --> MS
    Main --> S3S
    Main --> SF
    Main --> WSM
    Main --> CRR
    Main --> MC
    
    %% Storage connections
    MS --> DDB1
    MS -.-> DDB2
    MS -.-> DDB3
    
    S3S --> S3_1
    S3S -.-> S3_2
    S3S -.-> S3_3
    
    %% Replication
    DDB1 <-.-> DDB2
    DDB2 <-.-> DDB3
    DDB1 <-.-> DDB3
    
    S3_1 -.-> S3_2
    S3_2 -.-> S3_3
    
    %% Styling
    classDef aws fill:#ff9900,stroke:#232f3e,stroke-width:2px,color:#fff
    classDef backend fill:#4a90e2,stroke:#2c5aa0,stroke-width:2px,color:#fff
    classDef app fill:#50c878,stroke:#2d5016,stroke-width:2px,color:#fff
    
    class DDB1,DDB2,DDB3,S3_1,S3_2,S3_3 aws
    class Main,MS,S3S,SF,WSM,CRR,MC backend
    class CC app

Data Flow Diagrams

Cache Write Flow (Two-Phase Commit)

sequenceDiagram
    participant App as SGLang App
    participant Backend as HiCache Backend
    participant WSM as WriteStateManager
    participant DDB as DynamoDB
    participant S3 as S3 + NIXL
    
    App->>Backend: set(key, kv_data)
    Backend->>WSM: begin_write(key, region)
    WSM->>DDB: PutItem(state=WRITING, write_id=uuid)
    
    alt Write Claim Successful
        DDB-->>WSM: Success
        WSM-->>Backend: write_id
        Backend->>S3: NIXL write(kv_data)
        S3-->>Backend: {etag, version_id, size}
        Backend->>WSM: complete_write(write_id, s3_result)
        WSM->>DDB: UpdateItem(state=COMMITTED, condition: write_id match)
        DDB-->>WSM: Success
        WSM-->>Backend: Success
        Backend-->>App: True
    else Write Claim Failed (Key Exists)
        DDB-->>WSM: ConditionalCheckFailedException
        WSM-->>Backend: None
        Backend-->>App: False
    end
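The two-phase commit in the sequence above can be sketched with an in-memory stand-in for DynamoDB's conditional writes. FakeMetadataStore, two_phase_set, and the item field names are hypothetical helpers for illustration, not the backend's API:

```python
import uuid


class FakeMetadataStore:
    """In-memory stand-in for DynamoDB conditional PutItem/UpdateItem."""

    def __init__(self):
        self.items = {}

    def put_if_absent(self, key, item):
        # Analogue of PutItem with attribute_not_exists(cache_key)
        if key in self.items:
            return False  # ConditionalCheckFailedException analogue
        self.items[key] = item
        return True

    def commit_if_write_id_matches(self, key, write_id, s3_result):
        # Analogue of UpdateItem conditioned on write_id matching
        item = self.items.get(key)
        if item is None or item["write_id"] != write_id:
            return False
        item.update(state="COMMITTED", **s3_result)
        return True


def two_phase_set(store, key, write_fn):
    # Phase 1: claim the key with state=WRITING and a unique write_id
    write_id = str(uuid.uuid4())
    if not store.put_if_absent(key, {"state": "WRITING", "write_id": write_id}):
        return False  # another writer already owns this key
    # Upload the data (here: any callable returning e.g. {etag, size_bytes})
    s3_result = write_fn()
    # Phase 2: flip WRITING -> COMMITTED only if our write_id still matches
    return store.commit_if_write_id_matches(key, write_id, s3_result)
```

The write_id condition in phase 2 is what prevents a crashed-and-cleaned-up writer from committing over a newer claim on the same key.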

Cache Read Flow (Cross-Region Fallback)

sequenceDiagram
    participant App as SGLang App
    participant Backend as HiCache Backend
    participant SF as SingleFlight
    participant DDB as DynamoDB
    participant S3Local as S3 Local
    participant S3Remote as S3 Remote
    participant CRR as CrossRegionReplicator
    
    App->>Backend: get(key)
    Backend->>DDB: GetItem(key, local_region)
    
    alt Local Cache Hit
        DDB-->>Backend: {state: COMMITTED, s3_path}
        Backend->>S3Local: NIXL read(s3_path)
        S3Local-->>Backend: kv_data
        Backend->>DDB: update_access_async(hit_count++)
        Backend-->>App: kv_data
    else Local Cache Miss
        DDB-->>Backend: NOT_FOUND
        Backend->>DDB: Query(key, all_regions)
        DDB-->>Backend: [{region: us-west-2, state: COMMITTED}]
        Backend->>SF: do(key, fetch_from_remote)
        SF->>CRR: fetch_from_remote(key, remote_metadata)
        CRR->>S3Remote: NIXL read(remote_s3_path)
        S3Remote-->>CRR: kv_data
        
        alt Should Replicate Locally
            CRR->>S3Local: NIXL write(kv_data)
            CRR->>DDB: PutItem(local_region_metadata)
        end
        
        CRR-->>SF: kv_data
        SF-->>Backend: kv_data
        Backend-->>App: kv_data
    end

Installation

Prerequisites

  • Python 3.8+
  • AWS credentials configured (IAM role recommended)
  • NIXL library installed
  • SGLang framework

Dependencies

pip install boto3 aioboto3 nixl

AWS Infrastructure Setup

  1. DynamoDB Global Table:

    aws dynamodb create-table \
      --table-name sglang-hicache-metadata \
      --attribute-definitions \
        AttributeName=cache_key,AttributeType=S \
        AttributeName=region,AttributeType=S \
      --key-schema \
        AttributeName=cache_key,KeyType=HASH \
        AttributeName=region,KeyType=RANGE \
      --billing-mode PAY_PER_REQUEST
  2. S3 Buckets (per region):

    aws s3 mb s3://sglang-hicache-us-east-1 --region us-east-1
    aws s3 mb s3://sglang-hicache-us-west-2 --region us-west-2
  3. Enable S3 Versioning:

    aws s3api put-bucket-versioning \
      --bucket sglang-hicache-us-east-1 \
      --versioning-configuration Status=Enabled

Configuration

Environment Variables

export HICACHE_DYNAMODB_TABLE=sglang-hicache-metadata
export HICACHE_S3_BUCKET_US_EAST_1=sglang-hicache-us-east-1
export HICACHE_S3_BUCKET_US_WEST_2=sglang-hicache-us-west-2
export AWS_DEFAULT_REGION=us-east-1

SGLang Integration

Enable the backend via command-line flag:

python -m sglang.launch_server \
  --model-path meta-llama/Llama-2-7b-chat-hf \
  --hicache-storage-backend dynamodb_s3_nixl \
  --other-sglang-options

Configuration Options

@dataclass
class BackendConfig:
    # Required
    dynamodb_table_name: str = "sglang-hicache-metadata"
    local_region: str = "us-east-1"
    local_s3_bucket: str = "sglang-hicache-us-east-1"
    
    # Multi-region setup
    regional_buckets: Dict[str, str] = field(default_factory=dict)
    
    # Performance tuning
    max_concurrent_remote_fetches: int = 10
    max_bytes_per_minute_replication: int = 100 * 1024 * 1024  # 100MB/min
    hotness_threshold_for_replication: int = 3
    
    # Lifecycle management
    cache_ttl_seconds: int = 86400  # 24 hours
    stale_write_ttl_seconds: int = 60

Usage

The backend implements SGLang's standard HiCache interface:

from sglang.srt.mem_cache.storage.dynamodb_s3_nixl import HiCacheDynamoS3NixlBackend

# Initialize backend
backend = HiCacheDynamoS3NixlBackend(config)

# Check if cache key exists
exists = await backend.exist("cache_key_hash")

# Retrieve cached data
data = await backend.get("cache_key_hash")

# Store new cache data
success = await backend.set("cache_key_hash", kv_cache_data)

IAM Permissions

Required DynamoDB Permissions

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:UpdateItem",
        "dynamodb:DeleteItem",
        "dynamodb:Query"
      ],
      "Resource": "arn:aws:dynamodb:*:*:table/sglang-hicache-metadata*"
    }
  ]
}

Required S3 Permissions

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::sglang-hicache-*/*"
    }
  ]
}

Monitoring

Key Metrics

The backend exposes the following metrics for monitoring:

  • hit_local_s3: Cache hits from local region S3
  • hit_remote_s3: Cache hits from remote region S3
  • miss_total: Total cache misses (with reason breakdown)
  • bytes_remote_read: Cross-region data transfer volume
  • bytes_written: Data written to cache
  • dynamodb_errors: DynamoDB operation failures
  • s3_errors: S3 operation failures
  • exist_latency_p50_p95: Latency percentiles for exist() calls
  • get_latency_p50_p95: Latency percentiles for get() calls
  • set_latency_p50_p95: Latency percentiles for set() calls
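The p50/p95 latency metrics above can be computed from recorded samples with a nearest-rank percentile. This percentile helper is an illustrative sketch, not the backend's MetricsCollector API:

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile over recorded latency samples (ms)."""
    if not samples:
        return 0.0
    ordered = sorted(samples)
    # Nearest-rank: the value at position ceil(pct/100 * n), 1-indexed
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```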

Performance Targets

  • exist() p95 latency: ≤10ms (single-region)
  • get() metadata lookup p95 latency: ≤10ms
  • set() non-blocking time: ≤50ms (before async transfer)

Cost Optimization

Recommended Settings

  1. VPC Endpoints: Use VPC Gateway Endpoints for DynamoDB and S3 to reduce data transfer costs
  2. S3 Lifecycle Rules: Configure automatic transition to cheaper storage tiers
  3. Replication Limits: Set appropriate max_bytes_per_minute_replication based on budget
  4. TTL Configuration: Use DynamoDB TTL and S3 lifecycle rules for automatic cleanup

Multi-Region Considerations

  • Cross-region data transfer incurs charges
  • Configure hotness_threshold_for_replication to only replicate frequently accessed cache
  • Monitor bytes_remote_read metric to track cross-region costs
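The hotness-gated, budget-limited replication decision described above might look like this sketch. The should_replicate function and its parameters are illustrative; the defaults mirror the BackendConfig values shown earlier:

```python
def should_replicate(hit_count, block_bytes, bytes_sent_this_minute,
                     hotness_threshold=3,
                     max_bytes_per_minute=100 * 1024 * 1024):
    """Replicate a block locally only if it is hot enough and the
    per-minute replication budget still has room for it."""
    if hit_count < hotness_threshold:
        return False  # cold block: not worth the cross-region transfer cost
    return bytes_sent_this_minute + block_bytes <= max_bytes_per_minute
```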

Troubleshooting

Common Issues

  1. High Miss Rate: Check DynamoDB and S3 service health; verify IAM permissions
  2. Cross-Region Timeouts: Increase max_concurrent_remote_fetches or check network connectivity
  3. Cost Spikes: Review replication settings and cross-region transfer volumes
  4. Stale WRITING Entries: Background cleanup runs automatically; check stale_write_ttl_seconds
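For the stale-WRITING case, detection boils down to comparing each WRITING entry's age against the TTL. The find_stale_writes helper and the item layout below are hypothetical; the real cleanup queries DynamoDB:

```python
import time


def find_stale_writes(items, stale_write_ttl_seconds=60, now=None):
    """Return keys stuck in WRITING longer than the TTL; these are
    typically writers that crashed between claiming the key and
    committing the S3 upload."""
    now = time.time() if now is None else now
    return [
        key for key, item in items.items()
        if item["state"] == "WRITING"
        and now - item["created_at"] > stale_write_ttl_seconds
    ]
```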

Debug Mode

Enable detailed logging:

import logging
logging.getLogger('sglang.hicache.dynamodb_s3_nixl').setLevel(logging.DEBUG)

Development

Running Tests

# Unit tests
python -m pytest sglang/srt/mem_cache/storage/dynamodb_s3_nixl/

# Integration tests (requires LocalStack)
docker run -d -p 4566:4566 localstack/localstack
python -m pytest sglang/srt/mem_cache/storage/dynamodb_s3_nixl/ --integration

Project Structure

sglang/srt/mem_cache/storage/dynamodb_s3_nixl/
├── __init__.py
├── backend.py              # Main HiCache backend implementation
├── config.py               # Configuration dataclasses
├── dynamodb_store.py       # DynamoDB metadata operations
├── s3_store.py             # S3 + NIXL data operations
├── single_flight.py        # Concurrent request coalescing
├── write_state_manager.py  # Two-phase commit lifecycle
├── metadata.py             # Data models and schemas
└── test_*.py               # Test files

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

License

This project is part of the SGLang framework. See the main SGLang repository for license information.

Support

For issues and questions, please open an issue on this repository.
