An AWS-native storage backend for SGLang's HiCache system that enables multi-regional KV cache persistence using DynamoDB for metadata and S3 for data storage via NIXL.
This implementation decouples "cache locality" from "GPU locality," allowing KV cache state to survive instance scale-down events and be shared across regions while maintaining HiCache's performance benefits (up to 6× throughput improvement, 80% TTFT reduction).
- Multi-Region Support: Cache data can be accessed across AWS regions with automatic read-through
- Two-Phase Commit: Ensures readers never access partially-written or corrupted cache blocks
- GPU-Optimized Transfers: Uses NIXL for efficient GPU ↔ S3 data movement
- Graceful Degradation: Continues serving inference requests even during AWS service outages
- Cost Management: Configurable rate limits and hotness-based replication to control costs
The backend separates concerns into two planes:
- Metadata Plane (DynamoDB): Fast key lookups, state management, access tracking
- Data Plane (S3 + NIXL): Bulk KV cache block storage and GPU-optimized transfers
graph TB
subgraph App["SGLang Application Layer"]
CC[Cache Controller]
end
subgraph Backend["HiCache DynamoDB+S3+NIXL Backend"]
Main[HiCacheDynamoS3NixlBackend]
MS[MetadataStore]
S3S[S3NixlStore]
SF[SingleFlightCoordinator]
WSM[WriteStateManager]
CRR[CrossRegionReplicator]
MC[MetricsCollector]
end
subgraph Region1["AWS Region: us-east-1"]
DDB1[(DynamoDB Global Table)]
S3_1[(S3 Bucket)]
end
subgraph Region2["AWS Region: us-west-2"]
DDB2[(DynamoDB Replica)]
S3_2[(S3 Bucket)]
end
subgraph Region3["AWS Region: eu-west-1"]
DDB3[(DynamoDB Replica)]
S3_3[(S3 Bucket)]
end
%% Main connections
CC --> Main
Main --> MS
Main --> S3S
Main --> SF
Main --> WSM
Main --> CRR
Main --> MC
%% Storage connections
MS --> DDB1
MS -.-> DDB2
MS -.-> DDB3
S3S --> S3_1
S3S -.-> S3_2
S3S -.-> S3_3
%% Replication
DDB1 <-.-> DDB2
DDB2 <-.-> DDB3
DDB1 <-.-> DDB3
S3_1 -.-> S3_2
S3_2 -.-> S3_3
%% Styling
classDef aws fill:#ff9900,stroke:#232f3e,stroke-width:2px,color:#fff
classDef backend fill:#4a90e2,stroke:#2c5aa0,stroke-width:2px,color:#fff
classDef app fill:#50c878,stroke:#2d5016,stroke-width:2px,color:#fff
class DDB1,DDB2,DDB3,S3_1,S3_2,S3_3 aws
class Main,MS,S3S,SF,WSM,CRR,MC backend
class CC app
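To make the metadata plane concrete, here is the shape a single DynamoDB item might take. The table's key schema (`cache_key` hash key, `region` range key) matches the table-creation command later in this document; every other attribute name below is an illustrative assumption, not the backend's actual schema.

```python
import time
import uuid

def make_metadata_item(cache_key: str, region: str, s3_path: str, size: int) -> dict:
    """Build an illustrative DynamoDB item for one committed cache block.

    Partition key: cache_key, sort key: region (matching the table
    schema in the setup commands). All other attribute names are
    assumptions for illustration only.
    """
    now = int(time.time())
    return {
        "cache_key": cache_key,          # hash of the token prefix
        "region": region,                # region holding this S3 copy
        "state": "COMMITTED",            # WRITING | COMMITTED
        "write_id": str(uuid.uuid4()),   # fences the two-phase commit
        "s3_path": s3_path,              # object key in the regional bucket
        "size_bytes": size,
        "hit_count": 0,                  # drives hotness-based replication
        "created_at": now,
        "expires_at": now + 86400,       # DynamoDB TTL attribute
    }

item = make_metadata_item("abc123", "us-east-1", "blocks/abc123", 4096)
```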
sequenceDiagram
participant App as SGLang App
participant Backend as HiCache Backend
participant WSM as WriteStateManager
participant DDB as DynamoDB
participant S3 as S3 + NIXL
App->>Backend: set(key, kv_data)
Backend->>WSM: begin_write(key, region)
WSM->>DDB: PutItem(state=WRITING, write_id=uuid)
alt Write Claim Successful
DDB-->>WSM: Success
WSM-->>Backend: write_id
Backend->>S3: NIXL write(kv_data)
S3-->>Backend: {etag, version_id, size}
Backend->>WSM: complete_write(write_id, s3_result)
WSM->>DDB: UpdateItem(state=COMMITTED, condition: write_id match)
DDB-->>WSM: Success
WSM-->>Backend: Success
Backend-->>App: True
else Write Claim Failed (Key Exists)
DDB-->>WSM: ConditionalCheckFailedException
WSM-->>Backend: None
Backend-->>App: False
end
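The claim-then-commit sequence above can be sketched in a few lines. The dict below stands in for DynamoDB so the protocol logic is runnable without AWS; a real implementation would issue `PutItem`/`UpdateItem` calls with `ConditionExpression`s, and the class and method names here mirror the diagram but are otherwise illustrative assumptions.

```python
import uuid

class WriteStateSketch:
    """In-memory sketch of the WriteStateManager's two-phase commit.

    An in-process dict emulates DynamoDB's conditional-write semantics:
    begin_write() acts like PutItem with attribute_not_exists(cache_key),
    and complete_write() acts like UpdateItem conditioned on write_id.
    """

    def __init__(self):
        self.table = {}  # (cache_key, region) -> item

    def begin_write(self, key: str, region: str):
        """Claim the key; returns a write_id, or None if the claim fails."""
        if (key, region) in self.table:
            return None  # ConditionalCheckFailedException -> claim failed
        write_id = str(uuid.uuid4())
        self.table[(key, region)] = {"state": "WRITING", "write_id": write_id}
        return write_id

    def complete_write(self, key: str, region: str, write_id: str, s3_result: dict) -> bool:
        """Commit the entry, conditioned on the write_id still matching."""
        item = self.table.get((key, region))
        if item is None or item["write_id"] != write_id:
            return False  # lost the claim (e.g. stale-write cleanup raced us)
        item.update(state="COMMITTED", **s3_result)
        return True
```

Because readers only ever trust COMMITTED entries, a crash between the claim and the commit leaves at worst a stale WRITING row for background cleanup, never a half-visible block.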
sequenceDiagram
participant App as SGLang App
participant Backend as HiCache Backend
participant SF as SingleFlight
participant DDB as DynamoDB
participant S3Local as S3 Local
participant S3Remote as S3 Remote
participant CRR as CrossRegionReplicator
App->>Backend: get(key)
Backend->>DDB: GetItem(key, local_region)
alt Local Cache Hit
DDB-->>Backend: {state: COMMITTED, s3_path}
Backend->>S3Local: NIXL read(s3_path)
S3Local-->>Backend: kv_data
Backend->>DDB: update_access_async(hit_count++)
Backend-->>App: kv_data
else Local Cache Miss
DDB-->>Backend: NOT_FOUND
Backend->>DDB: Query(key, all_regions)
DDB-->>Backend: [{region: us-west-2, state: COMMITTED}]
Backend->>SF: do(key, fetch_from_remote)
SF->>CRR: fetch_from_remote(key, remote_metadata)
CRR->>S3Remote: NIXL read(remote_s3_path)
S3Remote-->>CRR: kv_data
alt Should Replicate Locally
CRR->>S3Local: NIXL write(kv_data)
CRR->>DDB: PutItem(local_region_metadata)
end
CRR-->>SF: kv_data
SF-->>Backend: kv_data
Backend-->>App: kv_data
end
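The SingleFlight step of the miss path, which coalesces concurrent `get()` calls for the same key into one remote fetch, can be sketched with asyncio futures. This is an assumed shape, not the backend's actual single_flight.py.

```python
import asyncio

class SingleFlight:
    """Coalesce concurrent calls so each key is fetched at most once."""

    def __init__(self):
        self._inflight = {}  # key -> Future shared by all waiters

    async def do(self, key, fn):
        if key in self._inflight:
            return await self._inflight[key]  # join the in-flight fetch
        fut = asyncio.get_running_loop().create_future()
        self._inflight[key] = fut
        try:
            result = await fn(key)
            fut.set_result(result)   # wake every coalesced waiter
            return result
        except Exception as exc:
            fut.set_exception(exc)   # propagate the failure to all waiters
            raise
        finally:
            del self._inflight[key]  # next caller starts a fresh fetch
```

Under this scheme a burst of identical cache misses costs one cross-region read instead of N, which is what keeps the `bytes_remote_read` metric bounded during traffic spikes.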
- Python 3.8+
- AWS credentials configured (IAM role recommended)
- NIXL library installed
- SGLang framework
pip install boto3 aioboto3 nixl
DynamoDB Global Table:

aws dynamodb create-table \
  --table-name sglang-hicache-metadata \
  --attribute-definitions \
    AttributeName=cache_key,AttributeType=S \
    AttributeName=region,AttributeType=S \
  --key-schema \
    AttributeName=cache_key,KeyType=HASH \
    AttributeName=region,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST

S3 Buckets (per region):

aws s3 mb s3://sglang-hicache-us-east-1 --region us-east-1
aws s3 mb s3://sglang-hicache-us-west-2 --region us-west-2

Enable S3 Versioning:

aws s3api put-bucket-versioning \
  --bucket sglang-hicache-us-east-1 \
  --versioning-configuration Status=Enabled
export HICACHE_DYNAMODB_TABLE=sglang-hicache-metadata
export HICACHE_S3_BUCKET_US_EAST_1=sglang-hicache-us-east-1
export HICACHE_S3_BUCKET_US_WEST_2=sglang-hicache-us-west-2
export AWS_DEFAULT_REGION=us-east-1

Enable the backend via command-line flag:
python -m sglang.launch_server \
--model-path meta-llama/Llama-2-7b-chat-hf \
--hicache-storage-backend dynamodb_s3_nixl \
--other-sglang-options

@dataclass
class BackendConfig:
# Required
dynamodb_table_name: str = "sglang-hicache-metadata"
local_region: str = "us-east-1"
local_s3_bucket: str = "sglang-hicache-us-east-1"
# Multi-region setup
regional_buckets: Dict[str, str] = field(default_factory=dict)
# Performance tuning
max_concurrent_remote_fetches: int = 10
max_bytes_per_minute_replication: int = 100 * 1024 * 1024 # 100MB/min
hotness_threshold_for_replication: int = 3
# Lifecycle management
cache_ttl_seconds: int = 86400 # 24 hours
stale_write_ttl_seconds: int = 60

The backend implements SGLang's standard HiCache interface:
from sglang.srt.mem_cache.storage.dynamodb_s3_nixl import HiCacheDynamoS3NixlBackend
# Initialize backend
backend = HiCacheDynamoS3NixlBackend(config)
# Check if cache key exists
exists = await backend.exist("cache_key_hash")
# Retrieve cached data
data = await backend.get("cache_key_hash")
# Store new cache data
success = await backend.set("cache_key_hash", kv_cache_data)

Required IAM policy for DynamoDB access:

{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"dynamodb:GetItem",
"dynamodb:PutItem",
"dynamodb:UpdateItem",
"dynamodb:DeleteItem",
"dynamodb:Query"
],
"Resource": "arn:aws:dynamodb:*:*:table/sglang-hicache-metadata*"
}
]
}

Required IAM policy for S3 access:

{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject"
],
"Resource": "arn:aws:s3:::sglang-hicache-*/*"
}
]
}

The backend exposes the following metrics for monitoring:
- hit_local_s3: Cache hits from local-region S3
- hit_remote_s3: Cache hits from remote-region S3
- miss_total: Total cache misses (with reason breakdown)
- bytes_remote_read: Cross-region data transfer volume
- bytes_written: Data written to cache
- dynamodb_errors: DynamoDB operation failures
- s3_errors: S3 operation failures
- exist_latency_p50_p95: Latency percentiles for exist() calls
- get_latency_p50_p95: Latency percentiles for get() calls
- set_latency_p50_p95: Latency percentiles for set() calls

Performance targets:

- exist() p95 latency: ≤10ms (single-region)
- get() metadata lookup p95 latency: ≤10ms
- set() non-blocking time: ≤50ms (before async transfer)
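As a sketch of how the per-call latency percentiles could be derived from raw samples, here is a nearest-rank percentile collector; the class name and method signatures are assumptions, not the backend's actual MetricsCollector.

```python
import math

class LatencyHistogram:
    """Collect latency samples and report percentiles (nearest-rank)."""

    def __init__(self):
        self.samples = []

    def record(self, latency_ms: float):
        self.samples.append(latency_ms)

    def percentile(self, p: float) -> float:
        if not self.samples:
            return 0.0
        ordered = sorted(self.samples)
        # nearest-rank: the smallest sample covering at least p% of the data
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

h = LatencyHistogram()
for ms in [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]:
    h.record(ms)
```

A production collector would typically use a bounded sliding window or a streaming quantile sketch rather than storing every sample, but the reporting contract is the same.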
- VPC Endpoints: Use VPC Gateway Endpoints for DynamoDB and S3 to reduce data transfer costs
- S3 Lifecycle Rules: Configure automatic transition to cheaper storage tiers
- Replication Limits: Set max_bytes_per_minute_replication appropriately based on budget
- TTL Configuration: Use DynamoDB TTL and S3 lifecycle rules for automatic cleanup
- Cross-region data transfer incurs charges
- Configure hotness_threshold_for_replication to replicate only frequently accessed cache entries
- Monitor the bytes_remote_read metric to track cross-region costs
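The two replication knobs above can be combined into a simple gate deciding whether a remotely fetched block is worth copying into the local region. The class below is an illustrative sketch reusing the config values' semantics, not the backend's actual CrossRegionReplicator logic.

```python
class ReplicationGate:
    """Decide whether a remotely fetched block should be copied locally.

    Combines the hotness threshold with a per-minute byte budget,
    mirroring the hotness_threshold_for_replication and
    max_bytes_per_minute_replication settings from BackendConfig.
    """

    def __init__(self, hotness_threshold=3, max_bytes_per_minute=100 * 1024 * 1024):
        self.hotness_threshold = hotness_threshold
        self.max_bytes_per_minute = max_bytes_per_minute
        self.bytes_this_minute = 0  # assumed to be reset once per minute

    def should_replicate(self, hit_count: int, size_bytes: int) -> bool:
        if hit_count < self.hotness_threshold:
            return False  # not hot enough to justify the transfer cost
        if self.bytes_this_minute + size_bytes > self.max_bytes_per_minute:
            return False  # budget exhausted; retry in the next window
        self.bytes_this_minute += size_bytes
        return True
```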
- High Miss Rate: Check DynamoDB and S3 service health, verify IAM permissions
- Cross-Region Timeouts: Increase max_concurrent_remote_fetches or check network connectivity
- Cost Spikes: Review replication settings and cross-region transfer volumes
- Stale WRITING Entries: Background cleanup runs automatically; check stale_write_ttl_seconds
Enable detailed logging:
import logging
logging.getLogger('sglang.hicache.dynamodb_s3_nixl').setLevel(logging.DEBUG)

# Unit tests
python -m pytest sglang/srt/mem_cache/storage/dynamodb_s3_nixl/
# Integration tests (requires LocalStack)
docker run -d -p 4566:4566 localstack/localstack
python -m pytest sglang/srt/mem_cache/storage/dynamodb_s3_nixl/ --integration

sglang/srt/mem_cache/storage/dynamodb_s3_nixl/
├── __init__.py
├── backend.py # Main HiCache backend implementation
├── config.py # Configuration dataclasses
├── dynamodb_store.py # DynamoDB metadata operations
├── s3_store.py # S3+NIXL data operations
├── single_flight.py # Concurrent request coalescing
├── write_state_manager.py # Two-phase commit lifecycle
├── metadata.py # Data models and schemas
└── test_*.py # Test files
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
This project is part of the SGLang framework. See the main SGLang repository for license information.
For issues and questions:
- SGLang GitHub Issues: sglang repository
- Documentation: SGLang Documentation