Merged

Changes from 1 commit
3 changes: 3 additions & 0 deletions example_config.yml
@@ -125,6 +125,9 @@ storages:
      aws_access_key_id: some_key
      aws_secret_access_key: some_secret
      aws_bucket: some_bucket
+     aws_region: us-east-1 # AWS region where the bucket is located
+     # endpoint_url: null # Optional: custom endpoint for S3-compatible services
+     # s3_path: vcons # Optional: prefix for S3 keys
  milvus:
    module: storage.milvus
    options:
89 changes: 54 additions & 35 deletions server/storage/s3/README.md
@@ -8,31 +8,51 @@ S3 storage provides scalable, durable object storage capabilities, making it ide

## Configuration

-Required configuration options:
+Configuration options:

```yaml
storages:
  s3:
    module: storage.s3
    options:
-     bucket: your-bucket-name        # S3 bucket name
-     region: us-west-2               # AWS region
-     access_key: your-access-key     # AWS access key
-     secret_key: your-secret-key     # AWS secret key
-     prefix: vcons/                  # Optional: key prefix
-     endpoint_url: null              # Optional: custom endpoint
+     # Required options
+     aws_access_key_id: your-access-key        # AWS access key ID
+     aws_secret_access_key: your-secret-key    # AWS secret access key
+     aws_bucket: your-bucket-name              # S3 bucket name
+
+     # Optional settings
+     aws_region: us-east-1       # AWS region (recommended to avoid cross-region errors)
+     endpoint_url: null          # Custom endpoint for S3-compatible services (e.g., MinIO)
+     s3_path: vcons              # Prefix for S3 keys
```

+### Configuration Options
+
+| Option | Required | Description |
+|--------|----------|-------------|
+| `aws_access_key_id` | Yes | AWS access key ID for authentication |
+| `aws_secret_access_key` | Yes | AWS secret access key for authentication |
+| `aws_bucket` | Yes | Name of the S3 bucket to store vCons |
+| `aws_region` | No | AWS region where the bucket is located (e.g., `us-east-1`, `us-west-2`, `eu-west-1`). **Recommended** to avoid "AuthorizationHeaderMalformed" errors when the bucket is in a different region than the default. |
+| `endpoint_url` | No | Custom endpoint URL for S3-compatible services like MinIO, LocalStack, or other providers |
+| `s3_path` | No | Prefix path for organizing vCon objects within the bucket |
+
+### Region Configuration
+
+**Important:** If your S3 bucket is in a region other than `us-east-1`, you should explicitly set the `aws_region` option. Without this, you may encounter errors like:
+
+```
+AuthorizationHeaderMalformed: The authorization header is malformed;
+the region 'us-east-1' is wrong; expecting 'us-east-2'
+```
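
If you are unsure which region a bucket lives in, the standard S3 `GetBucketLocation` API can tell you; a quick sketch using boto3 (the bucket name is a placeholder):

```python
import boto3

s3 = boto3.client("s3")
resp = s3.get_bucket_location(Bucket="your-bucket-name")
# LocationConstraint is None for us-east-1; otherwise it names the region.
print(resp["LocationConstraint"] or "us-east-1")
```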

## Features

-- Object storage
-- High availability
-- Durability
-- Versioning support
-- Lifecycle management
-- Automatic metrics logging
-- Encryption support
-- Access control
+- Object storage with automatic date-based key organization (`YYYY/MM/DD/uuid.vcon`)
+- High availability and durability
+- Support for custom S3-compatible endpoints (MinIO, LocalStack, etc.)
+- Configurable key prefix for organizing objects
+- Automatic error logging

## Usage

@@ -42,21 +62,24 @@ from storage import Storage
# Initialize S3 storage
s3_storage = Storage("s3")

-# Save vCon data
+# Save vCon data (retrieves from Redis and stores in S3)
s3_storage.save(vcon_id)

# Retrieve vCon data
vcon_data = s3_storage.get(vcon_id)
```
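
`get` is annotated as returning `Optional[dict]` in `server/storage/s3/__init__.py`, so callers presumably receive `None` when the object is missing or the fetch fails; a minimal handling sketch (the `KeyError` is our choice, not the repo's):

```python
vcon_data = s3_storage.get(vcon_id)
if vcon_data is None:
    # Missing or failed fetch; the storage module logs the error itself.
    raise KeyError(f"vCon {vcon_id} not found in S3")
```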

-## Implementation Details
+## Key Structure

+vCons are stored with keys following this pattern:
+```
+[s3_path/]YYYY/MM/DD/uuid.vcon
+```
+
-The S3 storage implementation:
-- Uses boto3 for AWS S3 operations
-- Implements retry logic
-- Supports multipart uploads
-- Provides encryption
-- Includes automatic metrics logging
+For example, a vCon created on January 15, 2024 with UUID `abc123` and `s3_path: vcons` would be stored at:
+```
+vcons/2024/01/15/abc123.vcon
+```
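
A minimal sketch of how such a key could be derived (`make_key` is a hypothetical helper for illustration, not part of the module's API):

```python
from datetime import datetime

def make_key(vcon_uuid: str, created_at: datetime, s3_path: str = "") -> str:
    # Mirrors the documented [s3_path/]YYYY/MM/DD/uuid.vcon pattern.
    key = f"{created_at:%Y/%m/%d}/{vcon_uuid}.vcon"
    return f"{s3_path}/{key}" if s3_path else key

# make_key("abc123", datetime(2024, 1, 15), "vcons") == "vcons/2024/01/15/abc123.vcon"
```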

## Dependencies

@@ -65,15 +88,11 @@ The S3 storage implementation:

## Best Practices

-1. Secure credential management
-2. Implement proper access control
-3. Use appropriate storage classes
-4. Enable versioning
-5. Configure lifecycle rules
-6. Implement proper error handling
-7. Use appropriate encryption
-8. Monitor costs
-9. Implement retry logic
-10. Use appropriate regions
-11. Enable logging
-12. Regular backup verification
+1. Always configure `aws_region` to match your bucket's region
+2. Use IAM roles with least-privilege access (see the sketch after this list)
+3. Enable bucket versioning for data protection
+4. Configure lifecycle rules for cost optimization
+5. Enable server-side encryption
+6. Use VPC endpoints for private connectivity
+7. Monitor with CloudWatch metrics
+8. Enable access logging for auditing
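
On item 2, the module itself likely needs little more than object reads and writes under the configured prefix; a hypothetical least-privilege policy sketched as a Python dict (bucket name and prefix are placeholders, adjust to your configuration):

```python
# Hypothetical IAM policy scoped to the vCon prefix (illustrative only).
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject"],
            "Resource": "arn:aws:s3:::your-bucket-name/vcons/*",
        }
    ],
}
```
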
37 changes: 27 additions & 10 deletions server/storage/s3/__init__.py
@@ -11,6 +11,31 @@
default_options = {}


+def _create_s3_client(opts: dict):
+    """Create an S3 client with the provided options.
+
+    Required options:
+        aws_access_key_id: AWS access key ID
+        aws_secret_access_key: AWS secret access key
+
+    Optional options:
+        aws_region: AWS region (e.g., 'us-east-1', 'us-west-2')
+        endpoint_url: Custom endpoint URL for S3-compatible services
+    """
+    client_kwargs = {
+        "aws_access_key_id": opts["aws_access_key_id"],
+        "aws_secret_access_key": opts["aws_secret_access_key"],
+    }
+
+    if opts.get("aws_region"):
+        client_kwargs["region_name"] = opts["aws_region"]
+
+    if opts.get("endpoint_url"):
+        client_kwargs["endpoint_url"] = opts["endpoint_url"]
+
+    return boto3.client("s3", **client_kwargs)


def save(
    vcon_uuid,
    opts=default_options,
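
The optional settings are what make S3-compatible backends work. As a rough illustration (all values below are placeholders, not from this repo), the helper could target a local MinIO endpoint:

```python
opts = {
    "aws_access_key_id": "minioadmin",
    "aws_secret_access_key": "minioadmin",
    "aws_bucket": "vcons",
    "endpoint_url": "http://localhost:9000",  # assumed local MinIO address
}
s3 = _create_s3_client(opts)
s3.put_object(Bucket=opts["aws_bucket"], Key="smoke-test.vcon", Body=b"{}")
```
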
@@ -19,11 +44,7 @@ def save(
    try:
        vcon_redis = VconRedis()
        vcon = vcon_redis.get_vcon(vcon_uuid)
-        s3 = boto3.client(
-            "s3",
-            aws_access_key_id=opts["aws_access_key_id"],
-            aws_secret_access_key=opts["aws_secret_access_key"],
-        )
+        s3 = _create_s3_client(opts)

        s3_path = opts.get("s3_path")
        created_at = datetime.fromisoformat(vcon.created_at)
@@ -45,11 +66,7 @@
def get(vcon_uuid: str, opts=default_options) -> Optional[dict]:
    """Get a vCon from S3 by UUID."""
    try:
-        s3 = boto3.client(
-            "s3",
-            aws_access_key_id=opts["aws_access_key_id"],
-            aws_secret_access_key=opts["aws_secret_access_key"],
-        )
+        s3 = _create_s3_client(opts)

        s3_path = opts.get("s3_path", "")
        key = f"{s3_path}/{vcon_uuid}.vcon" if s3_path else f"{vcon_uuid}.vcon"
Empty file added tests/storage/__init__.py
Empty file.