61 changes: 56 additions & 5 deletions server/links/diet/README.md
@@ -5,7 +5,8 @@ The Diet link is a specialized plugin that helps reduce the size and content of
## Features

- Selective removal of dialog body content
- Optional media redirection to external storage
- Optional media redirection to external storage (HTTP endpoint or S3)
- S3 storage with presigned URL generation for secure access
- Removal of analysis data
- Filtering of attachments by MIME type
- Removal of system prompts to prevent LLM instruction injection
@@ -20,6 +21,13 @@ default_options = {
    "remove_analysis": False, # Remove all analysis data
    "remove_attachment_types": [], # List of attachment types to remove (e.g., ["image/jpeg", "audio/mp3"])
    "remove_system_prompts": False, # Remove system_prompt keys to prevent LLM instruction insertion
    # S3 storage options for dialog bodies
    "s3_bucket": "", # S3 bucket name for storing dialog bodies
    "s3_path": "", # Optional path prefix within the bucket
    "aws_access_key_id": "", # AWS access key ID
    "aws_secret_access_key": "", # AWS secret access key
    "aws_region": "us-east-1", # AWS region (default: us-east-1)
    "presigned_url_expiration": None, # Presigned URL expiration in seconds (None = default 1 hour)
}
```

@@ -31,6 +39,15 @@ default_options = {
- `remove_attachment_types`: List of MIME types to remove from attachments
- `remove_system_prompts`: Whether to remove system_prompt keys to prevent LLM instruction injection

### S3 Storage Options

- `s3_bucket`: The S3 bucket name where dialog bodies will be stored
- `s3_path`: Optional path prefix within the bucket (e.g., "dialogs/processed")
- `aws_access_key_id`: AWS access key ID for authentication
- `aws_secret_access_key`: AWS secret access key for authentication
- `aws_region`: AWS region where the bucket is located (default: "us-east-1")
- `presigned_url_expiration`: Expiration time in seconds for presigned URLs (optional, defaults to 3600 seconds / 1 hour)
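
As a rough illustration of how caller-supplied options combine with the defaults, the sketch below mirrors the `{**default_options, **opts}` merge used in `run`. The dictionary is a trimmed excerpt of the defaults, `merge_options` is a hypothetical helper name, and the bucket name is a placeholder.

```python
from typing import Any, Dict

# Trimmed excerpt of the link's defaults (S3-related keys only).
default_options: Dict[str, Any] = {
    "s3_bucket": "",
    "s3_path": "",
    "aws_region": "us-east-1",
    "presigned_url_expiration": None,
}


def merge_options(opts: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: caller-supplied values override the defaults."""
    return {**default_options, **opts}


merged = merge_options({"s3_bucket": "my-vcon-storage"})
# Keys the caller did not set keep their default values.
print(merged["aws_region"])
```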

## Usage

The link processes vCons by:
@@ -42,9 +59,39 @@ The link processes vCons by:
- Removing system prompts if specified
3. Storing the modified vCon back in Redis

## Media Redirection
## Media Storage Options

The diet link supports two methods for storing dialog bodies externally:

### S3 Storage (Recommended)

When `s3_bucket` is configured, the link will:
1. Upload dialog body content to the specified S3 bucket
2. Generate a presigned URL for secure access
3. Replace the body content with the presigned URL
4. Set the body_type to "url"
5. If the upload fails, the body content will be removed

**S3 takes precedence over HTTP endpoint** - if both `s3_bucket` and `post_media_to_url` are configured, S3 will be used.
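
The precedence rule can be sketched as a small decision function. This is only an illustration of the ordering described above: `choose_storage_backend` and its return labels are hypothetical names, not part of the link.

```python
from typing import Any, Dict


def choose_storage_backend(options: Dict[str, Any], dialog_body: Any) -> str:
    """Hypothetical helper: pick where a dialog body would be sent."""
    if not dialog_body:
        # Nothing to upload or post.
        return "none"
    if options.get("s3_bucket"):
        # S3 is checked first, so it wins when both are configured.
        return "s3"
    if options.get("post_media_to_url"):
        return "http"
    return "none"
```

With both `s3_bucket` and `post_media_to_url` set, this returns `"s3"`; the HTTP endpoint is only considered when no bucket is configured.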

Example S3 configuration:
```python
{
    "remove_dialog_body": True,
    "s3_bucket": "my-vcon-storage",
    "s3_path": "dialogs/archived",
    "aws_access_key_id": "AKIAXXXXXXXX",
    "aws_secret_access_key": "xxxxxxxxxxxxx",
    "aws_region": "us-west-2",
    "presigned_url_expiration": 86400, # 24 hours
}
```

The S3 key structure is: `{s3_path}/{vcon_uuid}/{dialog_id}_{unique_id}.txt`
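
A minimal sketch of how such a key could be assembled, following the structure above. `build_s3_key` is a hypothetical helper name, and the `unique_id` parameter exists only so the example is deterministic; the link itself generates a UUID for each upload.

```python
import uuid
from typing import Optional


def build_s3_key(
    vcon_uuid: str,
    dialog_id: str = "",
    s3_path: str = "",
    unique_id: Optional[str] = None,
) -> str:
    """Hypothetical helper: build {s3_path}/{vcon_uuid}/{dialog_id}_{unique_id}.txt."""
    unique_id = unique_id or str(uuid.uuid4())
    # The dialog ID prefix is dropped when no ID is available.
    name = f"{dialog_id}_{unique_id}.txt" if dialog_id else f"{unique_id}.txt"
    key = f"{vcon_uuid}/{name}"
    # The path prefix is optional.
    if s3_path:
        key = f"{s3_path}/{key}"
    return key
```

When `s3_path` is empty the key starts directly at the vCon UUID, and when the dialog has no ID the file name is just the unique ID.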

### HTTP Endpoint Storage

When `post_media_to_url` is configured, the link will:
When `post_media_to_url` is configured (and `s3_bucket` is not), the link will:
1. Post the media content to the specified URL
2. Replace the body content with the URL to the stored content
3. Set the body_type to "url"
@@ -59,12 +106,16 @@ When `post_media_to_url` is configured, the link will:
## Dependencies

- Redis for vCon storage
- Requests library for media redirection
- Requests library for HTTP media redirection
- boto3 library for S3 storage
- Custom utilities:
  - logging_utils

## Requirements

- Redis connection must be configured
- Appropriate permissions for vCon access and storage
- If using media redirection, a valid endpoint URL must be provided
- If using HTTP media redirection, a valid endpoint URL must be provided
- If using S3 storage:
  - Valid AWS credentials with write access to the specified bucket
  - The bucket must exist and be accessible
129 changes: 126 additions & 3 deletions server/links/diet/__init__.py
@@ -2,11 +2,34 @@
from lib.logging_utils import init_logger
import json
import requests
import uuid
import boto3
from botocore.exceptions import ClientError
from typing import Dict, List, Any, Optional

logger = init_logger(__name__)
logger.info("MDO THIS SHOULD PRINT")

_REDACTED = "[REDACTED]"


def _redact_option_value(key: str, value: Any) -> Any:
    """
    Redact sensitive option values before logging.

    This prevents leaking secrets (for example AWS credentials) into logs.
    """
    key_l = (key or "").lower()
    if (
        key_l == "aws_secret_access_key"
        or "secret" in key_l
        or "password" in key_l
        or "token" in key_l
        or key_l.endswith("_secret")
    ):
        return _REDACTED
    return value


# Default options that control which elements to remove
default_options = {
@@ -15,16 +38,101 @@
    "remove_analysis": False, # Remove all analysis data
    "remove_attachment_types": [], # List of attachment types to remove (e.g., ["image/jpeg", "audio/mp3"])
    "remove_system_prompts": False, # Remove system_prompt keys to prevent LLM instruction insertion
    # S3 storage options for dialog bodies
    "s3_bucket": "", # S3 bucket name for storing dialog bodies
    "s3_path": "", # Optional path prefix within the bucket
    "aws_access_key_id": "", # AWS access key ID
    "aws_secret_access_key": "", # AWS secret access key
    "aws_region": "us-east-1", # AWS region (default: us-east-1)
    "presigned_url_expiration": None, # Presigned URL expiration in seconds (None = no expiration/default 1 hour)
Copilot AI Dec 11, 2025

The comment says "None = no expiration/default 1 hour" which is misleading. According to the implementation (lines 86-89), None doesn't mean "no expiration" - it means the default of 3600 seconds (1 hour) will be used. The comment should be clarified to say "None = default 1 hour" or "optional, defaults to 3600 seconds (1 hour)".

Suggested change
"presigned_url_expiration": None, # Presigned URL expiration in seconds (None = no expiration/default 1 hour)
"presigned_url_expiration": None, # Presigned URL expiration in seconds (optional, defaults to 3600 seconds [1 hour] if None)

}


def _get_s3_client(options: Dict[str, Any]):
    """Create and return an S3 client with the provided credentials."""
    return boto3.client(
        "s3",
        aws_access_key_id=options["aws_access_key_id"],
        aws_secret_access_key=options["aws_secret_access_key"],
        region_name=options.get("aws_region", "us-east-1"),
    )
Comment on lines +51 to +58
Copilot AI Dec 11, 2025

Missing input validation for required S3 configuration parameters. When s3_bucket is configured, the code should validate that required credentials (aws_access_key_id and aws_secret_access_key) are also provided before attempting to create the S3 client. Currently, the code will fail with a less helpful error if these are missing or empty.
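
One way the suggested validation might look is sketched below. This is not code from the PR: `REQUIRED_S3_KEYS` and `missing_s3_options` are hypothetical names, and the set of required keys is an assumption drawn from this comment.

```python
from typing import Any, Dict, List

# Assumed set of options that must be non-empty before S3 is used.
REQUIRED_S3_KEYS = ("s3_bucket", "aws_access_key_id", "aws_secret_access_key")


def missing_s3_options(options: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return names of required S3 options that are unset or empty."""
    return [key for key in REQUIRED_S3_KEYS if not options.get(key)]


missing = missing_s3_options({"s3_bucket": "my-vcon-storage", "aws_access_key_id": ""})
if missing:
    # Report option names only, never their values.
    print("S3 storage misconfigured, missing: " + ", ".join(missing))
```

Checking before the client is created lets the link fail with a message that names the missing options rather than surfacing a lower-level error later.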



def _upload_to_s3_and_get_presigned_url(
    content: str,
    vcon_uuid: str,
    dialog_id: str,
    options: Dict[str, Any]
) -> Optional[str]:
    """
    Upload dialog body content to S3 and return a presigned URL.

    Args:
        content: The dialog body content to upload
        vcon_uuid: The vCon UUID
        dialog_id: The dialog ID
        options: Configuration options including S3 credentials and bucket info

    Returns:
        Presigned URL to access the uploaded content, or None if upload fails
    """
    try:
        s3 = _get_s3_client(options)

        # Generate a unique key for this dialog body
        unique_id = str(uuid.uuid4())
        key = f"{dialog_id}_{unique_id}.txt" if dialog_id else f"{unique_id}.txt"

        # Add vcon_uuid as a directory level
        key = f"{vcon_uuid}/{key}"

        # Add optional path prefix
        if options.get("s3_path"):
            key = f"{options['s3_path']}/{key}"

        bucket = options["s3_bucket"]

        # Upload the content
        s3.put_object(
            Bucket=bucket,
            Key=key,
            Body=content.encode("utf-8") if isinstance(content, str) else content,
            ContentType="text/plain",
Comment on lines +96 to +100
Copilot AI Dec 11, 2025

The content type is hardcoded as "text/plain" for all S3 uploads. Dialog bodies may contain various content types (JSON, XML, etc.). Consider making the content type configurable or inferring it from the dialog body content to ensure proper handling by consumers of the presigned URLs.

        )

        logger.info(f"Successfully uploaded dialog body to s3://{bucket}/{key}")

        # Generate presigned URL
        expiration = options.get("presigned_url_expiration")
        if expiration is None:
            # Default to 1 hour (3600 seconds) if not specified
            expiration = 3600

        presigned_url = s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket, "Key": key},
            ExpiresIn=expiration,
        )
Comment on lines +106 to +115
Copilot AI Dec 11, 2025

The presigned_url_expiration value is not validated. According to AWS documentation, ExpiresIn must be a positive integer and has limits (typically up to 7 days for standard presigned URLs). Consider validating that the expiration value is within acceptable bounds to fail fast with a clear error message rather than letting boto3 raise an exception.
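
A bounds check along these lines could be sketched as follows. It is an illustration rather than code from the PR: `resolve_expiration` and the two constants are hypothetical names, and the 604800-second ceiling assumes the 7-day limit mentioned in this comment applies to the credentials in use.

```python
from typing import Optional

DEFAULT_EXPIRATION_SECONDS = 3600    # 1 hour, the documented default
MAX_EXPIRATION_SECONDS = 604800      # 7 days, assumed upper bound


def resolve_expiration(value: Optional[int]) -> int:
    """Hypothetical helper: return a validated ExpiresIn value in seconds."""
    if value is None:
        return DEFAULT_EXPIRATION_SECONDS
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("presigned_url_expiration must be an integer number of seconds")
    if not 1 <= value <= MAX_EXPIRATION_SECONDS:
        raise ValueError(
            f"presigned_url_expiration must be between 1 and {MAX_EXPIRATION_SECONDS} seconds"
        )
    return value
```

Calling this before `generate_presigned_url` would surface a bad configuration value as a clear `ValueError` instead of a failure inside the S3 call.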


        logger.info(f"Generated presigned URL with expiration {expiration}s")
        return presigned_url

    except ClientError as e:
        logger.error(f"S3 client error uploading dialog body: {e}")
        return None
    except Exception as e:
        logger.error(f"Exception uploading dialog body to S3: {e}")
Copilot AI Dec 11, 2025

The broad Exception catch at line 103 may hide important errors. While catching ClientError is appropriate for S3-specific errors, consider being more specific about what other exceptions are expected here. At minimum, log the exception type to aid in debugging unexpected failures.

Suggested change
logger.error(f"Exception uploading dialog body to S3: {e}")
logger.exception(f"Unexpected exception uploading dialog body to S3: {type(e).__name__}: {e}")

        return None


def run(vcon_uuid, link_name, opts=default_options):
    logger.info("Starting diet::run")

    # Merge provided options with defaults
    options = {**default_options, **opts}

    for key, value in options.items():
        logger.info(f"diet::{key}: {value}")
        logger.info("diet::%s: %s", key, _redact_option_value(key, value))

    # Load vCon from Redis using JSON.GET
    vcon = redis.json().get(f"vcon:{vcon_uuid}")
@@ -41,12 +149,27 @@ def run(vcon_uuid, link_name, opts=default_options):
logger.info("diet::got dialog")
if options["remove_dialog_body"] and "body" in dialog:
logger.info("diet::remove_dialog_body AND body")
if options["post_media_to_url"] and dialog.get("body"):
dialog_body = dialog.get("body")
dialog_id = dialog.get("id", "")

# Check if S3 storage is configured
if options.get("s3_bucket") and dialog_body:
logger.info("diet::uploading to S3")
presigned_url = _upload_to_s3_and_get_presigned_url(
dialog_body, vcon_uuid, dialog_id, options
)
if presigned_url:
dialog["body"] = presigned_url
dialog["body_type"] = "url"
else:
logger.error("Failed to upload to S3, removing body")
dialog["body"] = ""
Copilot AI Dec 11, 2025

When S3 upload fails and the body is cleared, the body_type field is not updated or removed. This could leave dialogs with body_type="url" but an empty body, which is an inconsistent state. Consider either removing the body_type field or setting it to an appropriate value (like an empty string) when the upload fails.

Suggested change
dialog["body"] = ""
dialog["body"] = ""
dialog["body_type"] = ""

elif options["post_media_to_url"] and dialog_body:
try:
# Post the body content to the specified URL
response = requests.post(
options["post_media_to_url"],
json={"content": dialog["body"], "vcon_uuid": vcon_uuid, "dialog_id": dialog.get("id", "")}
json={"content": dialog_body, "vcon_uuid": vcon_uuid, "dialog_id": dialog_id}
)
if response.status_code == 200:
# Replace body with the URL to the stored content