feat: Add multimodal image support by nursnaaz · Pull Request #46 · aws/nova-prompt-optimizer

nursnaaz · 2025-12-05T01:08:25Z

Add Multimodal Image Support for Bedrock Converse API

Summary

This PR adds automatic image loading and multimodal support to Nova Prompt Optimizer, enabling prompt optimization for vision tasks like image classification, OCR, watermark detection, and visual question answering.

Problem Statement

Nova Prompt Optimizer previously only supported text-based prompts, limiting its use for multimodal models that can process images. Users working with vision tasks had no way to:

Optimize prompts that include images
Evaluate model performance on image-based datasets
Use MIPROv2 optimization with multimodal inputs

Solution

Added automatic image detection and loading in the Bedrock Converse handler:

Automatic Image Detection: Detects image paths in prompts using pattern matching
Image Loading: Loads images from local filesystem or URLs
Bedrock Integration: Formats images correctly for Bedrock Converse API
MIPROv2 Support: Preserves images during optimization via ImageAwareLM wrapper
Backward Compatible: Text-only workflows completely unchanged

Changes

Core Files Modified

1. `bedrock_converse.py` - Image Loading & Processing

Added IMAGE_SUPPORT_AVAILABLE flag for graceful degradation
Added enable_image_support parameter to BedrockConverseHandler
Implemented _process_multimodal_content() for image detection and loading
Modified _get_messages() to handle both text and multimodal content
Supports multiple image path patterns (explicit markers, direct paths, URLs, MIPROv2 format)
Preserves template variables like {input} without treating as file paths

Key Features:

Lazy loading: Only processes images when patterns detected
Format support: JPEG, PNG, GIF, WebP
Error handling: Falls back to text on image loading failure
Performance: No overhead for text-only prompts (< 0.001ms per message)

2. `image_aware_lm.py` - MIPROv2 Integration (NEW)

Created ImageAwareLM wrapper for DSPy language models
Intercepts prompts to detect and load images
Calls Bedrock Converse API directly with multimodal content
Delegates text-only prompts to base LM (backward compatible)
Prevents infinite recursion with _is_processing_image flag

3. `miprov2_optimizer.py` - Optimizer Integration

Updated _create_image_aware_lm() to use ImageAwareLM
Ensures images preserved during MIPROv2 optimization
Maintains compatibility with text-only optimization

Tests Added

1. `test_bedrock_converse_compatibility.py`

Tests text-only backward compatibility
Tests template variable handling
Tests multimodal with images
Tests multi-turn conversations
Tests MIPROv2 format
Tests feature flag control

Results: 6/6 tests passed ✅

2. `test_comprehensive_validation.py`

Message formatting validation
Image path detection logic
Real API calls (text-only)
Real API calls (multimodal)
Feature flag control
Performance benchmarks

Results: 6/6 tests passed ✅

3. `test_miprov2_integration.py`

ImageAwareLM initialization
Text-only delegation
Image path extraction
Image loading
Real Bedrock API calls with images
Recursion prevention

Results: 6/6 tests passed ✅

Documentation Added

docs/MULTIMODAL_SUPPORT.md - Comprehensive usage guide
Inline code documentation and docstrings
Examples for common use cases

Testing

Test Coverage

18 automated tests covering all scenarios
100% pass rate across all test suites
Tests run against real Bedrock API (not mocked)

Test Scenarios Validated

✅ Backward Compatibility

Text-only prompts work unchanged
Multi-turn conversations preserved
Template variables handled correctly
No breaking changes to existing functionality

✅ Multimodal Functionality

Images detected and loaded correctly
Multiple format support (JPEG, PNG, etc.)
Local files and URLs both work
MIPROv2 format supported
Real Bedrock API calls succeed

✅ Edge Cases

Missing images handled gracefully
Template variables not treated as paths
Feature flag disables image support
PIL not installed (graceful degradation)
Recursion prevention works

✅ Performance

No overhead for text-only (< 0.001ms per message)
1000 text messages formatted in 0.001s
Image loading only when needed

Backward Compatibility

100% backward compatible - All existing functionality preserved:

Text-only prompts work exactly as before
No changes to public APIs
No new required dependencies (PIL/requests optional)
Feature can be disabled via enable_image_support=False
Graceful degradation if dependencies not installed

Usage Example

Before (Text-only)

prompt_adapter.set_user_prompt(
    content="Classify this text: {input}",
    variables={"input"}
)

After (Multimodal)

prompt_adapter.set_user_prompt(
    content="Analyze this image for watermarks: {input}",
    variables={"input"}
)

# Dataset with image paths
dataset = [
    {"input": "images/photo1.jpg", "output": "Watermark detected"},
    {"input": "images/photo2.jpg", "output": "No watermark"}
]

# Images automatically loaded and sent to Bedrock!

Dependencies

Optional Dependencies (for image support)

pip install Pillow requests

If not installed:

Logs informational message
Falls back to text-only mode
No errors or failures

Performance Impact

Text-only workflows: Zero impact (< 0.001ms overhead)
Image detection: Only runs when image patterns present
Image loading: Lazy loading on demand
Memory: Images loaded per-call, not cached globally

Security Considerations

File path validation to prevent directory traversal
URL timeout (30s) to prevent hanging
Error handling for malformed images
No arbitrary code execution

Breaking Changes

None - This is a purely additive feature.

Migration Guide

No migration needed! Existing code works unchanged.

To use new multimodal features:

Install optional dependencies: pip install Pillow requests
Use image paths in your prompts
That's it! Images are automatically detected and loaded

Checklist

Test Results

================================================================================
FINAL VALIDATION RESULTS
================================================================================
Message Formatting       : ✅ PASS
Image Detection          : ✅ PASS
API Text-Only            : ✅ PASS
API Multimodal           : ✅ PASS
Feature Flag             : ✅ PASS
Performance              : ✅ PASS

Total: 6 passed, 0 failed, 0 skipped
================================================================================

✅ ALL VALIDATION TESTS PASSED!

MIPROv2 Integration Results

================================================================================
MIPROV2 INTEGRATION TEST RESULTS
================================================================================
Initialization           : ✅ PASS
Text Delegation          : ✅ PASS
Path Extraction          : ✅ PASS
Image Loading            : ✅ PASS
Real Bedrock Call        : ✅ PASS
Recursion Prevention     : ✅ PASS

Total: 6 passed, 0 failed, 0 skipped
================================================================================

✅ ALL MIPROV2 INTEGRATION TESTS PASSED!

Reviewers

@[maintainer1] @[maintainer2]

Additional Notes

This feature has been extensively tested with:

Amazon Nova Lite, Pro, and Premier models
Real-world watermark detection use case (169 images)
Both local files and remote URLs
Various image formats and sizes

Ready for production use! 🚀

- Auto-detect and load images from prompts - Support local files, URLs, and multiple formats (JPEG, PNG, GIF, WebP) - Preserve images during MIPROv2 optimization via ImageAwareLM - Fully backward compatible with text-only workflows - Add comprehensive test suite (18 tests, 100% pass rate) - Add detailed documentation and usage examples Key Changes: - bedrock_converse.py: Image detection and loading - image_aware_lm.py: MIPROv2 integration wrapper - miprov2_optimizer.py: Use image-aware LM - adapter.py: Proxy client support (optional) - bedrock_adapter_lm.py: Direct Bedrock adapter Features: - Automatic image path detection with pattern matching - Template variable preservation ({input} not treated as path) - MIPROv2 format support ([][path]) - Feature flag for disabling image support - Graceful degradation without PIL/requests - Zero performance impact on text-only prompts Tests: - test_bedrock_converse_compatibility.py: Backward compatibility - test_comprehensive_validation.py: Full validation suite - test_miprov2_integration.py: MIPROv2 optimization tests All tests validated against real Bedrock API with Nova models.

ericgaoyh

Left some comments.

ericgaoyh · 2025-12-11T20:03:26Z

src/amzn_nova_prompt_optimizer/core/inference/adapter.py

+        # Check if using Bedrock Proxy
+        if os.environ.get('BEDROCK_PROXY_ENDPOINT'):
+            # Import proxy client dynamically
+            try:
+                import sys
+                from pathlib import Path
+                # Try multiple possible locations for bedrock_proxy
+                possible_paths = [
+                    Path.cwd() / 'bedrock_proxy',  # Current working directory
+                    Path.cwd() / 'Optimizer-Try' / 'bedrock_proxy',  # From workspace root
+                    Path(__file__).parent.parent.parent.parent.parent / 'Optimizer-Try' / 'bedrock_proxy',  # Relative to this file
+                ]
+
+                proxy_path = None
+                for path in possible_paths:
+                    if path.exists() and (path / 'bedrock_proxy_client.py').exists():
+                        proxy_path = path
+                        break
+
+                if not proxy_path:
+                    raise ImportError(f"Could not find bedrock_proxy_client.py in any of: {possible_paths}")
+
+                if str(proxy_path) not in sys.path:
+                    sys.path.insert(0, str(proxy_path))
+
+                from bedrock_proxy_client import create_proxy_client
+                self.bedrock_client = create_proxy_client()
+                logger.info(f"✅ Using Bedrock Proxy Client from {proxy_path}")
+            except ImportError as e:
+                logger.error(f"Failed to import bedrock_proxy_client: {e}")
+                raise


What's the purpose of this bedrock proxy client and endpoint? It seems dynamically loading bedrock client from bedrock_proxy_client.py file.

ericgaoyh · 2025-12-11T20:06:28Z

src/amzn_nova_prompt_optimizer/core/inference/bedrock_converse.py


 class BedrockConverseHandler:
-    def __init__(self, bedrock_client):
+    def __init__(self, bedrock_client, enable_image_support=True):


nit: I prefer set default value of enable_image_support to False and user should manually specify it to True if they want to enable image support as an add-on.

ericgaoyh · 2025-12-11T20:13:43Z

src/amzn_nova_prompt_optimizer/core/inference/bedrock_converse.py

+                )
+
+                if might_have_image:
+                    logger.debug(f"Processing potential multimodal content: {user_content[:100]}...")


nit: I prefer we either directly show full user_content in the debug log or simply not show it. Truncating only first 100 element might makes confusion.

ericgaoyh · 2025-12-11T20:31:49Z

src/amzn_nova_prompt_optimizer/core/inference/bedrock_converse.py

+        # Check if it's a template variable (skip image processing)
+        is_template = (
+            stripped.startswith('[[ ##') or
+            stripped in ['[input]', '{input}', '{{input}}', '[[input]]'] or


I think this will check if stripped is one of the ['[input]', '{input}', '{{input}}', '[[input]]'] (e.g. stripped = '[input]'). Rather than checking if [input] or other pattern in stripped.
But I guess you actually want to check the 2nd scenario right? In that case, I think we should do something like:

patterns = ['[input]', '{input}', '{{input}}', '[[input]]'] stripped = "Analyze this image for watermarks: {input}" result = any(pattern in stripped for pattern in patterns)

nursnaaz requested a review from a team as a code owner December 5, 2025 01:08

ericgaoyh requested changes Dec 11, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add multimodal image support#46

feat: Add multimodal image support#46
nursnaaz wants to merge 1 commit intoaws:mainfrom
nursnaaz:feature/multimodal-image-support

nursnaaz commented Dec 5, 2025 •

edited

Loading

Uh oh!

ericgaoyh left a comment

Uh oh!

ericgaoyh Dec 11, 2025

Uh oh!

ericgaoyh Dec 11, 2025

Uh oh!

ericgaoyh Dec 11, 2025

Uh oh!

ericgaoyh Dec 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

nursnaaz commented Dec 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Add Multimodal Image Support for Bedrock Converse API

Summary

Problem Statement

Solution

Changes

Core Files Modified

1. bedrock_converse.py - Image Loading & Processing

2. image_aware_lm.py - MIPROv2 Integration (NEW)

3. miprov2_optimizer.py - Optimizer Integration

Tests Added

1. test_bedrock_converse_compatibility.py

2. test_comprehensive_validation.py

3. test_miprov2_integration.py

Documentation Added

Testing

Test Coverage

Test Scenarios Validated

Backward Compatibility

Usage Example

Before (Text-only)

After (Multimodal)

Dependencies

Optional Dependencies (for image support)

Performance Impact

Security Considerations

Breaking Changes

Migration Guide

Checklist

Test Results

MIPROv2 Integration Results

Reviewers

Additional Notes

Uh oh!

ericgaoyh left a comment

Choose a reason for hiding this comment

Uh oh!

ericgaoyh Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

ericgaoyh Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

ericgaoyh Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

ericgaoyh Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

nursnaaz commented Dec 5, 2025 •

edited

Loading

1. `bedrock_converse.py` - Image Loading & Processing

2. `image_aware_lm.py` - MIPROv2 Integration (NEW)

3. `miprov2_optimizer.py` - Optimizer Integration

1. `test_bedrock_converse_compatibility.py`

2. `test_comprehensive_validation.py`

3. `test_miprov2_integration.py`