Conversation
- Auto-detect and load images from prompts
- Support local files, URLs, and multiple formats (JPEG, PNG, GIF, WebP)
- Preserve images during MIPROv2 optimization via ImageAwareLM
- Fully backward compatible with text-only workflows
- Add comprehensive test suite (18 tests, 100% pass rate)
- Add detailed documentation and usage examples
Key Changes:
- bedrock_converse.py: Image detection and loading
- image_aware_lm.py: MIPROv2 integration wrapper
- miprov2_optimizer.py: Use image-aware LM
- adapter.py: Proxy client support (optional)
- bedrock_adapter_lm.py: Direct Bedrock adapter
Features:
- Automatic image path detection with pattern matching
- Template variable preservation ({input} not treated as path)
- MIPROv2 format support ([][path])
- Feature flag for disabling image support
- Graceful degradation without PIL/requests
- Zero performance impact on text-only prompts
Tests:
- test_bedrock_converse_compatibility.py: Backward compatibility
- test_comprehensive_validation.py: Full validation suite
- test_miprov2_integration.py: MIPROv2 optimization tests
All tests validated against real Bedrock API with Nova models.
| # Check if using Bedrock Proxy | ||
| if os.environ.get('BEDROCK_PROXY_ENDPOINT'): | ||
| # Import proxy client dynamically | ||
| try: | ||
| import sys | ||
| from pathlib import Path | ||
| # Try multiple possible locations for bedrock_proxy | ||
| possible_paths = [ | ||
| Path.cwd() / 'bedrock_proxy', # Current working directory | ||
| Path.cwd() / 'Optimizer-Try' / 'bedrock_proxy', # From workspace root | ||
| Path(__file__).parent.parent.parent.parent.parent / 'Optimizer-Try' / 'bedrock_proxy', # Relative to this file | ||
| ] | ||
|
|
||
| proxy_path = None | ||
| for path in possible_paths: | ||
| if path.exists() and (path / 'bedrock_proxy_client.py').exists(): | ||
| proxy_path = path | ||
| break | ||
|
|
||
| if not proxy_path: | ||
| raise ImportError(f"Could not find bedrock_proxy_client.py in any of: {possible_paths}") | ||
|
|
||
| if str(proxy_path) not in sys.path: | ||
| sys.path.insert(0, str(proxy_path)) | ||
|
|
||
| from bedrock_proxy_client import create_proxy_client | ||
| self.bedrock_client = create_proxy_client() | ||
| logger.info(f"✅ Using Bedrock Proxy Client from {proxy_path}") | ||
| except ImportError as e: | ||
| logger.error(f"Failed to import bedrock_proxy_client: {e}") | ||
| raise |
There was a problem hiding this comment.
What's the purpose of this bedrock proxy client and endpoint? It seems dynamically loading bedrock client from bedrock_proxy_client.py file.
|
|
||
| class BedrockConverseHandler: | ||
| def __init__(self, bedrock_client): | ||
| def __init__(self, bedrock_client, enable_image_support=True): |
There was a problem hiding this comment.
nit: I prefer set default value of enable_image_support to False and user should manually specify it to True if they want to enable image support as an add-on.
| ) | ||
|
|
||
| if might_have_image: | ||
| logger.debug(f"Processing potential multimodal content: {user_content[:100]}...") |
There was a problem hiding this comment.
nit: I prefer we either directly show full user_content in the debug log or simply not show it. Truncating only first 100 element might makes confusion.
| # Check if it's a template variable (skip image processing) | ||
| is_template = ( | ||
| stripped.startswith('[[ ##') or | ||
| stripped in ['[input]', '{input}', '{{input}}', '[[input]]'] or |
There was a problem hiding this comment.
I think this will check if stripped is one of the ['[input]', '{input}', '{{input}}', '[[input]]'] (e.g. stripped = '[input]'). Rather than checking if [input] or other pattern in stripped.
But I guess you actually want to check the 2nd scenario right? In that case, I think we should do something like:
patterns = ['[input]', '{input}', '{{input}}', '[[input]]']
stripped = "Analyze this image for watermarks: {input}"
result = any(pattern in stripped for pattern in patterns)
Add Multimodal Image Support for Bedrock Converse API
Summary
This PR adds automatic image loading and multimodal support to Nova Prompt Optimizer, enabling prompt optimization for vision tasks like image classification, OCR, watermark detection, and visual question answering.
Problem Statement
Nova Prompt Optimizer previously only supported text-based prompts, limiting its use for multimodal models that can process images. Users working with vision tasks had no way to:
Solution
Added automatic image detection and loading in the Bedrock Converse handler:
Changes
Core Files Modified
1.
bedrock_converse.py- Image Loading & ProcessingIMAGE_SUPPORT_AVAILABLEflag for graceful degradationenable_image_supportparameter toBedrockConverseHandler_process_multimodal_content()for image detection and loading_get_messages()to handle both text and multimodal content{input}without treating as file pathsKey Features:
2.
image_aware_lm.py- MIPROv2 Integration (NEW)ImageAwareLMwrapper for DSPy language models_is_processing_imageflag3.
miprov2_optimizer.py- Optimizer Integration_create_image_aware_lm()to useImageAwareLMTests Added
1.
test_bedrock_converse_compatibility.pyResults: 6/6 tests passed ✅
2.
test_comprehensive_validation.pyResults: 6/6 tests passed ✅
3.
test_miprov2_integration.pyResults: 6/6 tests passed ✅
Documentation Added
docs/MULTIMODAL_SUPPORT.md- Comprehensive usage guideTesting
Test Coverage
Test Scenarios Validated
✅ Backward Compatibility
✅ Multimodal Functionality
✅ Edge Cases
✅ Performance
Backward Compatibility
100% backward compatible - All existing functionality preserved:
enable_image_support=FalseUsage Example
Before (Text-only)
After (Multimodal)
Dependencies
Optional Dependencies (for image support)
If not installed:
Performance Impact
Security Considerations
Breaking Changes
None - This is a purely additive feature.
Migration Guide
No migration needed! Existing code works unchanged.
To use new multimodal features:
pip install Pillow requestsChecklist
Test Results
MIPROv2 Integration Results
Reviewers
@[maintainer1] @[maintainer2]
Additional Notes
This feature has been extensively tested with:
Ready for production use! 🚀