Add vLLM multimodal vision-language model support to Solaris #473
Draft
dlindenbaum wants to merge 2 commits into main from
Conversation
This commit adds comprehensive support for vision-language models (VLMs) using vLLM, enabling multimodal analysis of satellite and aerial imagery with natural language.

Features:
- New VLLMInferer class in solaris/nets/vllm_infer.py for multimodal inference
- CLI command solaris_vllm_infer for easy command-line usage
- Support for any vLLM-compatible vision model (LLaVA, Qwen-VL, etc.)
- Batch processing and multi-GPU inference via tensor parallelism
- Geospatial metadata preservation (projection, geotransform)
- Configurable prompts and sampling parameters
- JSON output format for structured results

New files:
- solaris/nets/vllm_infer.py - Core vLLM inference module
- solaris/bin/solaris_vllm_infer.py - CLI entry point
- requirements-vllm.txt - vLLM-compatible dependencies (Python 3.8+)
- environment-vllm.yml - Conda environment with vLLM support
- VLLM_MULTIMODAL_GUIDE.md - Comprehensive usage guide
- examples/vllm/ - Example configurations and scripts

Modified files:
- setup.py - Added solaris_vllm_infer CLI entry point
- requirements.txt - Added note about vLLM optional dependencies

Use cases:
- Generate descriptions of overhead imagery
- Land use classification with natural language
- Building and infrastructure detection
- Change detection and analysis
- Visual question answering on satellite images

Note: vLLM requires Python 3.8+, PyTorch 2.0+, and CUDA 11.8+. Use environment-vllm.yml or requirements-vllm.txt for installation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
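For orientation, here is a minimal sketch of the kind of multimodal call a VLLMInferer-style wrapper would issue through vLLM's generate API. The model name, prompt template, sampling values, and file path are illustrative assumptions, not code from this PR:

```python
from PIL import Image
from vllm import LLM, SamplingParams

# Load a vLLM-compatible vision-language model (LLaVA shown as an example).
llm = LLM(model="llava-hf/llava-1.5-7b-hf")
sampling = SamplingParams(temperature=0.2, max_tokens=256)

image = Image.open("tile_0001.png")  # hypothetical image chip
prompt = "USER: <image>\nDescribe the land use in this satellite image. ASSISTANT:"

# vLLM accepts the image alongside the text prompt via multi_modal_data.
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    sampling,
)
print(outputs[0].outputs[0].text)
```

Batch processing follows the same pattern, since llm.generate also accepts a list of such request dicts and schedules them together.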
Major refactoring to use LiteLLM instead of vLLM, focusing on US-based Google Gemini and NVIDIA vision-language models for multimodal satellite imagery analysis.

Key Changes:
- Replaced vLLM with LiteLLM for unified API access to multiple providers
- Primary focus on Google Gemini (US-based) and NVIDIA models
- Simplified deployment - no local GPU required for inference
- API-based approach for better scalability and maintenance

New Implementation:
- solaris/nets/litellm_infer.py - LiteLLM-based multimodal inference
  * Supports Google Gemini (1.5-flash, 1.5-pro, pro-vision)
  * Supports NVIDIA vision models via API catalog
  * Preserves geospatial metadata (projection, geotransform)
  * Base64 image encoding for API calls
  * Robust error handling and logging

Updated CLI:
- solaris/bin/solaris_vllm_infer.py - Updated to use LiteLLM
  * Help text shows Gemini and NVIDIA model examples
  * Environment variable support (GEMINI_API_KEY, NVIDIA_API_KEY)
  * Maintains backward compatibility with existing command name

Dependencies:
- requirements-litellm.txt - LiteLLM and Google Generative AI packages
- environment-litellm.yml - Conda environment for LiteLLM
- Lighter weight than vLLM (no local model hosting required)

Documentation:
- LITELLM_MULTIMODAL_GUIDE.md - Comprehensive guide for LiteLLM usage
  * Google Gemini setup instructions
  * API key configuration
  * Model comparison (Flash vs Pro)
  * Cost estimation and optimization tips
  * Security and privacy considerations

Example Configurations:
- examples/vllm/litellm_config_gemini.yml - Gemini Flash (fast, efficient)
- examples/vllm/litellm_config_gemini_pro.yml - Gemini Pro (most capable)
- examples/vllm/litellm_config_nvidia.yml - NVIDIA models template
- examples/vllm/example_litellm_inference.py - Python API examples

Removed Files:
- Old vLLM-specific files (vllm_infer.py, vllm configs, vllm guide)
- No longer needed with API-based approach

Benefits:
- US-based models for security and compliance
- No local GPU/CUDA requirements for inference
- Production-ready APIs from Google and NVIDIA
- Cost-effective (especially Gemini Flash)
- Easier deployment and scaling
- Regular model updates from providers

Supported Use Cases:
- Land use classification
- Building and infrastructure detection
- Environmental monitoring
- Change detection analysis
- General overhead imagery description

API Key Setup:

    export GEMINI_API_KEY='your-key'   # Get from https://makersuite.google.com/app/apikey
    export NVIDIA_API_KEY='your-key'   # Get from https://build.nvidia.com/

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
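As a rough illustration of the API-based flow this commit describes (base64 encoding plus a provider-routed call), here is a minimal LiteLLM sketch. The image path and prompt are assumptions for demonstration; the gemini/ model prefix is LiteLLM's convention for routing to Google's API via GEMINI_API_KEY:

```python
import base64

import litellm

# Hypothetical input chip; any RGB image works. Encode it as base64
# so it can travel inline in the API request.
with open("tile_0001.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")

# LiteLLM uses the OpenAI-style message format for vision inputs,
# mixing text and image_url parts in a single user turn.
response = litellm.completion(
    model="gemini/gemini-1.5-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Classify the dominant land use in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

Because inference happens on the provider's side, this same script runs on a CPU-only machine, which is the deployment simplification the commit highlights.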
Thank you for submitting your PR. Please read the template below, fill it out as appropriate, and make additional changes to your code as needed. Please feel free to submit your PR even if it doesn't satisfy all of the requirements below - simply prepend [WIP] to the PR title until it is ready for review by a maintainer. If you need assistance or review from a maintainer, add the label Status: Help Needed or Status: Review Needed respectively. After review, a maintainer will add the label Status: Revision Needed if further work is required for the PR to be merged.
Description
Please include a summary of the change and which issue is resolved. Please also include relevant motivation and context. List any dependencies that are required for this change.
Fixes # (issue)
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe tests that you added to the pytest codebase (if applicable).
Checklist:
If your PR does not fulfill all of the requirements in the checklist above, that's OK! Just prepend [WIP] to the PR title until they are all satisfied. If you need help, @-mention a maintainer and/or add the Status: Help Needed label.