Add vLLM multimodal vision-language model support to Solaris #473
Draft
dlindenbaum wants to merge 2 commits into main from
Conversation
This commit adds comprehensive support for vision-language models (VLMs) using vLLM, enabling multimodal analysis of satellite and aerial imagery with natural language.

Features:
- New VLLMInferer class in solaris/nets/vllm_infer.py for multimodal inference
- CLI command solaris_vllm_infer for easy command-line usage
- Support for any vLLM-compatible vision model (LLaVA, Qwen-VL, etc.)
- Batch processing and multi-GPU inference via tensor parallelism
- Geospatial metadata preservation (projection, geotransform)
- Configurable prompts and sampling parameters
- JSON output format for structured results

New files:
- solaris/nets/vllm_infer.py - Core vLLM inference module
- solaris/bin/solaris_vllm_infer.py - CLI entry point
- requirements-vllm.txt - vLLM-compatible dependencies (Python 3.8+)
- environment-vllm.yml - Conda environment with vLLM support
- VLLM_MULTIMODAL_GUIDE.md - Comprehensive usage guide
- examples/vllm/ - Example configurations and scripts

Modified files:
- setup.py - Added solaris_vllm_infer CLI entry point
- requirements.txt - Added note about vLLM optional dependencies

Use cases:
- Generate descriptions of overhead imagery
- Land use classification with natural language
- Building and infrastructure detection
- Change detection and analysis
- Visual question answering on satellite images

Note: vLLM requires Python 3.8+, PyTorch 2.0+, and CUDA 11.8+. Use environment-vllm.yml or requirements-vllm.txt for installation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
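For orientation, here is a minimal sketch of the kind of multimodal call a VLLMInferer-style wrapper would issue through vLLM's generate API. The model name, prompt template, sampling values, and file path are illustrative assumptions, not code from this PR:

```python
from PIL import Image
from vllm import LLM, SamplingParams

# Load a vLLM-compatible vision-language model (LLaVA shown as an example).
llm = LLM(model="llava-hf/llava-1.5-7b-hf")
sampling = SamplingParams(temperature=0.2, max_tokens=256)

image = Image.open("tile_0001.png")  # hypothetical image chip
prompt = "USER: <image>\nDescribe the land use in this satellite image. ASSISTANT:"

# vLLM accepts the image alongside the text prompt via multi_modal_data.
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    sampling,
)
print(outputs[0].outputs[0].text)
```

Batch processing follows the same pattern, since llm.generate also accepts a list of such request dicts and schedules them together.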
Major refactoring to use LiteLLM instead of vLLM, focusing on US-based Google Gemini and NVIDIA vision-language models for multimodal satellite imagery analysis.

Key Changes:
- Replaced vLLM with LiteLLM for unified API access to multiple providers
- Primary focus on Google Gemini (US-based) and NVIDIA models
- Simplified deployment - no local GPU required for inference
- API-based approach for better scalability and maintenance

New Implementation:
- solaris/nets/litellm_infer.py - LiteLLM-based multimodal inference
  * Supports Google Gemini (1.5-flash, 1.5-pro, pro-vision)
  * Supports NVIDIA vision models via API catalog
  * Preserves geospatial metadata (projection, geotransform)
  * Base64 image encoding for API calls
  * Robust error handling and logging

Updated CLI:
- solaris/bin/solaris_vllm_infer.py - Updated to use LiteLLM
  * Help text shows Gemini and NVIDIA model examples
  * Environment variable support (GEMINI_API_KEY, NVIDIA_API_KEY)
  * Maintains backward compatibility with existing command name

Dependencies:
- requirements-litellm.txt - LiteLLM and Google Generative AI packages
- environment-litellm.yml - Conda environment for LiteLLM
- Lighter weight than vLLM (no local model hosting required)

Documentation:
- LITELLM_MULTIMODAL_GUIDE.md - Comprehensive guide for LiteLLM usage
  * Google Gemini setup instructions
  * API key configuration
  * Model comparison (Flash vs Pro)
  * Cost estimation and optimization tips
  * Security and privacy considerations

Example Configurations:
- examples/vllm/litellm_config_gemini.yml - Gemini Flash (fast, efficient)
- examples/vllm/litellm_config_gemini_pro.yml - Gemini Pro (most capable)
- examples/vllm/litellm_config_nvidia.yml - NVIDIA models template
- examples/vllm/example_litellm_inference.py - Python API examples

Removed Files:
- Old vLLM-specific files (vllm_infer.py, vllm configs, vllm guide)
- No longer needed with API-based approach

Benefits:
- US-based models for security and compliance
- No local GPU/CUDA requirements for inference
- Production-ready APIs from Google and NVIDIA
- Cost-effective (especially Gemini Flash)
- Easier deployment and scaling
- Regular model updates from providers

Supported Use Cases:
- Land use classification
- Building and infrastructure detection
- Environmental monitoring
- Change detection analysis
- General overhead imagery description

API Key Setup:

    export GEMINI_API_KEY='your-key'   # Get from https://makersuite.google.com/app/apikey
    export NVIDIA_API_KEY='your-key'   # Get from https://build.nvidia.com/

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
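As a rough illustration of the API-based flow this commit describes (base64 encoding plus a provider-routed call), here is a minimal LiteLLM sketch. The image path and prompt are assumptions for demonstration; the gemini/ model prefix is LiteLLM's convention for routing to Google's API via GEMINI_API_KEY:

```python
import base64

import litellm

# Hypothetical input chip; any RGB image works. Encode it as base64
# so it can travel inline in the API request.
with open("tile_0001.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")

# LiteLLM uses the OpenAI-style message format for vision inputs,
# mixing text and image_url parts in a single user turn.
response = litellm.completion(
    model="gemini/gemini-1.5-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Classify the dominant land use in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

Because inference happens on the provider's side, this same script runs on a CPU-only machine, which is the deployment simplification the commit highlights.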
Thank you for submitting your PR. Please read the template below, fill it out as appropriate, and make additional changes to your code as needed. Please feel free to submit your PR even if it doesn't satisfy all of the requirements below - simply prepend [WIP] to the PR title until it is ready for review by a maintainer. If you need assistance or review from a maintainer, add the label Status: Help Needed or Status: Review Needed respectively. After review, a maintainer will add the label Status: Revision Needed if further work is required for the PR to be merged.
Description
Please include a summary of the change and which issue is resolved. Please also include relevant motivation and context. List any dependencies that are required for this change.
Fixes # (issue)
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe tests that you added to the pytest codebase (if applicable).
Checklist:
If your PR does not fulfill all of the requirements in the checklist above, that's OK! Just prepend [WIP] to the PR title until they are all satisfied. If you need help, @-mention a maintainer and/or add the Status: Help Needed label.