Add VLM car damage verification demo (#103)

cougz · web-flow · commit 37593e94d3b4 · 2025-06-12T09:05:17.000+02:00
* Add VLM car damage verification demo

This demo showcases Vision Language Model capabilities for car damage
verification using OVHcloud AI Endpoints. Features include:

- Interactive Chainlit web application
- Multi-image analysis with Qwen2.5-VL-72B-Instruct
- Car claim verification (manufacturer, model, color, damage)
- Refactored code with ~30% complexity reduction
- Comprehensive documentation and setup guide

Signed-off-by: Tim Seiffert <Tim.Seiffert@ovhcloud.com>

* Update README.md

Change in prerequisites

* Update README.md

Added new VLM demo to README

---------

Signed-off-by: Tim Seiffert <Tim.Seiffert@ovhcloud.com>
diff --git a/ai/ai-endpoints/README.md b/ai/ai-endpoints/README.md
@@ -30,7 +30,8 @@ Don't hesitate to use the source code and give us feedback.
   - [Conversational Memory for chatbot](./python-langchain-conversational-memory/) by using Mistral7B and LangChain Memory module
   - [Video Translator](./speech-ai-video-translator) with ASR, NMT and TTS to subtitle and dub video voices
   - [ASR features](./asr-features) to better understand how Automatic Speech Recognition models work
-  - [TTS features](./tts-features) to be able to use all Text To Speech models easily 
+  - [TTS features](./tts-features) to be able to use all Text To Speech models easily
+  - [Car Damage Verification with VLM](./car-damage-verification-using-vlm/) - Interactive fact-checking using Vision Language Models to verify car claims against photos
 
 ### 🕸️ Javascript 🕸️
 
diff --git a/ai/ai-endpoints/car-damage-verification-using-vlm/README.md b/ai/ai-endpoints/car-damage-verification-using-vlm/README.md
@@ -0,0 +1,131 @@
+# VLM Tutorial - Car Damage Verification
+
+This Vision Language Model (VLM) tutorial features an interactive car verification challenge powered by OVHcloud AI Endpoints.
+
+## Files
+
+- `test_vision_connection.py` - Test OVHcloud Vision API connectivity
+- `verification_demo.py` - Core car verification logic using VLM
+- `verification_app.py` - Interactive Chainlit web application
+- `chainlit.md` - Welcome page content for the Chainlit app
+- `requirements.txt` - Python dependencies
+
+## What This Demo Does
+
+The **Car Verification Challenge** is an interactive AI-powered fact-checking experiment where users:
+
+1. **Make claims** about their car (manufacturer, model, color, damage)
+2. **Upload photos** of the actual vehicle (3 photos)
+3. **Get AI analysis** that verifies if the photos match their claims
+4. **Receive a verdict** - did you tell the truth or try to trick the AI?
+
+This demonstrates how Vision Language Models can analyze visual content and cross-reference it with textual claims for verification tasks.
+
+## Usage
+
+### 1. Prerequisites
+
+Ensure you have Python 3.8+ installed and access to OVHcloud AI Endpoints.
+
+### 2. Environment Setup
+
+Create a `.env` file with your OVHcloud credentials:
+```
+OVH_AI_ENDPOINTS_ACCESS_TOKEN=your_token_here
+QWEN_URL=https://qwen-2-5-vl-72b-instruct.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1/chat/completions
+```
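
The demo scripts read these two values through `python-dotenv`. The sketch below is a hypothetical helper (`load_endpoint_config` and `EndpointConfig` are illustrative names, not part of the demo) that fails early with a clear message when either setting is missing, instead of letting the first HTTP request fail with a less obvious error.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EndpointConfig:
    """Hypothetical container for the two settings the demo reads from .env."""
    token: str
    url: str


def load_endpoint_config(env: Optional[Mapping[str, str]] = None) -> EndpointConfig:
    """Return the token and endpoint URL, or raise if either is missing or empty.

    `env` defaults to os.environ (populate it first with python-dotenv's load_dotenv()).
    """
    env = os.environ if env is None else env
    required = ("OVH_AI_ENDPOINTS_ACCESS_TOKEN", "QWEN_URL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing settings in .env: {', '.join(missing)}")
    return EndpointConfig(
        token=env["OVH_AI_ENDPOINTS_ACCESS_TOKEN"],
        url=env["QWEN_URL"],
    )
```

Call `load_dotenv()` from `python-dotenv` first so the values from `.env` are visible in `os.environ`, then call `load_endpoint_config()` without arguments. Passing a plain dictionary instead, as in a unit test, avoids touching the real environment.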
+
+### 3. Install Dependencies
+
+```bash
+pip install -r requirements.txt
+```
+
+### 4. Test Connection
+
+First, verify your API connectivity:
+```bash
+python test_vision_connection.py
+```
+
+### 5. Run the Interactive App
+
+Launch the Chainlit web application:
+```bash
+chainlit run verification_app.py
+```
+
+Then open your browser to the provided URL (typically `http://localhost:8000`).
+
+### 6. Test the Core Logic
+
+You can also test the verification engine directly:
+```bash
+python verification_demo.py
+```
+
+## How It Works
+
+### Vision Analysis Pipeline
+
+1. **Image Processing**: Photos are compressed and base64-encoded for API transmission
+2. **Multi-Modal Prompting**: The VLM receives both user claims (text) and photos (images)
+3. **Verification Analysis**: AI analyzes each claim against visual evidence:
+   - Manufacturer/brand identification
+   - Model recognition
+   - Color verification
+   - Damage assessment
+4. **Report Generation**: Detailed verification report with confidence levels and visual indicators
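
As a rough illustration of steps 1 to 3, the sketch below packs the textual claims and several base64-encoded images into a single OpenAI-compatible chat message, using the same `text` and `image_url` content parts as `test_vision_connection.py`. It assumes the images are already compressed; the helper names, the prompt wording and `max_tokens=1000` are hypothetical choices, not taken from `verification_demo.py`.

```python
import base64
from typing import Dict, List, Tuple

# MIME types for the formats listed under Troubleshooting (PNG, JPG, JPEG, WebP).
MIME_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
}


def to_data_url(image_bytes: bytes, suffix: str) -> str:
    """Base64-encode raw image bytes as a data URL for an image_url content part."""
    mime = MIME_TYPES.get(suffix.lower())
    if mime is None:
        raise ValueError(f"Unsupported image format: {suffix}")
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


def build_verification_payload(
    claims: Dict[str, str],
    images: List[Tuple[bytes, str]],
    model: str = "Qwen2.5-VL-72B-Instruct",
    max_tokens: int = 1000,
) -> dict:
    """Combine the user's claims (text) and photos (images) in one user message."""
    claim_lines = "\n".join(f"- {field}: {value}" for field, value in claims.items())
    # Hypothetical prompt wording; the demo's real prompt may differ.
    prompt = (
        "The user makes these claims about their car:\n"
        f"{claim_lines}\n"
        "Check each claim against the photos and say whether it matches."
    )
    content = [{"type": "text", "text": prompt}]
    for image_bytes, suffix in images:
        content.append(
            {"type": "image_url", "image_url": {"url": to_data_url(image_bytes, suffix)}}
        )
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
        "temperature": 0.1,  # low temperature, as in test_vision_connection.py
    }
```

The resulting dictionary can be posted to `QWEN_URL` with `requests.post(url, json=payload, headers=headers, timeout=...)` in the same way as the connection test, possibly with a longer timeout since several images are sent at once.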
+
+### Key Features
+
+- **Multi-image analysis** - Processes up to 3 photos simultaneously
+- **Structured verification** - Systematically checks each claim type
+- **Enhanced formatting** - Green checkmarks (✅) for matches, red crosses (❌) for mismatches
+- **Interactive interface** - User-friendly web application
+- **Real-time processing** - Live verification with visual feedback
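
The checkmark formatting can be pictured as a small rendering step applied once each claim has been compared with the photos. The sketch below assumes a hypothetical `ClaimCheck` record per claim (claimed value, observed value, match flag); how the demo actually extracts these values from the model's answer is not shown here.

```python
from typing import Dict, NamedTuple


class ClaimCheck(NamedTuple):
    """Hypothetical result of checking one claim against the photos."""
    claimed: str
    observed: str
    matches: bool


def format_report(checks: Dict[str, ClaimCheck]) -> str:
    """Render one markdown line per claim, prefixed with ✅ (match) or ❌ (mismatch)."""
    lines = []
    for field, check in checks.items():
        icon = "✅" if check.matches else "❌"
        lines.append(
            f"{icon} **{field.capitalize()}**: claimed {check.claimed}, observed {check.observed}"
        )
    # The overall verdict only passes if every individual claim matched.
    all_match = all(c.matches for c in checks.values())
    verdict = "Claims match the photos." if all_match else "Some claims do not match the photos."
    lines.append("")
    lines.append(f"**Verdict:** {verdict}")
    return "\n".join(lines)
```

Keeping the comparison (producing `ClaimCheck` values) separate from the rendering makes it easy to change the presentation in the Chainlit app without touching the prompt.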
+
+## Model Information
+
+- **Vision Model**: Qwen2.5-VL-72B-Instruct
+- **Provider**: OVHcloud AI Endpoints
+- **Capabilities**: Multi-modal understanding, object detection, text-image reasoning
+- **Optimizations**: Image compression, quality balancing for performance
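
Image compression mainly means keeping the longest side of each photo below some limit before encoding. The helper below only computes the target size while preserving the aspect ratio; the 1024-pixel default is an assumption, not the demo's setting. With Pillow, `Image.thumbnail((max_side, max_side))` performs a comparable in-place resize, and saving as JPEG with a `quality` value trades file size for detail.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_side: int = 1024) -> Tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side.

    The aspect ratio is preserved and images that are already small are never upscaled.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Round to whole pixels and never collapse a side to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Smaller images mean smaller base64 payloads, which is also the lever behind the "Slow performance" tip under Troubleshooting.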
+
+## Requirements
+
+See `requirements.txt` for the full list; the main dependencies are:
+- `chainlit` - Interactive web interface
+- `pillow` - Image processing
+- `requests` - API communication
+- `python-dotenv` - Environment management
+- `aiofiles` - Async file operations
+
+## Educational Value
+
+This tutorial demonstrates:
+- **Multi-modal AI applications** - Combining text and image analysis
+- **Verification systems** - Using AI for fact-checking
+- **Interactive AI interfaces** - Building engaging user experiences
+- **Vision model integration** - Practical VLM implementation
+- **Real-world applications** - Insurance, automotive, verification use cases
+
+## Potential Extensions
+
+- **Damage severity scoring** - Quantify damage levels
+- **Multiple vehicle verification** - Compare multiple cars
+- **Historical comparison** - Before/after damage analysis
+- **Integration with databases** - Verify against vehicle registrations
+- **Mobile app version** - Native mobile implementation
+
+## Troubleshooting
+
+- **Connection issues**: Run `test_vision_connection.py` to verify API access
+- **Image upload problems**: Check file formats (PNG, JPG, JPEG, WebP supported)
+- **Slow performance**: Reduce image sizes or number of photos
+- **Token errors**: Verify your OVHcloud AI Endpoints token in `.env`
+- **Formatting issues**: The enhanced formatting automatically adds checkmarks and structure
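
When `test_vision_connection.py` prints `❌ API failed: <status>`, the status code usually points to one of the entries above. The sketch below maps a few standard HTTP status codes to hints from this list; the mapping is a hypothetical convenience that relies on generic HTTP semantics rather than on documented OVHcloud AI Endpoints behavior.

```python
from typing import Optional

# Generic HTTP meanings, mapped to hypothetical hints based on this troubleshooting list.
STATUS_HINTS = {
    401: "Unauthorized: check OVH_AI_ENDPOINTS_ACCESS_TOKEN in .env.",
    403: "Forbidden: check that the token is valid for this endpoint.",
    413: "Payload too large: reduce image sizes or the number of photos.",
    429: "Too many requests: wait and retry later.",
}


def explain_status(status_code: int) -> Optional[str]:
    """Return a troubleshooting hint for a failed request, or None on success."""
    if 200 <= status_code < 300:
        return None
    if status_code in STATUS_HINTS:
        return STATUS_HINTS[status_code]
    if status_code >= 500:
        return "Server error: retry later and run test_vision_connection.py to confirm access."
    return f"Unexpected status {status_code}: inspect the response body."
```

In the connection test, such a helper could be called in the failure branch with `response.status_code` and its hint printed next to the raw response text.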
+
+---
+
+*Powered by OVHcloud AI Endpoints - Demonstrating the power of Vision Language Models for real-world verification tasks.*
diff --git a/ai/ai-endpoints/car-damage-verification-using-vlm/chainlit.md b/ai/ai-endpoints/car-damage-verification-using-vlm/chainlit.md
@@ -0,0 +1,17 @@
+# 🕵️ Car Verification Challenge
+
+Welcome to the ultimate AI fact-checking experiment! 
+
+**The Challenge:** Tell me about your car, then upload photos. Let's see if the AI can catch you if you're not being truthful!
+
+## How it works:
+1. **You tell me** your car's details
+2. **Upload 3 photos** of the actual vehicle
+3. **AI analyzes** what it really sees 
+4. **Get verdict** - do your claims match reality?
+
+## Ready to start?
+The AI will ask for your car details step by step. Feel free to tell the truth... or try to trick the AI! 😉
+
+---
+*Powered by OVHcloud AI Endpoints - Qwen2.5-VL-72B-Instruct*
diff --git a/ai/ai-endpoints/car-damage-verification-using-vlm/requirements.txt b/ai/ai-endpoints/car-damage-verification-using-vlm/requirements.txt
@@ -0,0 +1,9 @@
+chainlit>=1.0.0
+pillow>=10.0.0
+requests>=2.31.0
+python-dotenv>=1.0.0
+aiofiles>=23.0.0
+typing-extensions>=4.5.0
+pydantic>=2.0.0
+fastapi>=0.100.0
+uvicorn>=0.23.0
diff --git a/ai/ai-endpoints/car-damage-verification-using-vlm/test_vision_connection.py b/ai/ai-endpoints/car-damage-verification-using-vlm/test_vision_connection.py
@@ -0,0 +1,108 @@
+# Copyright (c) 2025 OVHcloud
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+import base64
+import requests
+from PIL import Image, ImageDraw
+import io
+from dotenv import load_dotenv
+
+load_dotenv()
+
+def create_test_image():
+    """Create a simple test image for API testing"""
+    # Create a 200x150 test image
+    img = Image.new('RGB', (200, 150), color='lightblue')
+    draw = ImageDraw.Draw(img)
+    
+    # Draw a simple car shape
+    draw.rectangle([50, 60, 150, 110], fill='red', outline='black', width=2)
+    draw.ellipse([60, 100, 80, 120], fill='black')  # Left wheel
+    draw.ellipse([120, 100, 140, 120], fill='black')  # Right wheel
+    draw.text((10, 10), "TEST CAR", fill='black')
+    
+    # Convert to base64
+    buffer = io.BytesIO()
+    img.save(buffer, format='PNG')
+    buffer.seek(0)
+    
+    return base64.b64encode(buffer.getvalue()).decode('utf-8')
+
+def test_vision_api():
+    """Test OVHcloud Vision API connectivity"""
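+    token = os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN")
+    url = os.getenv("QWEN_URL")
+    model = "Qwen2.5-VL-72B-Instruct"  # Fixed model for demo
+    
+    # Both settings are required; requests.post(None, ...) would only fail later with a less clear error
+    if not token or not url:
+        print("❌ Missing token or QWEN_URL. Check your .env file.")
+        return False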
+    
+    # Create test image
+    test_image_b64 = create_test_image()
+    
+    headers = {
+        "Authorization": f"Bearer {token}",
+        "Content-Type": "application/json"
+    }
+    
+    payload = {
+        "model": model,
+        "messages": [
+            {
+                "role": "user",
+                "content": [
+                    {
+                        "type": "text",
+                        "text": "What do you see in this image? Describe it briefly."
+                    },
+                    {
+                        "type": "image_url",
+                        "image_url": {
+                            "url": f"data:image/png;base64,{test_image_b64}"
+                        }
+                    }
+                ]
+            }
+        ],
+        "max_tokens": 100,
+        "temperature": 0.1
+    }
+    
+    try:
+        print("🔍 Testing OVHcloud Vision API...")
+        response = requests.post(url, json=payload, headers=headers, timeout=30)
+        
+        if response.status_code == 200:
+            result = response.json()
+            ai_response = result['choices'][0]['message']['content']
+            print("✅ Vision API works!")
+            print(f"🤖 AI Response: {ai_response}")
+            return True
+        else:
+            print(f"❌ API failed: {response.status_code}")
+            print(f"Response: {response.text}")
+            return False
+            
+    except Exception as e:
+        print(f"❌ Connection error: {e}")
+        return False
+
+if __name__ == "__main__":
+    print("Testing OVHcloud Vision API connectivity...\n")
+    
+    if test_vision_api():
+        print("\n🎉 Vision API is working! Ready for demo testing.")
+    else:
+        print("\n⚠️ Vision API test failed. Check your token and QWEN_URL, then try again.")
diff --git a/ai/ai-endpoints/car-damage-verification-using-vlm/verification_app.py b/ai/ai-endpoints/car-damage-verification-using-vlm/verification_app.py
diff --git a/ai/ai-endpoints/car-damage-verification-using-vlm/verification_demo.py b/ai/ai-endpoints/car-damage-verification-using-vlm/verification_demo.py