Commit 37593e9

Add VLM car damage verification demo (#103)
* Add VLM car damage verification demo

  This demo showcases Vision Language Model capabilities for car damage verification using OVHcloud AI Endpoints.

  Features include:
  - Interactive Chainlit web application
  - Multi-image analysis with Qwen2.5-VL-72B-Instruct
  - Car claim verification (manufacturer, model, color, damage)
  - Refactored code with ~30% complexity reduction
  - Comprehensive documentation and setup guide

  Signed-off-by: Tim Seiffert <[email protected]>

* Update README.md

  Change in prerequisites

* Update README.md

  Added new VLM demo to README

---------

Signed-off-by: Tim Seiffert <[email protected]>
1 parent 2d6bb5c commit 37593e9

7 files changed: +729 -1 lines changed

ai/ai-endpoints/README.md

Lines changed: 2 additions & 1 deletion
@@ -30,7 +30,8 @@ Don't hesitate to use the source code and give us feedback.
 - [Conversational Memory for chatbot](./python-langchain-conversational-memory/) by using Mistral7B and LangChain Memory module
 - [Video Translator](./speech-ai-video-translator) with ASR, NMT and TTS to subtitle and dub video voices
 - [ASR features](./asr-features) to better understand how Automatic Speech Recognition models work
-- [TTS features](./tts-features) to be able to use all Text To Speech models easily
+- [TTS features](./tts-features) to be able to use all Text To Speech models easily
+- [Car Damage Verification with VLM](./car-damage-verification-using-vlm/) - Interactive fact-checking using Vision Language Models to verify car claims against photos
 
 ### 🕸️ Javascript 🕸️
 
Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
# VLM Tutorial - Car Damage Verification

This Vision Language Model (VLM) tutorial features an interactive car verification challenge powered by OVHcloud AI Endpoints.

## Files

- `test_vision_connection.py` - Test OVHcloud Vision API connectivity
- `verification_demo.py` - Core car verification logic using VLM
- `verification_app.py` - Interactive Chainlit web application
- `chainlit.md` - Welcome page content for the Chainlit app
- `requirements.txt` - Python dependencies

## What This Demo Does

The **Car Verification Challenge** is an interactive AI-powered fact-checking experiment where users:

1. **Make claims** about their car (manufacturer, model, color, damage)
2. **Upload photos** of the actual vehicle (minimum 3 photos)
3. **Get AI analysis** that checks whether the photos match their claims
4. **Receive a verdict** on whether they told the truth or tried to trick the AI

This demonstrates how Vision Language Models can analyze visual content and cross-reference it with textual claims for verification tasks.

## Usage

### 1. Prerequisites

Ensure you have Python 3.8+ installed and access to OVHcloud AI Endpoints.

### 2. Environment Setup

Create a `.env` file with your OVHcloud credentials:
```
OVH_AI_ENDPOINTS_ACCESS_TOKEN=your_token_here
QWEN_URL=https://qwen-2-5-vl-72b-instruct.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1/chat/completions
```
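The two variables above can be validated before any request is sent. The following is a minimal sketch, not the demo's actual code: `Settings` and `load_settings` are hypothetical names introduced for illustration, and the example reads from a plain mapping so it runs with the standard library alone.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variable names taken from the .env example in this README.
REQUIRED_VARS = ("OVH_AI_ENDPOINTS_ACCESS_TOKEN", "QWEN_URL")


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the two values the demo needs."""
    token: str
    url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Return the settings, or raise if a required variable is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        token=env["OVH_AI_ENDPOINTS_ACCESS_TOKEN"].strip(),
        url=env["QWEN_URL"].strip(),
    )
```

In an application, `load_dotenv()` from `python-dotenv` would typically run first so that the values from `.env` are visible in `os.environ`.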

### 3. Install Dependencies

```bash
pip install -r requirements.txt
```

### 4. Test Connection

First, verify your API connectivity:
```bash
python test_vision_connection.py
```

### 5. Run the Interactive App

Launch the Chainlit web application:
```bash
chainlit run verification_app.py
```

Then open your browser to the provided URL (typically `http://localhost:8000`).

### 6. Test the Core Logic

You can also test the verification engine directly:
```bash
python verification_demo.py
```

## How It Works

### Vision Analysis Pipeline

1. **Image Processing**: Photos are optimized and converted to base64 for API transmission
2. **Multi-Modal Prompting**: The VLM receives both user claims (text) and photos (images)
3. **Verification Analysis**: AI analyzes each claim against visual evidence:
   - Manufacturer/brand identification
   - Model recognition
   - Color verification
   - Damage assessment
4. **Report Generation**: Detailed verification report with confidence levels and visual indicators
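Steps 1 and 2 of the pipeline can be sketched as follows. This is an illustration under stated assumptions rather than the demo's implementation: `to_data_url` and `build_verification_payload` are hypothetical helper names, the prompt wording, the `image/jpeg` MIME type and the `max_tokens` value are assumptions, and the image optimization step is omitted so the example needs only the standard library. The message layout follows the OpenAI-compatible format that `test_vision_connection.py` sends.

```python
import base64
from typing import Dict, List

MODEL = "Qwen2.5-VL-72B-Instruct"  # model name used by the connectivity test


def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL (MIME type is an assumption)."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


def build_verification_payload(claims: Dict[str, str], images: List[bytes]) -> dict:
    """Combine the textual claims and all photos into one user message."""
    claim_lines = "\n".join(f"- {field}: {value}" for field, value in claims.items())
    # Hypothetical prompt; the real demo may phrase this differently.
    prompt = (
        "Compare the photos with the claims below and state for each claim "
        "whether the photos support it.\n" + claim_lines
    )
    content = [{"type": "text", "text": prompt}]
    content += [
        {"type": "image_url", "image_url": {"url": to_data_url(image)}}
        for image in images
    ]
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": 800,   # assumed value, tune for report length
        "temperature": 0.1,  # same low temperature as the connectivity test
    }
```

The resulting dictionary can be passed as the `json=` argument of `requests.post`, together with the `Authorization: Bearer <token>` header.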

### Key Features

- **Multi-image analysis** - Processes up to 3 photos simultaneously
- **Structured verification** - Systematically checks each claim type
- **Enhanced formatting** - Green checkmarks (✅) for matches, red crosses (❌) for mismatches
- **Interactive interface** - User-friendly web application
- **Real-time processing** - Live verification with visual feedback
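The checkmark formatting described above could look roughly like this. It is a hypothetical sketch: `format_report`, the per-claim boolean input and the verdict wording are assumptions made for illustration, not the demo's actual report format.

```python
from typing import Dict

# Claim types listed in this README, in display order.
CLAIM_FIELDS = ("manufacturer", "model", "color", "damage")


def format_report(results: Dict[str, bool]) -> str:
    """Render one line per checked claim, followed by an overall verdict."""
    lines = []
    for field in CLAIM_FIELDS:
        if field not in results:
            continue  # claim was not checked, so it is left out of the report
        matched = results[field]
        mark = "✅" if matched else "❌"
        status = "matches the photos" if matched else "does not match the photos"
        lines.append(f"{mark} {field.capitalize()}: {status}")
    # The verdict is positive only if at least one claim was checked and all matched.
    all_match = bool(lines) and all(results[f] for f in CLAIM_FIELDS if f in results)
    lines.append("Verdict: " + ("Claims verified" if all_match else "Claims not verified"))
    return "\n".join(lines)
```

For example, `format_report({"manufacturer": True, "color": False})` yields a green checkmark line for the manufacturer, a red cross line for the color, and a negative verdict.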

## Model Information

- **Vision Model**: Qwen2.5-VL-72B-Instruct
- **Provider**: OVHcloud AI Endpoints
- **Capabilities**: Multi-modal understanding, object detection, text-image reasoning
- **Optimizations**: Image compression, quality balancing for performance

## Requirements

See `requirements.txt` for all dependencies:
- `chainlit` - Interactive web interface
- `pillow` - Image processing
- `requests` - API communication
- `python-dotenv` - Environment management
- `aiofiles` - Async file operations

## Educational Value

This tutorial demonstrates:
- **Multi-modal AI applications** - Combining text and image analysis
- **Verification systems** - Using AI for fact-checking
- **Interactive AI interfaces** - Building engaging user experiences
- **Vision model integration** - Practical VLM implementation
- **Real-world applications** - Insurance, automotive, verification use cases

## Potential Extensions

- **Damage severity scoring** - Quantify damage levels
- **Multiple vehicle verification** - Compare multiple cars
- **Historical comparison** - Before/after damage analysis
- **Integration with databases** - Verify against vehicle registrations
- **Mobile app version** - Native mobile implementation

## Troubleshooting

- **Connection issues**: Run `test_vision_connection.py` to verify API access
- **Image upload problems**: Check file formats (PNG, JPG, JPEG, WebP supported)
- **Slow performance**: Reduce image sizes or number of photos
- **Token errors**: Verify your OVHcloud AI Endpoints token in `.env`
- **Formatting issues**: The enhanced formatting automatically adds checkmarks and structure
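Two of the checks above can be run locally before uploading anything. The helpers below are a small sketch with hypothetical names (`unsupported_files`, `scaled_size`), and the 1024-pixel limit is an assumed value, not one taken from the demo.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Formats listed as supported in the troubleshooting notes.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def unsupported_files(paths: Iterable[str]) -> List[str]:
    """Return the paths whose extension is not in the supported set."""
    return [p for p in paths if Path(p).suffix.lower() not in SUPPORTED_EXTENSIONS]


def scaled_size(width: int, height: int, max_side: int = 1024) -> Tuple[int, int]:
    """Return dimensions that fit within max_side while keeping the aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough, leave unchanged
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The tuple returned by `scaled_size` can be passed to Pillow's `Image.resize` to shrink a photo before it is base64-encoded.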

---

*Powered by OVHcloud AI Endpoints - Demonstrating the power of Vision Language Models for real-world verification tasks.*
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
# 🕵️ Car Verification Challenge

Welcome to the ultimate AI fact-checking experiment!

**The Challenge:** Tell me about your car, then upload photos. Let's see if the AI can catch you if you're not being truthful!

## How it works:
1. **You tell me** your car's details
2. **Upload 3 photos** of the actual vehicle
3. **AI analyzes** what it really sees
4. **Get verdict** - do your claims match reality?

## Ready to start?
The AI will ask for your car details step by step. Feel free to tell the truth... or try to trick the AI! 😉

---
*Powered by OVHcloud AI Endpoints - Qwen2.5-VL-72B-Instruct*
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
chainlit>=1.0.0
pillow>=10.0.0
requests>=2.31.0
python-dotenv>=1.0.0
aiofiles>=23.0.0
typing-extensions>=4.5.0
pydantic>=2.0.0
fastapi>=0.100.0
uvicorn>=0.23.0
Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
# Copyright (c) 2025 OVHcloud
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import base64
import io
import os

import requests
from dotenv import load_dotenv
from PIL import Image, ImageDraw

# Load OVH_AI_ENDPOINTS_ACCESS_TOKEN and QWEN_URL from the .env file
load_dotenv()


def create_test_image():
    """Create a simple test image and return it as a base64-encoded PNG string"""
    # Create a 200x150 test image
    img = Image.new('RGB', (200, 150), color='lightblue')
    draw = ImageDraw.Draw(img)

    # Draw a simple car shape
    draw.rectangle([50, 60, 150, 110], fill='red', outline='black', width=2)
    draw.ellipse([60, 100, 80, 120], fill='black')  # Left wheel
    draw.ellipse([120, 100, 140, 120], fill='black')  # Right wheel
    draw.text((10, 10), "TEST CAR", fill='black')

    # Serialize to PNG in memory, then convert to base64
    buffer = io.BytesIO()
    img.save(buffer, format='PNG')

    return base64.b64encode(buffer.getvalue()).decode('utf-8')


def test_vision_api():
    """Test OVHcloud Vision API connectivity"""
    token = os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN")
    url = os.getenv("QWEN_URL")
    model = "Qwen2.5-VL-72B-Instruct"  # Fixed model for demo

    if not token:
        print("❌ No token found. Check your .env file.")
        return False

    # Without this check a missing URL would surface as a misleading connection error
    if not url:
        print("❌ No QWEN_URL found. Check your .env file.")
        return False

    # Create test image
    test_image_b64 = create_test_image()

    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json"
    }

    # OpenAI-compatible chat payload: one text part and one inline image part
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "What do you see in this image? Describe it briefly."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{test_image_b64}"
                        }
                    }
                ]
            }
        ],
        "max_tokens": 100,
        "temperature": 0.1
    }

    try:
        print("🔍 Testing OVHcloud Vision API...")
        response = requests.post(url, json=payload, headers=headers, timeout=30)

        if response.status_code == 200:
            result = response.json()
            ai_response = result['choices'][0]['message']['content']
            print("✅ Vision API works!")
            print(f"🤖 AI Response: {ai_response}")
            return True
        else:
            print(f"❌ API failed: {response.status_code}")
            print(f"Response: {response.text}")
            return False

    except (KeyError, IndexError, ValueError) as e:
        # The endpoint answered, but not with the expected JSON structure
        print(f"❌ Unexpected response format: {e}")
        return False
    except requests.RequestException as e:
        print(f"❌ Connection error: {e}")
        return False


if __name__ == "__main__":
    print("Testing OVHcloud Vision API connectivity...\n")

    if test_vision_api():
        print("\n🎉 Vision API is working! Ready for demo testing.")
    else:
        print("\n⚠️ Vision API test failed. Check your token and QWEN_URL, then try again.")
