|
| 1 | +# VLM Tutorial - Car Damage Verification |
| 2 | + |
| 3 | +This Vision Language Model (VLM) tutorial features an interactive car verification challenge powered by OVHcloud AI Endpoints. |
| 4 | + |
| 5 | +## Files |
| 6 | + |
| 7 | +- `test_vision_connection.py` - Test OVHcloud Vision API connectivity |
| 8 | +- `verification_demo.py` - Core car verification logic using VLM |
| 9 | +- `verification_app.py` - Interactive Chainlit web application |
| 10 | +- `chainlit.md` - Welcome page content for the Chainlit app |
| 11 | +- `requirements.txt` - Python dependencies |
| 12 | + |
| 13 | +## What This Demo Does |
| 14 | + |
| 15 | +The **Car Verification Challenge** is an interactive AI-powered fact-checking experiment where users: |
| 16 | + |
| 17 | +1. **Make claims** about their car (manufacturer, model, color, damage) |
| 18 | +2. **Upload photos** of the actual vehicle (minimum 3 photos) |
| 19 | +3. **Get AI analysis** that verifies if the photos match their claims |
| 20 | +4. **Receive a verdict** - did you tell the truth or try to trick the AI? |
| 21 | + |
| 22 | +This demonstrates how Vision Language Models can analyze visual content and cross-reference it with textual claims for verification tasks. |
| 23 | + |
| 24 | +## Usage |
| 25 | + |
| 26 | +### 1. Prerequisites |
| 27 | + |
| 28 | +Ensure you have Python 3.8+ installed and access to OVHcloud AI Endpoints. |
| 29 | + |
| 30 | +### 2. Environment Setup |
| 31 | + |
| 32 | +Create a `.env` file with your OVHcloud credentials: |
| 33 | +``` |
| 34 | +OVH_AI_ENDPOINTS_ACCESS_TOKEN=your_token_here |
| 35 | +QWEN_URL=https://qwen-2-5-vl-72b-instruct.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1/chat/completions |
| 36 | +``` |
| 37 | + |
| 38 | +### 3. Install Dependencies |
| 39 | + |
| 40 | +```bash |
| 41 | +pip install -r requirements.txt |
| 42 | +``` |
| 43 | + |
| 44 | +### 4. Test Connection |
| 45 | + |
| 46 | +First, verify your API connectivity: |
| 47 | +```bash |
| 48 | +python test_vision_connection.py |
| 49 | +``` |
| 50 | + |
| 51 | +### 5. Run the Interactive App |
| 52 | + |
| 53 | +Launch the Chainlit web application: |
| 54 | +```bash |
| 55 | +chainlit run verification_app.py |
| 56 | +``` |
| 57 | + |
| 58 | +Then open your browser to the provided URL (typically `http://localhost:8000`). |
| 59 | + |
| 60 | +### 6. Test the Core Logic |
| 61 | + |
| 62 | +You can also test the verification engine directly: |
| 63 | +```bash |
| 64 | +python verification_demo.py |
| 65 | +``` |
| 66 | + |
| 67 | +## How It Works |
| 68 | + |
| 69 | +### Vision Analysis Pipeline |
| 70 | + |
| 71 | +1. **Image Processing**: Photos are optimized and converted to base64 for API transmission |
| 72 | +2. **Multi-Modal Prompting**: The VLM receives both user claims (text) and photos (images) |
| 73 | +3. **Verification Analysis**: AI analyzes each claim against visual evidence: |
| 74 | + - Manufacturer/brand identification |
| 75 | + - Model recognition |
| 76 | + - Color verification |
| 77 | + - Damage assessment |
| 78 | +4. **Report Generation**: Detailed verification report with confidence levels and visual indicators |
| 79 | + |
| 80 | +### Key Features |
| 81 | + |
| 82 | +- **Multi-image analysis** - Processes up to 3 photos simultaneously |
| 83 | +- **Structured verification** - Systematically checks each claim type |
| 84 | +- **Enhanced formatting** - Green checkmarks (✅) for matches, red crosses (❌) for mismatches |
| 85 | +- **Interactive interface** - User-friendly web application |
| 86 | +- **Real-time processing** - Live verification with visual feedback |
| 87 | + |
| 88 | +## Model Information |
| 89 | + |
| 90 | +- **Vision Model**: Qwen2.5-VL-72B-Instruct |
| 91 | +- **Provider**: OVHcloud AI Endpoints |
| 92 | +- **Capabilities**: Multi-modal understanding, object detection, text-image reasoning |
| 93 | +- **Optimizations**: Image compression, quality balancing for performance |
| 94 | + |
| 95 | +## Requirements |
| 96 | + |
| 97 | +See `requirements.txt` for all dependencies: |
| 98 | +- `chainlit` - Interactive web interface |
| 99 | +- `pillow` - Image processing |
| 100 | +- `requests` - API communication |
| 101 | +- `python-dotenv` - Environment management |
| 102 | +- `aiofiles` - Async file operations |
| 103 | + |
| 104 | +## Educational Value |
| 105 | + |
| 106 | +This tutorial demonstrates: |
| 107 | +- **Multi-modal AI applications** - Combining text and image analysis |
| 108 | +- **Verification systems** - Using AI for fact-checking |
| 109 | +- **Interactive AI interfaces** - Building engaging user experiences |
| 110 | +- **Vision model integration** - Practical VLM implementation |
| 111 | +- **Real-world applications** - Insurance, automotive, verification use cases |
| 112 | + |
| 113 | +## Potential Extensions |
| 114 | + |
| 115 | +- **Damage severity scoring** - Quantify damage levels |
| 116 | +- **Multiple vehicle verification** - Compare multiple cars |
| 117 | +- **Historical comparison** - Before/after damage analysis |
| 118 | +- **Integration with databases** - Verify against vehicle registrations |
| 119 | +- **Mobile app version** - Native mobile implementation |
| 120 | + |
| 121 | +## Troubleshooting |
| 122 | + |
| 123 | +- **Connection issues**: Run `test_vision_connection.py` to verify API access |
| 124 | +- **Image upload problems**: Check file formats (PNG, JPG, JPEG, WebP supported) |
| 125 | +- **Slow performance**: Reduce image sizes or number of photos |
| 126 | +- **Token errors**: Verify your OVHcloud AI Endpoints token in `.env` |
| 127 | +- **Formatting issues**: The enhanced formatting automatically adds checkmarks and structure |
| 128 | + |
| 129 | +--- |
| 130 | + |
| 131 | +*Powered by OVHcloud AI Endpoints - Demonstrating the power of Vision Language Models for real-world verification tasks.* |
0 commit comments