
Commit 9739ecc

Merge pull request #327 from open-edge-platform/update-branch
feat: add vlm-video-summarization-and-interactive-chat demo usecase (#866)
2 parents 3d52e4d + 5fc1149 commit 9739ecc

File tree

15 files changed: +2016 −4 lines changed


docs/use-cases.md

Lines changed: 6 additions & 4 deletions

@@ -20,6 +20,7 @@
 | [**🚗 Smart Parking**](../usecases/ai/smart-parking/README.md) | IoT-powered parking management | ⏱️ 45 min | IoT integration, real-time analytics |
 | [**📹 Video Summarization**](../usecases/ai/video_summarization/README.md) | AI-powered video insights | ⏱️ 60 min | Multi-modal AI, content analysis |
 | [**📄 Visual Text Driven Document Reasoning Engine**](../usecases/ai/visual-text-driven-document-reasoning-engine/README.md) | Document Search & Retrieval Engine | ⏱️ 30 min | Document embedding, VLM |
+| [**🎬 VLM Video Summarization and Interactive Chat**](../usecases/ai/vlm-video-summarization-and-interactive-chat/README.md) | AI video analysis with interactive chat | ⏱️ 30 min | VLM, vector embeddings, semantic search |

 ### 🔴 **Advanced** _(Experienced Developers)_

@@ -35,12 +36,12 @@
 - **Language Models:** [OpenWebUI + Ollama](../usecases/ai/openwebui-ollama/README.md), [RAG Toolkit](../usecases/ai/rag-toolkit/README.md)
 - **Computer Vision:** [AI Video Analytics](../usecases/ai/ai-video-analytics/README.md), [Smart Parking](../usecases/ai/smart-parking/README.md)
-- **Multi-Modal:** [Edge AI Demo Studio (Digital Avatar)](../usecases/ai/edge-ai-demo-studio/README.md), [Video Summarization](../usecases/ai/video_summarization/README.md)
+- **Multi-Modal:** [Edge AI Demo Studio (Digital Avatar)](../usecases/ai/edge-ai-demo-studio/README.md), [Video Summarization](../usecases/ai/video_summarization/README.md), [VLM Video Summarization and Interactive Chat](../usecases/ai/vlm-video-summarization-and-interactive-chat/README.md)
 - **Document Embedding:** [Visual Text Driven Document Reasoning Engine](../usecases/ai/visual-text-driven-document-reasoning-engine/README.md)

 ### 📹 **Media & Content**

-- **Video Processing:** [Video Analytics](../usecases/ai/ai-video-analytics/README.md), [Video Summarization](../usecases/ai/video_summarization/README.md)
+- **Video Processing:** [Video Analytics](../usecases/ai/ai-video-analytics/README.md), [Video Summarization](../usecases/ai/video_summarization/README.md), [VLM Video Summarization and Interactive Chat](../usecases/ai/vlm-video-summarization-and-interactive-chat/README.md)
 - **Camera Systems:** [GMSL Cameras](../usecases/camera/gmsl/README.md), [MIPI Cameras](../usecases/camera/mipi/README.md)
 - **Interactive Media:** [Edge AI Demo Studio (Digital Avatar)](../usecases/ai/edge-ai-demo-studio/README.md)

@@ -67,8 +68,9 @@
 1. Deploy [RAG Toolkit](../usecases/ai/rag-toolkit/README.md) - Enterprise AI patterns
 2. Explore [Visual Text Driven Document Reasoning Engine](../usecases/ai/visual-text-driven-document-reasoning-engine/README.md) - Document embedding models & VLMs
-3. Implement [Smart Parking](../usecases/ai/smart-parking/README.md) - IoT + AI integration
-4. Master [Real-Time Computing](../usecases/real-time/tcc_tutorial/README.md) - Performance optimization
+3. Try [VLM Video Summarization and Interactive Chat](../usecases/ai/vlm-video-summarization-and-interactive-chat/README.md) - Advanced video analysis with VLM
+4. Implement [Smart Parking](../usecases/ai/smart-parking/README.md) - IoT + AI integration
+5. Master [Real-Time Computing](../usecases/real-time/tcc_tutorial/README.md) - Performance optimization

 ### **🔬 For Researchers**
Lines changed: 1 addition & 0 deletions

@@ -0,0 +1 @@
+HOST_IP=127.0.0.1
Lines changed: 4 additions & 0 deletions

@@ -0,0 +1,4 @@
+venv/
+llama*
+chroma_db/
+__pycache__/
Lines changed: 313 additions & 0 deletions

@@ -0,0 +1,313 @@

# AI Video Summarization & Interactive Chat

This application provides AI-powered video summarization to generate concise summaries of key events and enables real-time interaction and queries via a chatbot interface.

## Features

- **AI Video Summarization:** Automatically extract and summarize key events from video streams using OpenCV for frame analysis and Vision-Language Models (VLM) for semantic understanding. Supports generating concise textual summaries and highlights for efficient review.

- **AI Chatbot:** Engage in real-time conversations, ask questions about video content, and receive instant insights through an interactive Gradio interface.

- **Embedding Storage with ChromaDB:** Store and manage vector embeddings efficiently using ChromaDB, enabling fast semantic search and retrieval for downstream analytics and querying.
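
Taken together, these pieces form a simple pipeline: sample frames, caption them with the VLM, embed and store the captions, then answer questions by semantic retrieval. The sketch below is a minimal illustration of that flow, not the code shipped in this use case. It assumes the Qwen2.5-VL llama.cpp server from this README on port 5776, a hypothetical `assets/sample.mp4`, a made-up `describe_frame` helper, and it lets ChromaDB apply its default embedding function (the shipped app uses BAAI/bge-small-en-v1.5; see the FAQ below).

```python
# Illustrative sketch of the frame -> caption -> embedding flow (not the shipped code).
# Assumes: pip install opencv-python openai chromadb, plus the Qwen2.5-VL
# llama.cpp server from this README listening on port 5776.
import base64

import chromadb
import cv2
from openai import OpenAI

vlm = OpenAI(base_url="http://localhost:5776/v1", api_key="none")  # llama.cpp server
db = chromadb.PersistentClient(path="chroma_db")
collection = db.get_or_create_collection("traffic")  # matches a config.json collection_name

def describe_frame(jpeg_bytes: bytes) -> str:
    """Ask the VLM for a one-sentence description of a single frame."""
    image_b64 = base64.b64encode(jpeg_bytes).decode()
    resp = vlm.chat.completions.create(
        model="qwen2.5-vl",  # llama-server hosts a single model; the name is informational
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the key events in this frame in one sentence."},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content

cap = cv2.VideoCapture("assets/sample.mp4")
fps = cap.get(cv2.CAP_PROP_FPS) or 30
step = int(fps * 2)  # sample roughly one frame every two seconds (an assumption)
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if idx % step == 0:
        ok_jpg, jpg = cv2.imencode(".jpg", frame)
        if ok_jpg:
            caption = describe_frame(jpg.tobytes())
            # ChromaDB embeds the caption with its default embedding function here.
            collection.add(
                ids=[f"frame-{idx}"],
                documents=[caption],
                metadatas=[{"timestamp_s": idx / fps}],
            )
    idx += 1
cap.release()

# Later, the chatbot side can retrieve relevant moments semantically:
hits = collection.query(query_texts=["Was there any accident?"], n_results=3)
print(hits["documents"])
```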

## Requirements

### Validated Hardware

- CPU: 13th Gen Intel® Core™ i9-13900K
- GPU: Intel® Arc™ Pro B-Series Graphics
- RAM: 32GB
- Disk: 256GB

### Application Ports

| Service                | Port | Use                                |
|------------------------|------|------------------------------------|
| Main Application       | 5999 | Gradio web interface               |
| Qwen2.5-VL-7B-Instruct | 5776 | Vision-Language Model server       |
| Qwen3-8B               | 5778 | Text generation model server       |
| FastAPI                | 5777 | API backend service and MCP server |

## Prerequisites

Before proceeding with the installation, ensure the following system requirements are met:

- A compatible operating system (Ubuntu 24.04 or Windows 11) must be installed and running.
- The Intel GPU driver must be installed and properly configured on the system.

## Quick Start

### Windows

Run the provided PowerShell script to start the servers and application:

```powershell
.\run_app.ps1
```

Alternatively, you can use the batch script:

```batch
.\run_app.bat
```

Once running, open [http://localhost:5999](http://localhost:5999) in your browser.

### Linux

Before installing Python dependencies, ensure you have Python and FFmpeg installed:

```bash
sudo apt update
sudo apt install python3 python3-pip python3-venv ffmpeg
```

Run the provided bash script to start the servers and application:

```bash
./run_app.sh
```

Once running, open [http://localhost:5999](http://localhost:5999) in your browser.

## Manual Setup Instructions

Choose the appropriate setup method for your operating system:

### Windows Setup

#### 1. Install Python Dependencies

Make sure you have Python 3.8 or higher installed. Then install the required Python packages:

```powershell
python -m venv venv
.\venv\Scripts\activate
pip install -r requirements.txt
```

#### 2. Use Pre-compiled Llama.cpp Binaries

Download the pre-compiled Windows binaries for llama.cpp with Vulkan or SYCL support from the [llama.cpp b7223 release page](https://github.com/ggml-org/llama.cpp/releases/tag/b7223). Extract `llama-b7223-bin-win-vulkan-x64.zip` and place the extracted folder in your project directory.

#### 3. Start Llama Servers

The Qwen3-8B server (port 5778) is always required. The Qwen2.5-VL server (port 5776) can be skipped if you have run it before and the embedding database already exists.

**Qwen3-8B (port 5778):**

```powershell
.\llama-b7223-bin-win-vulkan-x64\llama-server.exe -hf unsloth/Qwen3-8B-GGUF:Q4_K_M -ngl 99 --reasoning-budget 0 --host 0.0.0.0 --port 5778 --jinja
```

**Qwen2.5-VL-7B-Instruct (port 5776, optional):**

```powershell
.\llama-b7223-bin-win-vulkan-x64\llama-server.exe -hf unsloth/Qwen2.5-VL-7B-Instruct-GGUF:Q4_K_M -ngl 99 --reasoning-budget 0 --host 0.0.0.0 --port 5776 --jinja
```
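
Before starting the Gradio app in the next step, it can help to confirm that the servers answer. This probe is not part of the repository; it is a small sketch that relies only on the `/v1/models` route of llama.cpp's OpenAI-compatible `llama-server`, and it works the same on Windows and Linux.

```python
# Quick health probe for the llama.cpp servers (illustrative, not shipped with the app).
# Assumes: pip install openai, and the servers started as shown above.
from openai import OpenAI

for name, port in [("Qwen3-8B", 5778), ("Qwen2.5-VL-7B-Instruct", 5776)]:
    client = OpenAI(base_url=f"http://localhost:{port}/v1", api_key="none")
    try:
        models = client.models.list()  # llama-server implements /v1/models
        print(f"{name} on port {port}: OK ({[m.id for m in models.data]})")
    except Exception as exc:
        print(f"{name} on port {port}: not reachable ({exc})")
```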

#### 4. Run the Gradio Application

```powershell
python app.py
```

### Linux Setup

#### 1. Install Python Dependencies

Make sure you have Python 3.8 or higher installed. Then install the required Python packages:

```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

#### 2. Prepare llama.cpp

You can either **compile llama.cpp with the SYCL backend** or **use the precompiled Vulkan binary**:

**Option A: Compile llama.cpp with SYCL backend**

Follow the [SYCL backend instructions](https://github.com/ggml-org/llama.cpp/blob/master/docs/backend/SYCL.md):

1. Install the oneAPI Base Toolkit ([download link](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html?packages=dl-essentials&dl-essentials-os=linux&dl-lin=apt)):

```bash
sudo apt update
sudo apt install -y gpg-agent wget
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \
  | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt update
sudo apt install intel-deep-learning-essentials
```

2. Set up the environment:

```bash
source /opt/intel/oneapi/<oneapi-version>/oneapi-vars.sh
```

> **Note:** To verify the SYCL installation, run:
> ```bash
> sycl-ls
> ```

3. Build llama.cpp:

```bash
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
sed -i 's/-DLLAMA_CURL=OFF/-DLLAMA_CURL=ON/g' ./examples/sycl/build.sh
sudo apt install curl libcurl4-openssl-dev cmake build-essential
./examples/sycl/build.sh
```

**Option B: Use Precompiled Vulkan Binary**

Download the precompiled Vulkan binary for Linux from the [llama.cpp b7223 release page](https://github.com/ggml-org/llama.cpp/releases/tag/b7223). Extract and place the binary in your project directory for use with Vulkan.

#### 3. Start Llama Servers

The Qwen3-8B server (port 5778) is always required. The Qwen2.5-VL server (port 5776) can be skipped if you have run it before and the embedding database already exists.

**Qwen3-8B (port 5778):**

```bash
ONEAPI_DEVICE_SELECTOR=level_zero:0 ./llama-b7223-bin-ubuntu-vulkan-x64/build/bin/llama-server -hf unsloth/Qwen3-8B-GGUF:Q4_K_M -ngl 99 --reasoning-budget 0 --host 0.0.0.0 --port 5778 --jinja
```

**Qwen2.5-VL-7B-Instruct (port 5776, optional):**

```bash
ONEAPI_DEVICE_SELECTOR=level_zero:0 ./llama-b7223-bin-ubuntu-vulkan-x64/build/bin/llama-server -hf unsloth/Qwen2.5-VL-7B-Instruct-GGUF:Q4_K_M -ngl 99 --reasoning-budget 0 --host 0.0.0.0 --port 5776 --jinja
```

#### 4. Run the Gradio Application

Once the dependencies and servers are ready, start the application:

```bash
python3 app.py
```

Once started, open [http://localhost:5999](http://localhost:5999) in your browser.

## FAQ

### How do I change the video file, collection name, or system prompt?

The application comes with pre-configured scenarios (Traffic, Retail, Manufacturing), but you can customize them by modifying the `config.json` file. Each scenario contains the following configurable parameters:

#### Customizing Video Files and Settings

To change the video file, collection name, or system prompt for any scenario:

1. **Open the `config.json` file** in your project directory
2. **Locate the scenario** you want to modify (e.g., "Traffic", "Retail", "Manufacturing")
3. **Update the following fields** as needed:

```json
{
  "YourScenario": {
    "video_path": "path/to/your/video.mp4",
    "video_label": "Your Custom Video Label",
    "collection_name": "your_collection_name",
    "header": "Your Custom Header Title",
    "description": "Your custom description for the scenario",
    "system_prompt": "Your custom system prompt that defines the AI assistant's behavior and analysis focus."
  }
}
```

4. **Save the file** and restart the application for changes to take effect
5. **Place your video file** in the specified path (typically in the `assets/` folder)

#### Parameter Descriptions:

- **`video_path`**: Path to your video file (relative to the project directory)
- **`video_label`**: Display name for the video in the interface
- **`collection_name`**: Name for the ChromaDB collection (used for storing embeddings)
- **`header`**: Title displayed at the top of the web interface
- **`description`**: Description text shown in the interface
- **`system_prompt`**: Instructions that define how the AI assistant should analyze and respond to video content

#### Example: Adding a Custom Scenario

```json
{
  "Security": {
    "video_path": "assets/security-footage.mp4",
    "video_label": "Security Monitoring",
    "collection_name": "security",
    "header": "Smart Security Intelligence: AI Video Summarization + Interactive Chat",
    "description": "Intelligent security monitoring system for detecting and analyzing suspicious activities.",
    "system_prompt": "You are a security monitoring assistant. Analyze the video for any suspicious activities, unauthorized access, or security incidents. Provide detailed descriptions of people, their actions, and any potential security concerns."
  }
}
```
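
How `app.py` consumes this file is internal to the app, but for intuition, a minimal loader with light validation might look like the following. This is a hypothetical helper, not the shipped code; the key names match the parameters described above.

```python
# Hypothetical config.json loader with light validation (not the shipped code).
import json
from pathlib import Path

REQUIRED_KEYS = {"video_path", "video_label", "collection_name",
                 "header", "description", "system_prompt"}

def load_scenarios(path: str = "config.json") -> dict:
    """Return the scenario table, failing fast on missing keys or video files."""
    scenarios = json.loads(Path(path).read_text(encoding="utf-8"))
    for name, cfg in scenarios.items():
        missing = REQUIRED_KEYS - cfg.keys()
        if missing:
            raise ValueError(f"Scenario {name!r} is missing keys: {sorted(missing)}")
        if not Path(cfg["video_path"]).exists():
            print(f"Warning: video for {name!r} not found at {cfg['video_path']}")
    return scenarios

if __name__ == "__main__":
    for name in load_scenarios():
        print("Available scenario:", name)
```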

### How do I change the AI models used in the application?

The application uses four different AI models for various tasks:

#### Current Models:

- **Qwen3-8B** (Port 5778): Text generation and chatbot responses
- **Qwen2.5-VL-7B-Instruct** (Port 5776): Vision-language model for video frame analysis
- **BAAI/bge-small-en-v1.5**: Embedding model for vector storage (used in both `video_summarization.py` and `ai_chatbot.py`)
- **BAAI/bge-reranker-base**: Reranker model for improving search results (used in `ai_chatbot.py`)

#### Changing the LLM or VLM:

1. **Modify the model in app.py**:
   - Open `app.py` and locate the `start_llamacpp_server()` function
   - Replace the model names in the `-hf` parameter:

```python
# For the text generation model (port 5778)
"-hf", "your-organization/your-text-model-GGUF:quantization"

# For the vision-language model (port 5776)
"-hf", "your-organization/your-vision-model-GGUF:quantization"
```

2. **Update API endpoints** (if using different ports), as sketched below:
   - In `ai_chatbot.py`, modify the `api_base` URL (line 19) for the text model
   - In `video_summarization.py`, modify the `base_url` (line 70) for the vision model
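
For intuition, the two endpoint assignments referenced above would change roughly as follows; the exact surrounding code lives in the repository, so treat this as a sketch with example URLs.

```python
# ai_chatbot.py (around line 19) -- text model endpoint; example value
api_base = "http://localhost:5778/v1"

# video_summarization.py (around line 70) -- vision model endpoint; example value
base_url = "http://localhost:5776/v1"
```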

#### Changing the Embedding Model:

1. **Modify the embedding model in both files**:
   - In `video_summarization.py` (line 140): Replace the model name for video analysis embedding
   - In `ai_chatbot.py` (line 42): Replace the model name for chatbot query embedding

```python
# In both files
embed_model = HuggingFaceEmbedding(model_name="your-preferred-embedding-model")
```

#### Changing the Reranker Model:

1. **Modify the reranker model in ai_chatbot.py**:
   - Open `ai_chatbot.py`
   - Locate line 56 and replace the reranker model:

```python
rerank = SentenceTransformerRerank(top_n=1, model="your-preferred-reranker-model")
```

#### Requirements for Model Changes:

- **GGUF Format**: LLM models must be in GGUF format for llama.cpp compatibility
- **Hugging Face**: Models should be available on the Hugging Face Hub
- **Quantization**: Choose an appropriate quantization (e.g., Q4_K_M, Q5_K_M, Q8_0)
- **Hardware**: Ensure your hardware can handle the model size and requirements

#### Example: Using Different Models

```python
# In app.py - example replacement models
"-hf", "unsloth/Llama-3.1-8B-Instruct-GGUF:Q4_K_M"  # Text model
"-hf", "openbmb/MiniCPM-V-4_5-gguf:Q8_0"  # Vision model

# In both video_summarization.py and ai_chatbot.py - use a different embedding model
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")

# In ai_chatbot.py - use a different reranker model
rerank = SentenceTransformerRerank(top_n=1, model="BAAI/bge-reranker-large")
```

**Note**: After changing models, restart the application and allow time for the new models to download on first use.
