
Qwen3-VL-Outpost

Qwen3-VL-Outpost is a Gradio-based web application for vision-language tasks, leveraging multiple Qwen vision-language models to process images and videos. It provides an intuitive interface for users to input queries, upload media, and generate detailed responses using advanced models like Qwen3-VL and Qwen2.5-VL.

(Screenshot: Qwen3-VL-Outpost running as a Hugging Face Space by prithivMLmods.)

Important

Note: remove the kernels and flash_attn3 (FlashAttention-3) implementation if you are running on non-Hopper-architecture GPUs. A portable fallback is sketched below.
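
As a minimal sketch of that fallback, assuming the app loads models through the Transformers auto classes (the model ID and dtype here are illustrative, not necessarily the exact code in app.py), PyTorch's built-in SDPA attention works on pre-Hopper GPUs:

import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

MODEL_ID = "Qwen/Qwen3-VL-4B-Instruct"  # any of the models listed below

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    attn_implementation="sdpa",  # portable attention; no Hopper-only kernels
    device_map="auto",
)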


Features

  • Image and Video Inference: Upload images or videos and input text queries to generate detailed responses.

  • Multiple Model Support: Choose from the following models:

    • Qwen3-VL-4B-Instruct
    • Qwen3-VL-8B-Instruct
    • Qwen3-VL-4B-Thinking
    • Qwen2.5-VL-3B-Instruct
    • Qwen2.5-VL-7B-Instruct
  • Customizable Parameters: Adjust advanced settings such as max new tokens, temperature, top-p, top-k, and repetition penalty.

  • Real-time Streaming: View model outputs as they are generated (a streaming sketch follows this list).

  • Custom Theme: Uses a tailored SteelBlueTheme for an enhanced user interface.

  • Example Inputs: Predefined examples for quick testing of image and video inference.
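
For the streaming feature, here is a minimal sketch of the usual Transformers pattern, assuming the model and processor above; the function and variable names are illustrative rather than the exact code in app.py. Generation runs in a background thread while the UI consumes tokens as they arrive:

from threading import Thread
from transformers import TextIteratorStreamer

def stream_reply(model, processor, inputs, max_new_tokens=1024):
    streamer = TextIteratorStreamer(
        processor.tokenizer, skip_prompt=True, skip_special_tokens=True
    )
    generation_kwargs = dict(**inputs, streamer=streamer, max_new_tokens=max_new_tokens)
    Thread(target=model.generate, kwargs=generation_kwargs).start()
    buffer = ""
    for new_text in streamer:
        buffer += new_text
        yield buffer  # each yield re-renders the Gradio output component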


Installation

Prerequisites

  • Python 3.10 or higher
  • Git
  • CUDA-compatible GPU (recommended for optimal performance)

Steps

1. Clone the Repository

git clone https://github.com/PRITHIVSAKTHIUR/Qwen3-VL-Outpost.git
cd Qwen3-VL-Outpost

2. Create a Virtual Environment (optional but recommended)

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

3. Install Dependencies

Install the required packages using:

pip install -r requirements.txt

requirements.txt includes:

git+https://github.com/huggingface/transformers.git@v4.57.6
git+https://github.com/huggingface/accelerate.git
git+https://github.com/huggingface/peft.git
transformers-stream-generator
huggingface_hub
qwen-vl-utils
sentencepiece
opencv-python
torch==2.8.0
torchvision
matplotlib
pdf2image
requests
pymupdf
kernels
hf_xet
spaces
pillow
gradio  # gradio@6.3.0
fpdf
timm
av

4. Run the Application

Start the Gradio interface with:

python app.py

This will launch the web interface, accessible via your browser. The application queues incoming requests with a maximum queue size of 50.
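
A hypothetical launch snippet matching that queue size (the actual component names in app.py may differ):

import gradio as gr

with gr.Blocks() as demo:
    ...  # model selector, media upload, query textbox, output stream

demo.queue(max_size=50)  # hold at most 50 pending requests
demo.launch()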


Usage

  1. Select a Model: Choose one of the available Qwen models from the radio buttons.

  2. Upload Media: Use the image or video upload section to provide input media.

  3. Enter Query: Input your text query in the provided textbox.

  4. Adjust Settings: Optionally tweak advanced parameters like max new tokens or temperature in the accordion.

  5. Submit: Click the Submit button to generate a response.

    • Outputs are displayed in real-time in the Raw Output Stream and as formatted Markdown.

Example Queries

Image Inference

  • “Explain the content in detail.” (with an uploaded image)
  • “Jsonify Data.” (for images with tabular data)

Video Inference

  • “Explain the ad in detail.” (with an uploaded video)
  • “Identify the main actions in the video.”

Project Structure

Qwen3-VL-Outpost/
│
├── app.py              # Main application script containing the Gradio interface and model logic
├── images/             # Directory for example image files
├── videos/             # Directory for example video files
├── requirements.txt    # List of dependencies required for the project
└── README.md           # Project documentation

Notes

  • The application uses PyTorch with GPU acceleration (torch.cuda) if available; otherwise, it falls back to CPU.
  • Video processing downsamples videos to a maximum of 10 frames to optimize memory usage (see the sketch after these notes).
  • Ensure sufficient disk space and memory when loading large models such as Qwen3-VL-8B-Instruct.
  • The application is designed to run in a browser via Gradio's web interface.
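
A sketch of that 10-frame downsampling using OpenCV (opencv-python is in requirements.txt); the function name and the evenly-spaced sampling strategy are assumptions, not necessarily the exact logic in app.py:

import cv2
import numpy as np
from PIL import Image

def sample_frames(video_path: str, num_frames: int = 10) -> list[Image.Image]:
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # pick num_frames evenly spaced frame indices across the video
    indices = np.linspace(0, max(total - 1, 0), num_frames, dtype=int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            # OpenCV decodes BGR; convert to RGB for the processor
            frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
    cap.release()
    return frames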

Contributing

Contributions are welcome! To contribute:

  1. Fork the repository.

  2. Create a new branch:

    git checkout -b feature-branch
  3. Make your changes and commit:

    git commit -m "Add new feature"
  4. Push to the branch:

    git push origin feature-branch
  5. Open a pull request.


License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.