image-to-dense-caption

Generate comprehensive, dense portrait descriptions using a vision‑language model.

This repository provides a command-line tool (infer.py) that:

- Loads the Qwen2.5-VL-7B-Instruct-abliterated model
- Processes images (supported formats: .png, .jpg, .jpeg, .webp, .bmp, .gif)
- Produces rich descriptive paragraphs covering emotional expression, posture, clothing or nudity, body type, hair, and environmental context
- Outputs one .txt file per image

🚀 Quick Start

1. Clone the repository

git clone https://github.com/anto18671/image-to-dense-caption.git
cd image-to-dense-caption

2. Set up your Python environment

python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

3. Install dependencies

pip install -r requirements.txt

4. Download the model

git lfs install
git clone https://huggingface.co/huihui-ai/Qwen2.5-VL-7B-Instruct-abliterated

Ensure the folder Qwen2.5-VL-7B-Instruct-abliterated sits in the same directory as infer.py.
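
A quick pre-flight check along these lines can confirm the model folder is where infer.py expects it. This is a minimal sketch and not part of infer.py: the helper name `check_model_dir` is hypothetical, and it assumes a complete clone contains a top-level `config.json`, as Hugging Face model repositories normally do.

```python
from pathlib import Path

MODEL_DIR_NAME = "Qwen2.5-VL-7B-Instruct-abliterated"


def check_model_dir(base_dir: Path, name: str = MODEL_DIR_NAME) -> Path:
    """Return the model folder path, or raise if it looks missing or incomplete."""
    model_dir = base_dir / name
    if not model_dir.is_dir():
        raise FileNotFoundError(f"Model folder not found: {model_dir}")
    # Assumption: a complete clone includes config.json at the top level.
    if not (model_dir / "config.json").is_file():
        raise FileNotFoundError(
            f"config.json missing in {model_dir}; the clone may be incomplete"
        )
    return model_dir
```

Calling `check_model_dir(Path(__file__).parent)` near the top of a script would fail early with a readable message instead of a later loading error.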

🔧 Usage

- Place your images in a subfolder (default: images/)
- Run the script:

python infer.py

This will:

- Scan the folder for valid image files
- Generate a .txt file containing a dense description for each image
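
The scanning step can be sketched as follows. This is an illustration rather than the code in infer.py: the function name `find_images` is hypothetical, and the non-recursive, case-insensitive matching is an assumption. Only the extension list comes from this README.

```python
from pathlib import Path

# Extensions listed as supported in this README.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".bmp", ".gif"}


def find_images(folder: Path) -> list[Path]:
    """Return supported image files directly inside `folder`, sorted by path."""
    return sorted(
        p
        for p in folder.iterdir()
        # Assumption: match extensions case-insensitively and skip subfolders.
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Files with any other extension are simply left out of the returned list.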

📁 Output Example

images/photo1.jpg      → images/photo1.txt
images/portrait.webp   → images/portrait.txt

Each .txt includes a paragraph describing emotional expression, posture, clothing/nudity status, body type, hair, and environment.
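
The naming scheme in the example above amounts to swapping the image extension for .txt. A minimal sketch, with hypothetical helper names `caption_path` and `save_caption` (the UTF-8 encoding and trailing newline are assumptions, not confirmed behavior of infer.py):

```python
from pathlib import Path


def caption_path(image_path: Path) -> Path:
    """Map an image path to its caption file: same folder, same stem, .txt suffix."""
    return image_path.with_suffix(".txt")


def save_caption(image_path: Path, caption: str) -> Path:
    """Write the caption next to the image and return the path that was written."""
    out = caption_path(image_path)
    # Assumption: UTF-8 text with a single trailing newline.
    out.write_text(caption.strip() + "\n", encoding="utf-8")
    return out
```

Under this scheme, two images that differ only by extension (for example photo1.jpg and photo1.png) would map to the same .txt file, so distinct base names are the safer choice.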

💡 Tips & Configurations

GPU: A GPU with at least 16 GB of VRAM is recommended.
Memory options:
- Use 8‑bit quantization (via bitsandbytes) to lower VRAM usage.
- Switch to torch_dtype=torch.float16 if your setup supports it.
Custom folder: Change the image_folder path in infer.py if needed.
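
A back-of-the-envelope estimate shows why precision matters for VRAM. The sketch below counts weights only and assumes roughly 7 billion parameters, taken from the "7B" in the model name. Activations, the image inputs, and framework overhead come on top, so treat the numbers as a lower bound rather than a measurement.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# Assumption: ~7 billion parameters, inferred from the "7B" in the model name.
PARAMS = 7e9

for label, nbytes in [("float32", 4), ("float16", 2), ("8-bit", 1)]:
    print(f"{label}: ~{weight_memory_gib(PARAMS, nbytes):.1f} GiB")
```

Under these assumptions the weights take about 13 GiB in float16 and about 6.5 GiB in 8-bit, which is consistent with the 16 GB recommendation for half precision.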

✅ Troubleshooting

OSError / Model not found: Confirm the model folder is correctly named and in place.
CUDA out-of-memory:
- Reduce VRAM usage by quantizing the model.
- Run on CPU by removing .to("cuda"); inference will be slower.
Non‑image files: Unsupported extensions are automatically skipped.
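
Instead of editing out `.to("cuda")` by hand, a script could pick the device at runtime. This is a hedged sketch with a hypothetical helper name `pick_device`; how infer.py actually places the model is not shown in this README.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        # Imported lazily so the helper also works where torch is not installed.
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

A call such as `model.to(pick_device())` would then fall back to CPU automatically on machines without CUDA.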

📄 License

This script is released under the MIT License.

Model usage is subject to Hugging Face terms (see huihui-ai/Qwen2.5-VL-7B-Instruct-abliterated for details).
