Doubutsu Image Describer for ComfyUI

⚠️ DEPRECATED - NO LONGER MAINTAINED

This project is no longer actively maintained. Many better vision tools are now available and supported in ComfyUI. Please consider using alternative vision models and nodes that are actively maintained.

This custom node for ComfyUI uses the Doubutsu small vision-language model (VLM) to describe images. Credit and further information on Doubutsu: https://huggingface.co/qresearch/doubutsu-2b-pt-756

Installation

Clone this repository into your ComfyUI custom_nodes directory: git clone https://github.com/EnragedAntelope/comfyui-doubutsu-describer.git
Install the required dependencies: pip install -r requirements.txt
Download the model files:

Create a models directory in the root of this repository (ComfyUI\custom_nodes\ComfyUI-Doubutsu-Describer).
Download the model files for "qresearch/doubutsu-2b-pt-756" from Hugging Face and place them in models/qresearch/doubutsu-2b-pt-756/.
Download the adapter files for "qresearch/doubutsu-2b-lora-756-docci" and place them in models/qresearch/doubutsu-2b-lora-756-docci/.

You can download these files manually from the Hugging Face website or use the Hugging Face CLI:

Open a command prompt, navigate to your ComfyUI\custom_nodes\ComfyUI-Doubutsu-Describer directory, then execute:

huggingface-cli download qresearch/doubutsu-2b-pt-756 --local-dir models/qresearch/doubutsu-2b-pt-756

huggingface-cli download qresearch/doubutsu-2b-lora-756-docci --local-dir models/qresearch/doubutsu-2b-lora-756-docci
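The same two downloads can also be scripted from Python. The following is a minimal sketch, assuming the huggingface_hub package is installed and that the repository root is passed in as the root argument. The names MODEL_REPOS, target_dir, missing_model_dirs, and download_models are illustrative and not part of this project.

```python
from pathlib import Path

# Repository IDs taken from the installation steps above.
MODEL_REPOS = (
    "qresearch/doubutsu-2b-pt-756",
    "qresearch/doubutsu-2b-lora-756-docci",
)


def target_dir(root: Path, repo_id: str) -> Path:
    """Return models/<org>/<name> under the given root directory."""
    return root / "models" / Path(*repo_id.split("/"))


def missing_model_dirs(root: Path) -> list[str]:
    """List repo IDs whose target directory is absent or empty."""
    missing = []
    for repo_id in MODEL_REPOS:
        path = target_dir(root, repo_id)
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(repo_id)
    return missing


def download_models(root: Path) -> None:
    """Download every missing repo (requires network access)."""
    # Imported lazily so the helpers above work without huggingface_hub.
    from huggingface_hub import snapshot_download

    for repo_id in missing_model_dirs(root):
        snapshot_download(repo_id=repo_id, local_dir=str(target_dir(root, repo_id)))
```

Calling download_models(Path.cwd()) from the repository root should fetch whichever of the two directories is still missing or empty. missing_model_dirs can be called on its own beforehand to see what remains to be downloaded.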

Restart ComfyUI

Usage

After installation, you'll find a new node called "Doubutsu Image Describer" in the "image/text" category. Connect an image to its input, and it will generate a description based on the provided question.

Parameters

image: The input image to describe
question: The question to ask about the image (default: "Describe the image")
max_new_tokens: Maximum number of tokens to generate (default: 128)
temperature: Controls randomness in generation (default: 0.1)
precision: Numeric precision used for inference, either float16 or bfloat16. On GPUs that support it, bfloat16 may be faster.
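For readers unfamiliar with how ComfyUI nodes declare their inputs, the parameters above could be expressed roughly as follows. This is a sketch based on the general ComfyUI custom-node convention (an INPUT_TYPES classmethod plus RETURN_TYPES, FUNCTION, and CATEGORY attributes), not this project's actual source. The class name, the min/max/step bounds, the multiline flag, and the order of the precision options are all assumptions.

```python
class DoubutsuDescriberSketch:
    """Illustrative node declaration; not the project's actual source."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "image": ("IMAGE",),
                "question": (
                    "STRING",
                    {"default": "Describe the image", "multiline": True},
                ),
                # The min/max/step bounds below are assumptions for this sketch.
                "max_new_tokens": ("INT", {"default": 128, "min": 1, "max": 1024}),
                "temperature": (
                    "FLOAT",
                    {"default": 0.1, "min": 0.0, "max": 1.0, "step": 0.01},
                ),
                # A list of strings is rendered as a dropdown in ComfyUI.
                "precision": (["float16", "bfloat16"],),
            }
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "describe"
    CATEGORY = "image/text"

    def describe(self, image, question, max_new_tokens, temperature, precision):
        # Placeholder: a real node would load the model and run inference here.
        raise NotImplementedError("model inference is omitted from this sketch")
```

The defaults for question, max_new_tokens, and temperature match the values listed above. Everything else in the declaration is illustrative.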

License

Apache 2.0
