⚠️ DEPRECATED - NO LONGER MAINTAINEDThis project is no longer actively maintained. Many better vision tools are now available and supported in ComfyUI. Please consider using alternative vision models and nodes that are actively maintained.
This custom node for ComfyUI allows you to use the Doubutsu small VLM model to describe images. Credit and further information on Doubutsu: https://huggingface.co/qresearch/doubutsu-2b-pt-756
- Clone this repository into your ComfyUI's
custom_nodesdirectory: git clone https://github.com/EnragedAntelope/comfyui-doubutsu-describer.git - Install the required dependencies: pip install -r requirements.txt
- Download the model files:
- Create a
modelsdirectory in the root of this repository (ComfyUI\custom_nodes\ComfyUI-Doubutsu-Describer). - Download the model files for "qresearch/doubutsu-2b-pt-756" from Hugging Face and place them in
models/qresearch/doubutsu-2b-pt-756/. - Download the adapter files for "qresearch/doubutsu-2b-lora-756-docci" and place them in
models/qresearch/doubutsu-2b-lora-756-docci/.
You can download these files manually from the Hugging Face website or use the Hugging Face CLI:
Open a command prompt, navigate to your ComfyUI\custom_nodes\ComfyUI-Doubutsu-Describer directory, then execute:
huggingface-cli download qresearch/doubutsu-2b-pt-756 --local-dir models/qresearch/doubutsu-2b-pt-756
huggingface-cli download qresearch/doubutsu-2b-lora-756-docci --local-dir models/qresearch/doubutsu-2b-lora-756-docci
- Restart ComfyUI
After installation, you'll find a new node called "Doubutsu Image Describer" in the "image/text" category. Connect an image to its input, and it will generate a description based on the provided question.
image: The input image to describequestion: The question to ask about the image (default: "Describe the image")max_new_tokens: Maximum number of tokens to generate (default: 128)temperature: Controls randomness in generation (default: 0.1)precision: Choose between float16 or bfloat16 for inference. If your GPU supports it, bfloat16 should be quicker.
[Apache 2.0]
