# Visual Question Answering Examples

## Single-HPU inference

The `run_pipeline.py` script showcases how to use the Transformers pipeline API to run the visual question answering task on HPUs.

```bash
PT_HPU_LAZY_MODE=1 python3 run_pipeline.py \
    --model_name_or_path Salesforce/blip-vqa-capfilt-large \
    --image_path "https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg" \
    --question "how many dogs are in the picture?" \
    --use_hpu_graphs \
    --bf16
```
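The core of what the script does can be sketched with the plain Transformers pipeline API (a minimal illustration, not the actual script: the `answer_question` helper is hypothetical, and the HPU-specific pieces such as graph caching and bf16 autocast are omitted):

```python
def answer_question(image_url: str, question: str):
    # Lazy import so the helper can be defined without Transformers installed.
    from transformers import pipeline

    # Build a VQA pipeline; the model weights are downloaded on first use.
    vqa = pipeline(
        "visual-question-answering",
        model="Salesforce/blip-vqa-capfilt-large",
    )
    # Returns a list of {"answer": ..., "score": ...} dicts, best answer first.
    return vqa(image=image_url, question=question)


if __name__ == "__main__":
    print(answer_question(
        "https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg",
        "how many dogs are in the picture?",
    ))
```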

## OpenCLIP inference

The `run_openclip_vqa.py` script can be used to run zero-shot image classification with OpenCLIP models from the Hugging Face Hub. Its requirements can be installed with `openclip_requirements.txt` as follows:

```bash
pip install -r openclip_requirements.txt
```

By default, the script runs the sample outlined in the BiomedCLIP-PubMedBERT_256-vit_base_patch16_224 notebook. One can also run other OpenCLIP models by specifying the model, the classifier labels, and the image URL(s) like so:

```bash
PT_HPU_LAZY_MODE=1 python run_openclip_vqa.py \
    --model_name_or_path laion/CLIP-ViT-g-14-laion2B-s12B-b42K \
    --labels "a dog" "a cat" \
    --image_path "http://images.cocodataset.org/val2017/000000039769.jpg" \
    --use_hpu_graphs \
    --bf16
```
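For intuition, the zero-shot scoring that CLIP-style models perform reduces to cosine similarity between the image embedding and each label's text embedding, followed by a softmax over the labels. A minimal NumPy sketch with toy embeddings (the `zero_shot_scores` helper and the embedding values are illustrative, not taken from the script):

```python
import numpy as np


def zero_shot_scores(image_emb, text_embs, logit_scale=100.0):
    # Normalize so dot products become cosine similarities.
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    # One logit per label, scaled as in CLIP.
    logits = logit_scale * text_embs @ image_emb
    # Numerically stable softmax over the labels.
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


# Toy 2-D embeddings standing in for the labels "a dog" and "a cat".
probs = zero_shot_scores(
    np.array([1.0, 0.0]),                     # image embedding
    np.array([[0.9, 0.1], [0.1, 0.9]]),       # one row per label
)
```

Here `probs[0]` dominates because the image embedding is closest to the first label's embedding; the real script does the same with embeddings produced by the OpenCLIP image and text encoders.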