The `run_pipeline.py` script showcases how to use the Transformers pipeline API to run the visual question answering task on HPUs.
```bash
PT_HPU_LAZY_MODE=1 python3 run_pipeline.py \
    --model_name_or_path Salesforce/blip-vqa-capfilt-large \
    --image_path "https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg" \
    --question "how many dogs are in the picture?" \
    --use_hpu_graphs \
    --bf16
```

The `run_openclip_vqa.py` script can be used to run zero-shot image classification with OpenCLIP Hugging Face models.
The requirements for `run_openclip_vqa.py` can be installed with `openclip_requirements.txt` as follows:

```bash
pip install -r openclip_requirements.txt
```

By default, the script runs the sample outlined in the BiomedCLIP-PubMedBERT_256-vit_base_patch16_224 notebook. Other OpenCLIP models can also be run by specifying the model, the classifier labels, and the image URL(s), like so:
```bash
PT_HPU_LAZY_MODE=1 python run_openclip_vqa.py \
    --model_name_or_path laion/CLIP-ViT-g-14-laion2B-s12B-b42K \
    --labels "a dog" "a cat" \
    --image_path "http://images.cocodataset.org/val2017/000000039769.jpg" \
    --use_hpu_graphs \
    --bf16
```
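For intuition, the zero-shot classification that CLIP-style models perform boils down to embedding the image and each candidate label into a shared space, then taking a softmax over the cosine similarities. The sketch below illustrates only that scoring step; the random embeddings are placeholders standing in for the real encoder outputs, and the `logit_scale` value is an assumption (CLIP models learn this temperature during training).

```python
import numpy as np

rng = np.random.default_rng(0)
labels = ["a dog", "a cat"]

# Placeholder embeddings: in the real script these come from the OpenCLIP
# image and text encoders (an assumption for illustration only).
image_emb = rng.normal(size=512)
text_embs = rng.normal(size=(len(labels), 512))

# Normalize so the dot product below is a cosine similarity.
image_emb /= np.linalg.norm(image_emb)
text_embs /= np.linalg.norm(text_embs, axis=1, keepdims=True)

# CLIP-style models scale similarities by a learned temperature before the
# softmax; 100.0 here is just a typical stand-in value.
logit_scale = 100.0
logits = logit_scale * (text_embs @ image_emb)

# Softmax over the labels yields the zero-shot classification probabilities.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

for label, p in zip(labels, probs):
    print(f"{label}: {p:.3f}")
```

The label passed via `--labels` with the highest probability is the model's zero-shot prediction for the image.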