Collaborator

gsaluja9 commented Sep 15, 2025

Adds automatic generation of image captions using BLIP.
https://huggingface.co/docs/transformers/main/en/model_doc/blip#transformers.BlipForConditionalGeneration

TODO:

  • Add tests: add a validation at build time with a basic script.
  • Add docs

gsaluja9 requested review from bovlb and drewaogle September 15, 2025 22:42
gsaluja9 marked this pull request as ready for review September 17, 2025 13:52
gsaluja9 requested a review from luisremis September 17, 2025 18:22

COPY requirements.txt /
RUN pip install -U pip
RUN pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
Contributor

I think it's cleaner to keep requirements in requirements files. See how this is done in https://github.com/aperture-data/workflows/blob/main/base/docker/scripts/embeddings/requirements_cpu.txt

Collaborator Author

Thanks! This is actually a TODO I care about. I was wondering how to do this conditionally: for example, specifying CPU, CUDA, or Metal at the top level and getting images optimized for the architecture I am developing on. That would be a major win. I haven't zeroed in on an approach yet, but as you can see this is a WIP. Do you have any ideas for that?

Contributor

I had a look around and I don't see a good solution that just does the right thing on any architecture. Forcing CPU versions comes closest. To do better, I think we would have to build and deploy multiple versions of the docker image with the device as a build argument.
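
A minimal sketch of the build-argument idea, assuming a hypothetical helper that is not part of this PR: it maps a device name supplied at build time to the pip arguments for the matching PyTorch wheel index. The `cpu` index URL is the one already used in the Dockerfile under review; the CUDA tag `cu121` is an assumption and would have to match the target driver. Devices it does not know about are rejected rather than guessed.

```python
# Hypothetical helper: choose pip arguments for PyTorch from a build-time device name.
CPU_INDEX = "https://download.pytorch.org/whl/cpu"      # taken from the Dockerfile in this PR
CUDA_INDEX = "https://download.pytorch.org/whl/cu121"   # assumption: CUDA 12.1 wheels


def torch_pip_args(device: str) -> list[str]:
    """Return the pip command (as an argument list) for the requested device."""
    base = ["pip", "install", "torch", "torchvision"]
    device = device.strip().lower()
    if device == "cpu":
        return base + ["--index-url", CPU_INDEX]
    if device == "cuda":
        return base + ["--index-url", CUDA_INDEX]
    # Fail loudly instead of silently falling back to a default wheel.
    raise ValueError(f"unsupported device: {device!r}")
```

Each image variant would then be built with its own device value, which matches the "multiple versions of the docker image" trade-off described above.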

query = [{
"FindImage": {
"constraints": {
self.done_property: ["==", None]
Contributor

Might be slightly more robust to say ["!=", True].
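
The two constraint variants can be compared side by side with a small sketch. The function name and the `caption_done` property name are hypothetical, and how the database treats a missing property under `"!="` is assumed from the comment above rather than verified here.

```python
# Hypothetical query builder showing both constraint styles from this thread.
def build_pending_query(done_property: str, robust: bool = True) -> list[dict]:
    """Build a FindImage query for images that have not been processed yet."""
    # ["==", None] matches only images where the property is unset;
    # ["!=", True] is the suggested alternative (assumed to also match
    # images where the property holds some value other than True).
    constraint = ["!=", True] if robust else ["==", None]
    return [{"FindImage": {"constraints": {done_property: constraint}}}]


query = build_pending_query("caption_done")
```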

Comment on lines +13 to +14
processor = AutoProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
Contributor

I can't find any documentation that says these are thread-safe.
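
If thread safety cannot be confirmed, one conservative option is to serialize access to the shared objects with a lock. The sketch below is an assumption about how that could look, not what the PR does: `SerializedCallable` is a hypothetical wrapper, and the stand-in function takes the place of the real processor and model calls.

```python
import threading


class SerializedCallable:
    """Wrap a callable so that only one thread can run it at a time."""

    def __init__(self, fn):
        self._fn = fn
        self._lock = threading.Lock()

    def __call__(self, *args, **kwargs):
        # Every caller waits here until the previous call has returned.
        with self._lock:
            return self._fn(*args, **kwargs)


def fake_caption(image_id: int) -> str:
    """Stand-in for the real captioning call (hypothetical)."""
    return f"caption for image {image_id}"


safe_caption = SerializedCallable(fake_caption)
```

This trades throughput for safety: worker threads can still fetch and decode images in parallel, but caption generation itself runs one call at a time.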

"UpdateImage": {
"ref": i + 1,
"properties": {
self.done_property: captions[i]
Contributor

Oh. So not really a done property then.

Collaborator Author

Yeah, I should have changed that after copying images from embedding extraction. :)

Collaborator Author

I updated the property name.

Contributor

Can we distinguish between "not yet tried to generate a caption" and "I tried but couldn't do it"?
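
One possible way to make that distinction, sketched under assumptions: store a separate status property next to the caption, so that an absent status means "not yet tried" and an explicit value records success or failure. The property names `caption` and `caption_status` and the helper itself are hypothetical, not taken from the PR.

```python
from enum import Enum
from typing import Optional


class CaptionStatus(str, Enum):
    DONE = "done"      # a caption was generated and stored
    FAILED = "failed"  # generation was attempted but produced nothing


def caption_properties(
    caption: Optional[str],
    caption_property: str = "caption",
    status_property: str = "caption_status",
) -> dict:
    """Build the properties to write back for one image after an attempt."""
    if caption:
        return {caption_property: caption, status_property: CaptionStatus.DONE.value}
    # No caption: record the failed attempt so the image is not retried blindly.
    return {status_property: CaptionStatus.FAILED.value}
```

With this layout, a query for pending work would look for images where the status property is unset, and failed images could be retried or inspected separately.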

RUN pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
RUN pip install --no-cache-dir -r /requirements.txt

COPY app/weights.py /app/weights.py
Contributor

I think this is intended for cache warmup. It might be better to make that explicit in the name.

Collaborator Author

Sounds good. Will change.

@@ -0,0 +1,11 @@
import platform
Contributor

Should these files really go at the top level?

3 participants