Commit fa9d88a

author
Rossdan Craig rossdan@lastmileai.dev
committed
[HF][5/n] Image2Text: Allow base64 inputs for images
Previously, image attachments had to be URIs (a local path or an http/https URL); base64 data was not accepted. Supporting base64 matters because our text2Image model parser outputs images in base64 format, so this change lets us chain model prompts!

## Test Plan

Rebase and test on 0d7ae2b.

Follow the README for AIConfig Editor (https://github.com/lastmile-ai/aiconfig/tree/main/python/src/aiconfig/editor#dev), then run these commands:

```bash
aiconfig_path=/Users/rossdancraig/Projects/aiconfig/cookbooks/Gradio/huggingface.aiconfig.json
parsers_path=/Users/rossdancraig/Projects/aiconfig/cookbooks/Gradio/hf_model_parsers.py
alias aiconfig="python3 -m 'aiconfig.scripts.aiconfig_cli'"
aiconfig edit --aiconfig-path=$aiconfig_path --server-port=8080 --server-mode=debug_servers --parsers-module-path=$parsers_path
```

Then run the prompt in AIConfig Editor (streaming is not supported, so I just took screenshots).

These are the images I tested (the bear image was passed in base64 format):

![fox_in_forest](https://github.com/lastmile-ai/aiconfig/assets/151060367/ca7d1723-9e12-4cc8-9d8d-41fa9f466919)
![bear-eating-honey](https://github.com/lastmile-ai/aiconfig/assets/151060367/a947d89e-c02a-4c64-8183-ff1c85802859)

<img width="1281" alt="Screenshot 2024-01-10 at 04 57 44" src="https://github.com/lastmile-ai/aiconfig/assets/151060367/ea60cbc5-e6ab-4bf2-82e7-17f3182fdc5c">
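To illustrate the chaining idea, here is a minimal sketch of the base64 round trip using only the standard library. The byte payload is a stand-in, not real model output, and the variable names are hypothetical; in the parser itself the decoded buffer is passed to `Image.open`.

```python
import base64
from io import BytesIO

# Stand-in for image bytes. A real text-to-image output would be a full
# PNG or JPEG payload; this is just the PNG signature plus filler bytes.
fake_image_bytes = b"\x89PNG\r\n\x1a\n" + bytes(range(16))

# What an upstream prompt might emit: the image encoded as a base64 string.
encoded = base64.b64encode(fake_image_bytes).decode("ascii")

# What the image-to-text side does with such a string: decode it and wrap
# the raw bytes in a file-like object.
buffer = BytesIO(base64.b64decode(encoded))

# With Pillow installed and a real image payload, the next step would be
# Image.open(buffer), as in the diff for this commit.
round_tripped = buffer.getvalue()
```

The decoded bytes match the original payload exactly, which is why a base64 output from one prompt can be used as the image input of the next.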
1 parent 19d7844 commit fa9d88a

extensions/HuggingFace/python/src/aiconfig_extension_hugging_face/local_inference/image_2_text.py

Lines changed: 18 additions & 7 deletions
@@ -1,5 +1,8 @@
+import base64
 import json
-from typing import Any, Dict, Optional, List, TYPE_CHECKING
+from io import BytesIO
+from PIL import Image
+from typing import Any, Dict, Optional, List, TYPE_CHECKING, Union
 from transformers import (
     Pipeline,
     pipeline,
@@ -107,7 +110,7 @@ async def deserialize(
         completion_params = refine_completion_params(model_settings)
 
         #Add image inputs
-        inputs = validate_and_retrieve_image_from_attachments(prompt)
+        inputs = validate_and_retrieve_images_from_attachments(prompt)
         completion_params["inputs"] = inputs
 
         await aiconfig.callback_manager.run_callbacks(CallbackEvent("on_deserialize_complete", __name__, {"output": completion_params}))
@@ -218,7 +221,7 @@ def validate_attachment_type_is_image(attachment: Attachment):
         raise ValueError(f"Invalid attachment mimetype {attachment.mime_type}. Expected image mimetype.")
 
 
-def validate_and_retrieve_image_from_attachments(prompt: Prompt) -> list[str]:
+def validate_and_retrieve_images_from_attachments(prompt: Prompt) -> list[Union[str, Image]]:
     """
     Retrieves the image uri's from each attachment in the prompt input.
 
@@ -232,15 +235,23 @@ def validate_and_retrieve_image_from_attachments(prompt: Prompt) -> list[str]:
     if not hasattr(prompt.input, "attachments") or len(prompt.input.attachments) == 0:
         raise ValueError(f"No attachments found in input for prompt {prompt.name}. Please add an image attachment to the prompt input.")
 
-    image_uris: list[str] = []
+    images: list[Union[str, Image]] = []
 
     for i, attachment in enumerate(prompt.input.attachments):
         validate_attachment_type_is_image(attachment)
 
-        if not isinstance(attachment.data, str):
+        input_data = attachment.data
+        if not isinstance(input_data, str):
             # See todo above, but for now only support uri's
             raise ValueError(f"Attachment #{i} data is not a uri. Please specify a uri for the image attachment in prompt {prompt.name}.")
 
-        image_uris.append(attachment.data)
+        # Really basic heurestic to check if the data is a base64 encoded str
+        # vs. uri. This will be fixed once we have standardized inputs
+        # See https://github.com/lastmile-ai/aiconfig/issues/829
+        if len(input_data) > 10000:
+            pil_image : Image = Image.open(BytesIO(base64.b64decode(input_data)))
+            images.append(pil_image)
+        else:
+            images.append(input_data)
 
-    return image_uris
+    return images
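
The length check in the last hunk can be sketched in isolation as follows. `looks_like_base64_by_length` mirrors the committed 10000-character cutoff; `looks_like_base64_strict` is a hypothetical alternative that is not part of this commit, shown only to illustrate what a stricter check might look like before the standardized inputs from issue 829 land. Both function names and the list of URI schemes are assumptions made for this example.

```python
import base64
import binascii
from urllib.parse import urlparse

LENGTH_THRESHOLD = 10000  # same cutoff as the committed heuristic


def looks_like_base64_by_length(data: str) -> bool:
    """Heuristic from the diff: treat long strings as base64, short ones as URIs."""
    return len(data) > LENGTH_THRESHOLD


def looks_like_base64_strict(data: str) -> bool:
    """Hypothetical alternative: rule out URI-shaped strings, then validate."""
    if not data:
        return False
    # Strings with a recognizable URI scheme are never treated as base64.
    if urlparse(data).scheme in ("http", "https", "file"):
        return False
    try:
        # validate=True rejects any character outside the base64 alphabet.
        base64.b64decode(data, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


small_payload = base64.b64encode(b"tiny image bytes").decode("ascii")
```

The length heuristic misclassifies `small_payload` as a URI because it is under the cutoff, while the strict variant accepts it. The strict variant has its own gap: a short local path made up only of base64 characters (for example, a 12-character path with no extension) would still pass validation, so neither check is a substitute for an explicit input type.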
