Add inference snippets for image-text-to-text #927
Conversation
```js
	],
	max_tokens: 500,
})) {
	process.stdout.write(chunk.choices[0]?.delta?.content || "");
```
Suggested change:
```diff
- process.stdout.write(chunk.choices[0]?.delta?.content || "");
+ process.stdout.write(chunk.choices[0]?.delta?.content);
```
| "${model.id}", | ||
| token="${accessToken || "{API_TOKEN}"}", | ||
| ) | ||
| image_url = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" |
maybe it would be good to show an example with `PIL.Image.open` and use that image's base64 string representation, so that users get an example where they can load local images:
```python
from PIL import Image
import requests
from io import BytesIO
import base64

image_url = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
response = requests.get(image_url)
image = Image.open(BytesIO(response.content))

# Convert the image to a byte array in PNG format
buffered = BytesIO()
image.save(buffered, format="PNG")

# Encode this byte array to base64
img_base64 = base64.b64encode(buffered.getvalue())

# Print the base64 string
print(img_base64.decode())
```

Maybe the snippet would become too long. I will let you decide.
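For context, here's a minimal sketch (not part of the suggestion above) of how such a base64 string could be fed back into remote inference, assuming the OpenAI-style `image_url` content format used by chat completion endpoints; the file name and prompt are made up for illustration:

```python
import base64

# Hypothetical local file; any PNG/JPEG on disk works.
with open("statue_of_liberty.png", "rb") as f:
    img_base64 = base64.b64encode(f.read()).decode()

# Assumption: the endpoint accepts OpenAI-style data URIs for image content.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{img_base64}"}},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]
```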
hmm, I'd say this would complicate things a bit too much (no strong opinion though).
Note that this is for remote inference, not local usage.
> Note that this is for remote inference, not local usage.

Yes, I meant more like: remote inference using a local image file (otherwise, to use the snippet, the user needs to upload their image and get its URL).
Do you know if it's possible to have several snippets by returning a list, same as for code snippets?
> Do you know if it's possible to have several snippets by returning a list, same as for code snippets?

For inference snippets, that's not possible right now. So I suggest that:
- we merge this PR as it is, with only the image URL example
- we maybe unify on the moon side so that an inference snippet can be a list, like code snippets. If so, we can re-iterate and add an example with a local image
@coyotte508 @mishig25 thanks for the feedback. I addressed the comment above.
```ts
export const snippetConversationalWithImage = (model: ModelDataMinimal, accessToken: string): string =>
	`from huggingface_hub import InferenceClient

client = InferenceClient(
```
nit: in the playground, we use a snippet like this to match the OAI format/spec as closely as possible:
```python
from huggingface_hub import InferenceClient

client = InferenceClient(api_key="YOUR_HF_TOKEN")

messages = [
	{ "role": "user", "content": "Tell me a story" }
]

output = client.chat.completions.create(
	model="mistralai/Mistral-7B-Instruct-v0.3",
	messages=messages,
	stream=True,
	temperature=0.5,
	max_tokens=1024,
	top_p=0.7
)
```

The specific changes are:
- use `api_key` rather than `token`
- declare `model` inside `completions.create(` rather than `InferenceClient(`
I will let you decide
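Applied to the image snippet under discussion, that convention might look roughly like this; a sketch only, not the PR's final code, and the prompt is illustrative:

```python
from huggingface_hub import InferenceClient

client = InferenceClient(api_key="YOUR_HF_TOKEN")

image_url = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

# `model` is declared here rather than in `InferenceClient(...)`.
output = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=messages,
    max_tokens=500,
)
print(output.choices[0].message.content)
```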
this comment maybe applies to the text conversational snippet as well
if we decide to change it, let's handle it in a subsequent PR
I'll use the same convention then. I've addressed it in e4c6cba for both the text-generation and image-text-to-text snippets.
Well well well, looks like it yes. Addressed in 0f8452c.
mishig25 left a comment
lgtm! again
Thanks! Sorry about the back and forth 😬
Triggered https://github.com/huggingface/huggingface.js/actions/runs/11107930206 so that we can get it in moon.
Follow-up to #927: the Python equivalent of https://github.com/huggingface/huggingface.js/blob/1bb5b31131c6990547087d91aebda2361e91dfad/packages/tasks/src/snippets/js.ts#L188. Because of this missing line, users do not see `python` among the options in the [inference snippet](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct?inference_api=true). <img width="995" alt="image" src="https://github.com/user-attachments/assets/2237ca48-0b90-4c68-8beb-60ecdbbb0b86">

This PR adds inference snippets for `image-text-to-text` models, say meta-llama/Llama-3.2-11B-Vision-Instruct for example 😄 I've tested all three examples locally and they work as expected :)
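For reference, here is a streaming variant in the same spirit as the JS loop quoted earlier; this is a sketch under the conventions discussed in this thread, and the generated snippets in the PR are authoritative:

```python
from huggingface_hub import InferenceClient

client = InferenceClient(api_key="YOUR_HF_TOKEN")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

# Print tokens as they arrive, mirroring process.stdout.write in the JS snippet.
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=messages,
    max_tokens=500,
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```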