Multiturn mm single image #1270
Conversation
…quence length so their shapes are both set to the max_seq_len
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1270
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit d494aa0 with merge base d8c0aaf.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
if len(prompt.shape) > 1:
    prompt = prompt.squeeze(0)
T = prompt.size(0)
max_new_tokens = min(max_new_tokens, max_seq_length - start_pos - T)
```
Is it necessary to remove this line?
This seems like it would be a problem with long prompt_lengths.
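To make the concern concrete, here is a minimal sketch of what the clamp guards against; the numbers and variable names are invented, and only the `min(...)` expression mirrors the line under discussion:

```python
# Hypothetical values; only the clamp expression mirrors the diff above.
max_seq_length = 2048       # sequence / KV-cache budget
start_pos = 0               # tokens already in the cache
T = 1900                    # tokens in the current (long) prompt
requested_new_tokens = 500

# Without the clamp, generation would try to write past position 2048.
unclamped_end = start_pos + T + requested_new_tokens   # 2400 > 2048

# With the clamp, the total stays within the sequence budget.
max_new_tokens = min(requested_new_tokens, max_seq_length - start_pos - T)
assert start_pos + T + max_new_tokens <= max_seq_length
print(max_new_tokens)  # 148
```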
```python
]
image_found = False
messages = []
for message in prompt:
```
This would be `torchchat.py generate Llama3.2-11B`, right?
Since it sends `prompt: str` and uses the `image_prompt` field, you might need to "create" a container prompt from those two.
Alternatively, chat could call a curried version of this function that creates that format before calling it.
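A rough sketch of that container-prompt idea, assuming an OpenAI-style message layout; the helper below is invented for illustration and is not existing torchchat code:

```python
# Hypothetical helper: wrap the CLI's text prompt and optional image path
# into the message-list format used by the multimodal input builder.
def build_container_prompt(prompt: str, image_prompt: str | None) -> list[dict]:
    content: list[dict] = []
    if image_prompt is not None:
        # On the CLI path this is a local file path, not a base64 data URL.
        content.append({"type": "image_url", "image_url": image_prompt})
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]

# e.g. torchchat.py generate Llama3.2-11B --prompt "..." --image-prompt dog.jpg
messages = build_container_prompt("What breed is this dog?", "dog.jpg")
```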
```diff
 if not isinstance(self.model, FlamingoModel):
     prompt = [
         {"role": message["role"], "content": message["content"]}
-        for message in completion_request.messages
+        for message in messages
     ]
     return self._gen_model_input(
         prompt=prompt, max_new_tokens=completion_request.max_tokens
     )

 # Llama 3.2 11B
 prompt = None
 images = None

 for message in messages:
     torchtune_contents = []
     if isinstance(message["content"], list):
         for content_dict in message["content"]:
             if content_dict["type"] == "text":
                 assert (
                     prompt is None
                 ), "At most one text prompt is supported for each request"
                 prompt = content_dict["text"]
             elif content_dict["type"] == "image_url":
                 assert (
                     images is None
                 ), "At most one image is supported at the moment"

                 base64_decoded = base64.b64decode(
                     content_dict["image_url"].split(";base64,")[1]
                 )
                 images = [Image.open(BytesIO(base64_decoded))]

 assert prompt is not None, "Text prompt must be specified in the request"

 return self._gen_model_input(prompt, images, completion_request.max_tokens)

 prompt = [
     {"role": message["role"], "content": message["content"]}
     for message in messages
 ]

 return self._gen_model_input(
     prompt=prompt, max_new_tokens=completion_request.max_tokens
 )
```
The `if` check is unnecessary now.
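For reference, a request body shaped the way the parsing above expects, with one text part and at most one base64-encoded `image_url` string per message; the model name, file path, and endpoint are assumptions for illustration:

```python
import base64

# Illustrative request body; model name and file path are assumptions.
with open("dog.jpg", "rb") as f:  # any local JPEG
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "llama3.2-11b",
    "max_tokens": 256,
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What breed is this dog?"},
                # The handler splits the string on ";base64," and decodes the rest.
                {
                    "type": "image_url",
                    "image_url": f"data:image/jpeg;base64,{image_b64}",
                },
            ],
        }
    ],
}
# e.g. requests.post("http://127.0.0.1:5000/v1/chat/completions", json=payload)
# (endpoint path assumed)
```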
Multi-turn conversations were not working within the browser for LLaMA 3.2 Vision. This fixes that for multi-turn chats with a single image, as well as text-only chats.
Tests
CLI Tests
Tested with server + browser:
```sh
python3 torchchat.py server llama3.2-11b
streamlit run torchchat/usages/browser.py
```

Single Image

Text Only
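For reference, the multi-turn, text-only history sent in this case is just a list of role/content messages, roughly like the following (contents are made up for illustration):

```python
# Illustrative multi-turn, text-only message history; actual contents depend
# on what was typed into the browser UI.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Name a dog breed that likes to swim."},
    {"role": "assistant", "content": "Labrador Retrievers are strong swimmers."},
    {"role": "user", "content": "How much exercise do they need per day?"},
]
```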