Multiturn mm single image #1270

vmpuri · 2024-10-04T20:58:50Z

Multi-turn conversations were not working within the browser for LLaMA 3.2 Vision. This fixes this for the case of:

Text only conversation
Conversation with one image prompt.

Tests

Model	Browser Test Text Only	Browser Test with Image	Generate CLI test
LLaMA 3.1	✅	-	✅
LLaMA 3.2 11B Vision	✅	✅	✅

CLI Tests

python3 torchchat.py generate llama3.2-11B --prompt "What's this?" --image-prompt assets/dog.jpg

/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/data/metrics/__init__.py:19: UserWarning: A NumPy version >=1.23.5 and <2.3.0 is required for this version of SciPy (detected version 1.23.2)
  from scipy.stats import pearsonr, spearmanr
2024-10-04:18:39:09,981 INFO     [sdpa_with_kv_cache.py:28] Loading custom ops library: /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/executorch/extension/llm/custom_ops/libcustom_ops_aot_lib.dylib
Using device=mps 
Loading model...
Time to load model: 38.54 seconds
-----------------------------------------------------------
/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/torch/nn/functional.py:5096: UserWarning: MPS: The constant padding of more than 3 dimensions is not currently supported natively. It uses View Ops default implementation to run. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Pad.mm:465.)
  return torch._C._nn.pad(input, pad, mode, value)
What's this?The image depicts a dog riding a skateboard on a road, showcasing its unique and playful appearance.

* A dog:
        + Sitting on a skateboard
        + Wearing sunglasses
        + Has a blue collar around its neck
        + Ears perked up
        + Tongue out
* A skateboard:
        + Red in color
        + Has yellow wheels
        + Being ridden by the dog
* Sunglasses:
        + Pink in color
        + Worn by the dog

The image presents a lighthearted and humorous scene, with the dog's sunglasses and skateboard adding to its playful and carefree demeanor.2024-10-04:18:40:47,012 INFO     [generate.py:1138] 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                
Generated 131 tokens                 
Time for inference 1: 56.0592 sec total                 
Time to first token: 14.2072 sec with parallel prefill.                

      Total throughput: 2.3547 tokens/sec, 0.4247 s/token                 
First token throughput: 0.0704 tokens/sec, 14.2072 s/token                 
 Next token throughput: 3.1301 tokens/sec, 0.3195 s/token                     
2024-10-04:18:40:47,012 INFO     [generate.py:1149] 
Bandwidth achieved: 50.14 GB/s
2024-10-04:18:40:47,012 INFO     [generate.py:1153] *** This first iteration will include cold start effects for dynamic import, hardware caches. ***

========================================


      Average tokens/sec (total): 2.35                 
Average tokens/sec (first token): 0.07                 
Average tokens/sec (next tokens): 3.13

(.venv) puri@puri-mac torchchat % python3 torchchat.py generate --checkpoint-path /Users/puri/.torchchat/model-cache/stories15M/stories15M.pt --pte-path stories15M.pte
/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/data/metrics/__init__.py:19: UserWarning: A NumPy version >=1.23.5 and <2.3.0 is required for this version of SciPy (detected version 1.23.2)
  from scipy.stats import pearsonr, spearmanr
^E2024-10-04:18:41:19,239 INFO     [sdpa_with_kv_cache.py:28] Loading custom ops library: /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/executorch/extension/llm/custom_ops/libcustom_ops_aot_lib.dylib
Warning: checkpoint path ignored because an exported DSO or PTE path specified
Warning: checkpoint path ignored because an exported DSO or PTE path specified
Using device=mps 
Loading model...
Cannot load specified PTE to mps. Attempting to load model to CPU instead
Time to load model: 0.05 seconds
[program.cpp:134] InternalConsistency verification requested but not available
-----------------------------------------------------------
Hello, my name is9 in the garden. The garden is very beautiful and all the flowers she blooms were so happy. One day, an old flower started to bloom. It was so beautiful that it fluttered around and around. Then, it was a small flower that bloomed in many different colors.
A few kids came to the garden and saw the beautiful flower. They thought it was so special, so they wanted to see it bloom. But the old flower was too high up on a tall tree, so they couldn't reach2024-10-04:18:41:19,824 INFO     [generate.py:1138] 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                
Generated 109 tokens                 
Time for inference 1: 0.2679 sec total                 
Time to first token: 0.0256 sec with sequential prefill.                

      Total throughput: 410.6411 tokens/sec, 0.0024 s/token                 
First token throughput: 39.0747 tokens/sec, 0.0256 s/token                 
 Next token throughput: 449.8894 tokens/sec, 0.0022 s/token                     
2024-10-04:18:41:19,824 INFO     [generate.py:1149] 
Bandwidth achieved: 0.00 GB/s
2024-10-04:18:41:19,824 INFO     [generate.py:1153] *** This first iteration will include cold start effects for dynamic import, hardware caches. ***

========================================


      Average tokens/sec (total): 410.64                 
Average tokens/sec (first token): 39.07                 
Average tokens/sec (next tokens): 449.89

(.venv) puri@puri-mac torchchat % python3 torchchat.py generate llama3 --prompt "Give me a table of 4 cold war era jets and their max speeds in km/h."                 
/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/data/metrics/__init__.py:19: UserWarning: A NumPy version >=1.23.5 and <2.3.0 is required for this version of SciPy (detected version 1.23.2)
  from scipy.stats import pearsonr, spearmanr
2024-10-04:18:41:28,664 INFO     [sdpa_with_kv_cache.py:28] Loading custom ops library: /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/executorch/extension/llm/custom_ops/libcustom_ops_aot_lib.dylib
Using device=mps 
Loading model...
Time to load model: 27.24 seconds
-----------------------------------------------------------
Give me a table of 4 cold war era jets and their max speeds in km/h.Here is a table of 4 Cold War era jets and their maximum speeds in km/h:

| Aircraft | Country | Maximum Speed (km/h) |
| --- | --- | --- |
| Mikoyan-Gurevich MiG-15 | Soviet Union | 1,050 |
| North American F-86 Sabre | United States | 1,075 |
| Supermarine Swift F.7 | United Kingdom | 1,110 |
| Lockheed F-104 Starfighter | United States | 2,184 |

Note: The maximum speeds listed are approximate and may vary depending on the specific variant and configuration of each aircraft.2024-10-04:18:42:26,483 INFO     [generate.py:1138] 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                
Generated 130 tokens                 
Time for inference 1: 30.2699 sec total                 
Time to first token: 1.5681 sec with parallel prefill.                

      Total throughput: 4.3277 tokens/sec, 0.2311 s/token                 
First token throughput: 0.6377 tokens/sec, 1.5681 s/token                 
 Next token throughput: 4.5293 tokens/sec, 0.2208 s/token                     
2024-10-04:18:42:26,483 INFO     [generate.py:1149] 
Bandwidth achieved: 69.51 GB/s
2024-10-04:18:42:26,483 INFO     [generate.py:1153] *** This first iteration will include cold start effects for dynamic import, hardware caches. ***

========================================


      Average tokens/sec (total): 4.33                 
Average tokens/sec (first token): 0.64                 
Average tokens/sec (next tokens): 4.53

Tested with server + browser:

Server Terminal	Browser Terminal
`python3 torchchat.py server llama3.2-11b`	`streamlit run torchchat/usages/browser.py`

Single Image

Text Only

…quence length so their shapes are both set to the max_seq_len

pytorch-bot · 2024-10-04T20:58:53Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1270

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d494aa0 with merge base d8c0aaf ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Jack-Khuu · 2024-10-04T21:31:06Z

torchchat/generate.py

        if len(prompt.shape) > 1:
            prompt = prompt.squeeze(0)
-        T = prompt.size(0)
-        max_new_tokens = min(max_new_tokens, max_seq_length - start_pos - T)


Is this line necessary to remove?

This seems like it would be problem with long prompt_lengths

Jack-Khuu · 2024-10-04T21:39:46Z

torchchat/generate.py

-        ]
+        image_found = False
+        messages = []
+        for message in prompt:


This would be torchchat.py generate Llama3.2-11B right?

Since it sends prompt: str and uses the image_prompt field

You might need to "create" a container prompt with those 2

Or chat calls a curried version of this function that creates the format before calling this function

Jack-Khuu · 2024-10-04T21:43:54Z

torchchat/usages/openai_api.py

        if not isinstance(self.model, FlamingoModel):
            prompt = [
                {"role": message["role"], "content": message["content"]}
-                for message in completion_request.messages
+                for message in messages
            ]
            return self._gen_model_input(
                prompt=prompt, max_new_tokens=completion_request.max_tokens
            )

        # Llama 3.2 11B
-        prompt = None
-        images = None
-
-        for message in messages:
-            torchtune_contents = []
-            if isinstance(message["content"], list):
-                for content_dict in message["content"]:
-                    if content_dict["type"] == "text":
-                        assert (
-                            prompt is None
-                        ), "At most one text prompt is supported for each request"
-                        prompt = content_dict["text"]
-                    elif content_dict["type"] == "image_url":
-                        assert (
-                            images is None
-                        ), "At most one image is supported at the moment"
-
-                        base64_decoded = base64.b64decode(
-                            content_dict["image_url"].split(";base64,")[1]
-                        )
-                        images = [Image.open(BytesIO(base64_decoded))]
-
-        assert prompt is not None, "Text prompt must be specified in the request"
-
-        return self._gen_model_input(prompt, images, completion_request.max_tokens)
+
+        prompt = [
+            {"role": message["role"], "content": message["content"]}
+            for message in messages
+        ]
+
+        return self._gen_model_input(
+            prompt=prompt, max_new_tokens=completion_request.max_tokens
+        )


Unnecessary if check now

Jack-Khuu and others added 5 commits October 2, 2024 09:35

initial test

53f9d34

Pad casual mask with zeroes and set decoder max_seq_len to the max se…

25beb26

…quence length so their shapes are both set to the max_seq_len

Merge branch 'main' into multiturn-mm-single-image

61d1e0e

Merge branch 'main' into multiturn-mm-single-image

91a68ab

Fix control bug for image inputs

ad84f51

facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 4, 2024

vmpuri requested review from Gasoonjia, Jack-Khuu and byjlw October 4, 2024 21:01

vmpuri marked this pull request as ready for review October 4, 2024 21:23

Jack-Khuu suggested changes Oct 4, 2024

View reviewed changes

Clear image input after submitting a chat

26a99fc

vmpuri force-pushed the multiturn-mm-single-image branch from f5b512e to be0632b Compare October 5, 2024 00:07

vmpuri added 2 commits October 4, 2024 18:10

Include empty assistant message for chat

1abd632

Pipe image input from CLI

d494aa0

vmpuri force-pushed the multiturn-mm-single-image branch from be0632b to d494aa0 Compare October 5, 2024 01:43

Jack-Khuu approved these changes Oct 5, 2024

View reviewed changes

vmpuri merged commit d0993b3 into main Oct 5, 2024
52 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Multiturn mm single image #1270

Multiturn mm single image #1270

Uh oh!

vmpuri commented Oct 4, 2024 •

edited

Loading

Uh oh!

pytorch-bot bot commented Oct 4, 2024 •

edited

Loading

Uh oh!

Jack-Khuu Oct 4, 2024

Uh oh!

Jack-Khuu Oct 4, 2024

Uh oh!

Jack-Khuu Oct 4, 2024 •

edited

Loading

Uh oh!

Jack-Khuu Oct 4, 2024

Uh oh!

Jack-Khuu Oct 4, 2024

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Multiturn mm single image #1270

Multiturn mm single image #1270

Uh oh!

Conversation

vmpuri commented Oct 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Tests

CLI Tests

Uh oh!

pytorch-bot bot commented Oct 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1270

✅ No Failures

Uh oh!

Jack-Khuu Oct 4, 2024

Choose a reason for hiding this comment

Uh oh!

Jack-Khuu Oct 4, 2024

Choose a reason for hiding this comment

Uh oh!

Jack-Khuu Oct 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Jack-Khuu Oct 4, 2024

Choose a reason for hiding this comment

Uh oh!

Jack-Khuu Oct 4, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

vmpuri commented Oct 4, 2024 •

edited

Loading

pytorch-bot bot commented Oct 4, 2024 •

edited

Loading

Jack-Khuu Oct 4, 2024 •

edited

Loading