[Distributed] Enable KV cache #1154
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1154
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures as of commit 1e78f59 with merge base 16b3d64.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
self.max_batch_size = max_batch_size
for b in self.layers.values():
    b.attention.kv_cache = KVCache(
        max_batch_size, max_seq_length, self.config.n_local_heads, head_dim
This is fine, just calling it out as a note to self:
The local head_dim here is defined as self.config.dim // self.config.n_heads.
In Attention, head_dim is pulled from config.head_dim; recall that TransformerArgs (config) also initializes head_dim as dim // n_heads, so the two definitions agree.
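To illustrate why the two definitions agree, here is a minimal sketch; the class below is a simplified stand-in for TransformerArgs, not the actual torchchat code:

```python
# Simplified stand-in for TransformerArgs; only the fields relevant here.
class TransformerArgs:
    def __init__(self, dim: int, n_heads: int):
        self.dim = dim
        self.n_heads = n_heads
        # config.head_dim is initialized the same way the local head_dim is computed.
        self.head_dim = dim // n_heads


config = TransformerArgs(dim=4096, n_heads=32)

# Local head_dim as computed in setup_caches:
head_dim = config.dim // config.n_heads

# Attention pulls head_dim from the config; both resolve to the same value.
assert head_dim == config.head_dim == 128
```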
for b in self.layers.values():
    b.attention.kv_cache = KVCache(
        max_batch_size, max_seq_length, self.config.n_local_heads, head_dim
# Lower the setup_cache call to the attention module because tensor
head_dim as defined on line 440 is no longer used; let's remove it.
Good point, thanks!
Thanks for enabling, looks good!
Previously the KV cache was turned off in distributed cases; this PR enables it.
The cache size depends on the TP degree, because the attention heads are divided across ranks.
For this information to be available, we lowered KV cache instantiation from the root level to the attention layer, which actually owns the cache and knows how it is being tensor-parallelized.
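As a rough illustration of the idea, here is a minimal sketch with simplified Attention and KVCache classes; the names and signatures are assumptions based on the quoted diff, not the exact torchchat implementation. The key point is that the cache is shaped by the per-rank (local) head count, which only the attention layer knows once it has been tensor-parallelized.

```python
import torch
import torch.nn as nn


class KVCache(nn.Module):
    """Preallocated K/V buffers sized by the local (per-rank) head count."""

    def __init__(self, max_batch_size, max_seq_length, n_local_heads, head_dim,
                 dtype=torch.bfloat16):
        super().__init__()
        shape = (max_batch_size, n_local_heads, max_seq_length, head_dim)
        self.register_buffer("k_cache", torch.zeros(shape, dtype=dtype))
        self.register_buffer("v_cache", torch.zeros(shape, dtype=dtype))


class Attention(nn.Module):
    def __init__(self, dim, n_heads, tp_degree=1):
        super().__init__()
        self.head_dim = dim // n_heads
        # After tensor parallelism, each rank holds only n_heads / tp_degree heads.
        self.n_local_heads = n_heads // tp_degree
        self.kv_cache = None

    def setup_cache(self, max_batch_size, max_seq_length):
        # The attention layer owns the cache and knows its local head count,
        # so the per-rank cache is 1/tp_degree of the single-device size.
        self.kv_cache = KVCache(
            max_batch_size, max_seq_length, self.n_local_heads, self.head_dim
        )


# Usage: the root model delegates per layer instead of constructing
# KVCache itself with a global head count.
attn = Attention(dim=4096, n_heads=32, tp_degree=4)
attn.setup_cache(max_batch_size=1, max_seq_length=2048)
print(attn.kv_cache.k_cache.shape)  # torch.Size([1, 8, 2048, 128])
```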