Will BetterTransformer be considered for future Chinese-LLaMA or LLaMA-2 releases? #791
alkaideemo started this conversation in Ideas
Replies: 1 comment
-
We just released Chinese-LLaMA-2, which was trained using FlashAttention-2.
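For reference, a minimal sketch of calling the FlashAttention-2 kernel directly through the `flash-attn` package; this is an illustration only, not the project's actual training code, and the tensor shapes are placeholders:

```python
# Illustrative only: direct use of the FlashAttention-2 kernel (flash-attn package).
# Requires a CUDA GPU; inputs must be fp16 or bf16.
import torch
from flash_attn import flash_attn_func

# q, k, v: (batch, seqlen, num_heads, head_dim) -- placeholder sizes
q = torch.randn(2, 1024, 32, 128, dtype=torch.float16, device="cuda")
k = torch.randn(2, 1024, 32, 128, dtype=torch.float16, device="cuda")
v = torch.randn(2, 1024, 32, 128, dtype=torch.float16, device="cuda")

# causal=True applies the autoregressive mask used in LM training
out = flash_attn_func(q, k, v, causal=True)  # shape: (2, 1024, 32, 128)
```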
-
optimum added LLaMA support in version 1.9.0 (https://github.com/huggingface/optimum/pull/998), so the transformers implementation of LLaMA can also use the new scaled_dot_product_attention operator, which dispatches to the FlashAttention kernel on supported hardware.
In a quick test on 8x A100 GPUs, training throughput nearly doubled and GPU memory usage dropped as well.
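As a rough sketch of the path described above (the checkpoint name and dtype here are placeholders, not from the original test):

```python
# Illustrative only: convert a transformers LLaMA model with optimum's
# BetterTransformer so attention runs through
# torch.nn.functional.scaled_dot_product_attention.
# Assumes optimum >= 1.9.0 and PyTorch >= 2.0.
import torch
from transformers import AutoModelForCausalLM
from optimum.bettertransformer import BetterTransformer

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Swap the attention modules for the SDPA-backed implementation.
model = BetterTransformer.transform(model)
```

On supported hardware such as A100, SDPA can then select the FlashAttention backend automatically, which is where the throughput and memory gains come from.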