Will BetterTransformer be considered for future Chinese-LLaMA or LLaMA-2 releases? #791
alkaideemo started this conversation in Ideas
Replies: 1 comment
-
We just released Chinese-LLaMA-2, which was trained using FlashAttention-2.
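For reference, a minimal sketch of calling the FlashAttention-2 kernel directly through the `flash-attn` package; this is an illustration only, not the project's actual training code, and the tensor shapes are placeholders:

```python
# Illustrative only: direct use of the FlashAttention-2 kernel (flash-attn package).
# Requires a CUDA GPU; inputs must be fp16 or bf16.
import torch
from flash_attn import flash_attn_func

# q, k, v: (batch, seqlen, num_heads, head_dim) -- placeholder sizes
q = torch.randn(2, 1024, 32, 128, dtype=torch.float16, device="cuda")
k = torch.randn(2, 1024, 32, 128, dtype=torch.float16, device="cuda")
v = torch.randn(2, 1024, 32, 128, dtype=torch.float16, device="cuda")

# causal=True applies the autoregressive mask used in LM training
out = flash_attn_func(q, k, v, causal=True)  # shape: (2, 1024, 32, 128)
```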
-
optimum added LLaMA support in version 1.9.0 (https://github.com/huggingface/optimum/pull/998), so the transformers implementation of LLaMA can also use the new scaled_dot_product_attention operator, which dispatches to the FlashAttention kernel on supported hardware.
In a quick test on 8x A100 GPUs, training throughput nearly doubled and GPU memory usage dropped as well.
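As a rough sketch of the path described above (the checkpoint name and dtype here are placeholders, not from the original test):

```python
# Illustrative only: convert a transformers LLaMA model with optimum's
# BetterTransformer so attention runs through
# torch.nn.functional.scaled_dot_product_attention.
# Assumes optimum >= 1.9.0 and PyTorch >= 2.0.
import torch
from transformers import AutoModelForCausalLM
from optimum.bettertransformer import BetterTransformer

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Swap the attention modules for the SDPA-backed implementation.
model = BetterTransformer.transform(model)
```

On supported hardware such as A100, SDPA can then select the FlashAttention backend automatically, which is where the throughput and memory gains come from.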