
Commit 8ce7e39

finish (#1499)
1 parent 4cdbf96 commit 8ce7e39

File tree

1 file changed (+1, −1 lines)


optimize-llm.md

Lines changed: 1 addition & 1 deletion
@@ -314,7 +314,7 @@ As LLMs improve in text comprehension and generation, they are applied to increa
 
 How can we get rid of the exorbitant memory requirements for large input lengths? We need a new way to compute the self-attention mechanism that gets rid of the \\( QK^T \\) matrix. [Tri Dao et al.](https://arxiv.org/abs/2205.14135) developed exactly such a new algorithm and called it **Flash Attention**.
 
-In a nutshell, Flash Attention breaks the \\( \mathbf{V} \times \text{Softmax}(\mathbf{QK}^T) \\) computation apart and instead computes smaller chunks of the output by iterating over multiple softmax computation steps:
+In a nutshell, Flash Attention breaks the \\(\mathbf{V} \times \text{Softmax}(\mathbf{QK}^T)\\) computation apart and instead computes smaller chunks of the output by iterating over multiple softmax computation steps:
 
 $$ \textbf{O}_i \leftarrow s^a_{ij} * \textbf{O}_i + s^b_{ij} * \mathbf{V}_{j} \times \text{Softmax}(\mathbf{QK}^T_{i,j}) \text{ for multiple } i, j \text{ iterations} $$
 
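To make the partial-output update in the formula above concrete, here is a minimal sketch of the same online-softmax idea in plain PyTorch. It is not the actual Flash Attention CUDA kernel: the function name `chunked_attention` and the chunk size are illustrative choices, and only the key/value dimension is tiled. The point it demonstrates is that the output `O` and its normalizer can be accumulated chunk by chunk with rescaling factors, so the full \\( QK^T \\) matrix is never materialized.

```python
import torch

def chunked_attention(Q, K, V, chunk_size=64):
    """Illustrative chunked attention with an online softmax (assumed helper, not the real kernel).

    Only a (seq_len, chunk_size) score block exists at any time,
    instead of the full (seq_len, seq_len) QK^T matrix.
    """
    seq_len, d = Q.shape
    scale = d ** -0.5
    O = torch.zeros_like(Q)                      # running partial output O_i
    m = torch.full((seq_len, 1), float("-inf"))  # running row-wise score maximum
    l = torch.zeros(seq_len, 1)                  # running softmax normalizer

    for j in range(0, seq_len, chunk_size):
        Kj, Vj = K[j:j + chunk_size], V[j:j + chunk_size]
        S = (Q @ Kj.T) * scale                   # scores against this chunk only

        m_new = torch.maximum(m, S.max(dim=-1, keepdim=True).values)
        alpha = torch.exp(m - m_new)             # rescales the previous partial sums
        P = torch.exp(S - m_new)                 # unnormalized softmax of this chunk

        l = alpha * l + P.sum(dim=-1, keepdim=True)
        O = alpha * O + P @ Vj                   # O_i <- s^a * O_i + s^b-style chunk update
        m = m_new

    return O / l                                 # apply the final normalization

# Sanity check against naive attention that does materialize QK^T.
Q, K, V = (torch.randn(128, 16) for _ in range(3))
reference = torch.softmax((Q @ K.T) * 16 ** -0.5, dim=-1) @ V
assert torch.allclose(chunked_attention(Q, K, V), reference, atol=1e-4)
```

The real Flash Attention kernel additionally tiles the query rows and sizes the chunks to fit in fast on-chip SRAM; this sketch only reproduces the rescaled-accumulation arithmetic of the formula above.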

0 commit comments