Commit 84a89ae

Update index.html
1 parent 0fc29b7 commit 84a89ae

1 file changed

+1
-1
lines changed

index.html

Lines changed: 1 addition & 1 deletion
@@ -566,7 +566,7 @@ <h2 class="title is-3" style="text-align: center;">
 </h2>
 <div class="content has-text-justified">
 <p>
-In this post, we introduced the Kinetics Scaling Law, emphasizing that attention cost, not parameter count, is the dominant factor at test time, fundamentally reshaping the traditional scaling landscape. We further demonstrated that sparse attention is crucial for achieving more effective and scalable test-time scaling. While our discussion focused on a simple sparse attention algorithm like block top-k attention, we anticipate that more advanced algorithms will approach or even outperform oracle top-k scaling. Moreover, sparse attention drastically reduces inference cost, enabling more reasoning trials and longer generations. This unlocks greater flexibility in configuring TTS strategies within a fixed resource. Overall, we believe the Kinetics Scaling Law serves as a guiding principle for end-to-end design in agent deployment, model architectures, LLM serving systems, and hardware.
+In this post, we introduced the Kinetics Scaling Law, emphasizing that attention cost, not parameter count, is the dominant factor at test time, fundamentally reshaping the previous scaling law. We further demonstrated that sparse attention is crucial for achieving more effective and scalable test-time scaling. While our discussion focused on a simple sparse attention algorithm, block top-k attention, we anticipate that more advanced algorithms will approach or even outperform oracle top-k scaling. Moreover, sparse attention drastically reduces inference cost, enabling more reasoning trials and longer generations. This unlocks greater flexibility in configuring TTS strategies within a fixed resource. This work aims to contribute to the understanding of efficiency and scalability challenges in the test-time scaling era, spanning model architecture, system-level implementation, and hardware design. We highlight the central role of sparsity in addressing these challenges.
 </p>
 </div>
 <div class="has-text-centered">
