Commit 09244c2

deploy: be3df77

Parent: 178e105 · Commit: 09244c2

File tree: 9 files changed (+145, −280 lines)

_modules/vortex_torch/flow/algorithms.html

Lines changed: 32 additions & 131 deletions
Large diffs are not rendered by default.

_modules/vortex_torch/flow/flow.html

Lines changed: 16 additions & 15 deletions
@@ -339,18 +339,20 @@
     .. math::

         \text{cache[key]} \sim
-        \mathbb{R}^{S_{\text{pack}} \times r \times c},
+        \mathbb{R}^{S \times r \times c},

-    where
-
-    .. math::
-
-        S_{\text{pack}} = \sum_{i=0}^{B-1} S_i
-
-    is the total number of pages packed across all requests, and
+
     :math:`(r, c)` is the per-key inner shape declared via
     :meth:`create_cache` or implicitly for ``"k"``/``"v"``.

+    Here :math:`S` is the leading page axis. Internally it is a packed
+    axis (often denoted :math:`S_{\mathrm{pack}}`), obtained by
+    concatenating the pages from all requests. As a user, you can simply
+    think of :math:`S` as "the number of pages for this request"; the
+    vFlow kernels and :class:`ContextBase` will take care of mapping
+    between per-request page counts and the packed layout automatically.
+
     2. **Cache-update view (batch-major)** — used in :meth:`forward_cache`:

     .. math::
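To make the two views concrete, here is a minimal shape sketch. The sizes, and the assumption that the batch-major view is a plain ``(B, r, c)`` layout, are illustrative readings of the docstring above, not the runtime's actual allocation:

    # Shape sketch of the two cache views (illustrative sizes; the real
    # tensors are allocated by the vFlow runtime, not by user code).
    import torch

    page_size, head_dim = 16, 128   # baseline inner shape (r, c)
    S = 48                          # packed page axis across all requests
    B = 4                           # batch axis used by forward_cache

    # 1. Indexer view (page-major): one slot per packed page.
    k_indexer = torch.empty(S, page_size, head_dim)

    # 2. Cache-update view (batch-major): leading axis is the batch.
    k_update = torch.empty(B, page_size, head_dim)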
@@ -398,7 +400,7 @@
         {\text{page_size} \cdot \text{head_dim}}.

     This ignores the leading dimension (whether :math:`B` or
-    :math:`S_{\text{pack}}`) and compares only inner shapes to the
+    :math:`S`) and compares only inner shapes to the
     baseline ``(page_size, head_dim)``.

     Subclass responsibilities
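A quick numeric check of this ratio, with made-up numbers for illustration:

    # Per-page memory ratio from the formula above (illustrative numbers).
    page_size, head_dim = 16, 128   # baseline page: 2048 elements
    r, c = 16, 64                   # hypothetical compressed cache key
    ratio = (r * c) / (page_size * head_dim)
    print(ratio)                    # 0.5 -> half the memory of a baseline K/V page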
@@ -407,7 +409,7 @@

     - :meth:`forward_indexer(q, o, cache, ctx)`:
       compute sparse page indices (or routing scores) from queries,
-      using cache in the :math:`S_{\text{pack}}` view.
+      using cache in the :math:`S` view.

     - :meth:`forward_cache(cache, loc, ctx)`:
       update cache tensors using the :math:`B`-major view and positional
@@ -463,9 +465,8 @@
     .. math::

         \text{cache[key]}
-        \sim \mathbb{R}^{S_{\text{pack}} \times r \times c},
+        \sim \mathbb{R}^{S \times r \times c},

-    where :math:`S_{\text{pack}} = \sum_i S_i` and
     :math:`(r, c)` are the per-key inner dimensions obtained from
     :meth:`get_cache_meta_info`.

@@ -479,7 +480,7 @@
     --------
     Implementations should:

-    - interpret ``cache`` in the :math:`S_{\text{pack}}` view,
+    - interpret ``cache`` in the :math:`S` view,
     - use ``q`` and relevant cache tensors to score/select pages,
     - respect per-request bounds derived from ``ctx``,
     - write the resulting sparse indices (or routing representation)
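As a rough illustration of that checklist, a schematic ``forward_indexer`` might look like the following. The ``ctx.num_pages`` field, the assumed ``(num_tokens, head_dim)`` query shape, the mean-pooled page scoring, and the top-k budget are all assumptions made for the sketch, not the actual vFlow interface:

    import torch

    def forward_indexer(self, q, o, cache, ctx, topk: int = 30):
        # Interpret cache in the S (page-major) view: one row per packed page.
        k = cache["k"]                      # (S, page_size, head_dim)
        page_keys = k.mean(dim=1)           # (S, head_dim): one summary per page
        scores = page_keys @ q.mean(dim=0)  # (S,): query-page relevance
        # Respect per-request bounds derived from ctx (hypothetical field).
        scores = scores[: ctx.num_pages]
        # Select the top-k pages and write the sparse indices to o.
        idx = torch.topk(scores, k=min(topk, scores.numel())).indices
        o[: idx.numel()] = idx
        return o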
@@ -557,7 +558,7 @@

     This method **does not allocate** tensors. It only declares the
     per-key inner dimensions :math:`(r, c)`; the runtime will attach
-    the appropriate leading axis (:math:`B` or :math:`S_{\text{pack}}`)
+    the appropriate leading axis (:math:`B` or :math:`S`)
     depending on whether the cache is used in :meth:`forward_cache`
     or :meth:`forward_indexer`.

@@ -626,7 +627,7 @@
     Dict[str, Tuple[int, int]]
         Mapping from cache tensor names to inner shapes ``(r, c)``.
         The runtime will later prepend either a batch axis ``B`` or a
-        packed-page axis ``S_pack`` when materializing the tensors.
+        packed-page axis ``S`` when materializing the tensors.

     Raises
     ------
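For example, a subclass might declare its cache keys like this. This is a sketch only; the ``k_compressed`` entry and the concrete sizes are hypothetical:

    def get_cache_meta_info(self):
        # Inner shapes (r, c) only; the runtime prepends the B or S
        # leading axis when it materializes the tensors.
        page_size, head_dim = 16, 128                    # illustrative baseline
        return {
            "k": (page_size, head_dim),                  # ratio 1.0
            "v": (page_size, head_dim),                  # ratio 1.0
            "k_compressed": (page_size, head_dim // 4),  # hypothetical, ratio 0.25
        }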
@@ -664,7 +665,7 @@
         \frac{r_{\text{key}} \cdot c_{\text{key}}}
         {\text{page_size} \cdot \text{head_dim}}.

-    The leading dimension (:math:`B` or :math:`S_{\text{pack}}`) is
+    The leading dimension (:math:`B` or :math:`S`) is
     not included in this ratio on purpose; it is a per-page
     normalization.

_sources/index.rst.txt

Lines changed: 21 additions & 5 deletions
@@ -8,17 +8,33 @@ Installation

 .. code-block:: bash

-    pip install vortex-torch
+    git clone https://github.com/Infini-AI-Lab/vortex_torch.git
+    cd vortex_torch
+    pip install -e .

 Quick Example
 -------------

 .. code-block:: python

-    import vortex_torch as vt
-
-    model = vt.Model(...)
-    out = model.forward(...)
+    llm = sgl.Engine(model_path="Qwen/Qwen3-0.6B",
+                     disable_cuda_graph=False,
+                     page_size=16,
+                     vortex_topk_val=30,
+                     disable_overlap_schedule=True,
+                     attention_backend="flashinfer",
+                     enable_vortex_sparsity=True,
+                     vortex_page_reserved_bos=1,
+                     vortex_page_reserved_eos=1,
+                     vortex_layers_skip=list(range(1)),
+                     vortex_module_path="path/to/custom_sparse_attention.py",
+                     vortex_module_name="custom_sparse_attention",
+                     vortex_max_seq_lens=8192,
+                     mem_fraction_static=0.6
+    )

 API Reference
 -------------
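The new quick example shows only the engine constructor. A companion snippet along the usual SGLang lines (the import, the ``generate`` call, and the sampling parameters below are standard SGLang usage, not part of this commit) would be:

    import sglang as sgl

    # Construct the engine as in the Quick Example above (vortex_* arguments
    # omitted here for brevity), then run a prompt through it.
    llm = sgl.Engine(model_path="Qwen/Qwen3-0.6B")
    out = llm.generate("The capital of France is",
                       {"temperature": 0.0, "max_new_tokens": 16})
    print(out["text"])
    llm.shutdown()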
