README.md: 23 additions & 14 deletions
````diff
@@ -19,6 +19,11 @@ It exists to make it easy for researchers and engineers to **prototype**, **exte
 
 Vortex allows you to express novel sparse attention concisely while relying on an optimized execution engine.
 
+<video controls width="600">
+<source src="assets/demov2.0.mp4" type="video/mp4">
+Your browser does not support the video tag.
+</video>
+
 ---
 
 ## ✨ Key Features
````
````diff
@@ -48,6 +53,24 @@ pip install -e .
 
 ---
 
+## 🤖 AI-Generated Sparse Attention
+
+Vortex is designed not only for hand-crafted sparsity patterns but also for AI-generated sparse attention.
+
+Our demo shows how to use SOTA agents OpenHands (https://openhands.dev/) to generate sparse attention algorithms.
+
+```bash
+export LLM_API_KEY=YOUR_API_KEY
+python openhands_gen.py
+
+```
+
+The usage and installation guide of OpenHands can be found in https://docs.openhands.dev/sdk.
+
+Note: Some operators are not yet fused or fully optimized, which may lead to increased memory usage. Tune down the `mem_fraction_static` if CUDA OOM. This can also impact generation speed during inference.
+
+---
+
 ## 🧩 Quick Example: Custom Sparse Attention
 
 ```python
````
````diff
@@ -117,20 +140,6 @@ If `vortex_module_path` is not provided, Vortex will automatically search in
 
 ---
 
-## 🤖 AI-Generated Sparse Attention
-Vortex is designed not only for hand-crafted sparsity patterns but also for AI-generated sparse attention.
-
-Our demo shows how to use SOTA agents OpenHands (https://openhands.dev/) to generate sparse attention algorithms.
-
-```bash
-export LLM_API_KEY=YOUR_API_KEY
-python openhands_gen.py
-
-```
-
-The usage and installation guide of OpenHands can be found in https://docs.openhands.dev/sdk.
-
-Note: Some operators are not yet fused or fully optimized, which may lead to increased memory usage. Tune down the `mem_fraction_static` if CUDA OOM. This can also impact generation speed during inference.
````
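
The note in the moved section advises lowering `mem_fraction_static` when CUDA runs out of memory. One way to apply that advice is a simple back-off loop, sketched below under stated assumptions: `launch` is a hypothetical callable standing in for whatever starts the engine with a given `mem_fraction_static` (it is not a Vortex API), the out-of-memory condition is modelled with Python's built-in `MemoryError` (in a PyTorch setting one would likely catch `torch.cuda.OutOfMemoryError` instead), and the start, step, and floor values are illustrative rather than defaults taken from the README.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def launch_with_backoff(
    launch: Callable[[float], T],  # hypothetical: starts the engine for a given fraction
    start: float = 0.9,            # illustrative starting mem_fraction_static
    step: float = 0.1,             # how much to lower the fraction after each failure
    floor: float = 0.5,            # never try a fraction below this value
) -> Tuple[T, float]:
    """Call launch(fraction), lowering the fraction after each out-of-memory error.

    Returns the launched object together with the fraction that worked.
    Re-raises the last MemoryError once the next candidate would fall below floor.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    fraction = start
    while True:
        try:
            return launch(fraction), fraction
        except MemoryError:
            # Round to avoid floating-point drift such as 0.7000000000000001.
            next_fraction = round(fraction - step, 4)
            if next_fraction < floor:
                raise
            fraction = next_fraction
```

A lower fraction leaves more memory for the unfused operators the note mentions, so the loop trades static allocation for headroom until the launch succeeds or the floor is reached.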