Commit d7997c8

Merge pull request #14 from aarnphm/patch-1
Remove invalid links for references
2 parents f9a15b5 + 793b30c commit d7997c8

1 file changed, +1 -1 lines changed

_posts/2025-01-14-struct-decode-intro.md

Lines changed: 1 addition & 1 deletion
@@ -109,7 +109,7 @@ There are still a few usability concerns in XGrammar v0 integration to match fea
 
 With the release of [v1](https://github.com/vllm-project/vllm/issues/8779) on the horizon, we're working on a tentative plan for structured decoding:
 
-1. Moving guided decoding towards scheduler-level [\[10\]](https://www.notion.so/Blog-4X-structured-decoding-speed-in-vLLM-8c3f2d44f6504202abbdb534983f2b2e?pvs=21)
+1. Moving guided decoding towards scheduler-level:
 - Reason: We have more context regarding which requests that use structured decoding at a scheduler-level, therefore it shouldn't block other requests within the batch (tentatively addressing **limitation (2)**). In a sense, this moves guided decoding outside of the critical path.
 - This would allow for more natural vertical integration with jump-forward decoding (address **limitation (4)**).
 2. Allowing bit-mask calculation in one process instead of each GPU workers
