Commit aeb48d7

upd
1 parent f5d807b commit aeb48d7

1 file changed: +11 -2 lines changed

_posts/2025-03-10-sampling.md

Lines changed: 11 additions & 2 deletions
@@ -3,7 +3,7 @@ layout: post
 title: "Sorting-Free GPU Kernels for LLM Sampling"
 date: 2025-03-10
 comments: true
-author: Shanli Xing (UW), Zihao Ye (UW), Bohan Hou (CMU), Luis Ceze (UW), Tianqi Chen (CMU)
+author: Shanli Xing (UW), Zihao Ye (UW, NVIDIA), Bohan Hou (CMU), Luis Ceze (UW, NVIDIA), Tianqi Chen (CMU, NVIDIA)
 ---
 
 ## Background
@@ -180,5 +180,14 @@ While the algorithm is elegant in theory, implementing it efficiently in a GPU k
 
 For the complete implementation details, including how we address these challenges, please refer to the [source code](https://github.com/flashinfer-ai/flashinfer/blob/main/include/flashinfer/sampling.cuh).
 
+## Acknowledgement
+
+This blog post was written by [Shanli Xing](https://xsl.ing/). We thank the FlashInfer team for their contributions to the flashinfer.sampling module:
+* Zihao Ye: implementation of sampling kernels in CUDA.
+* Bohan Hou: implementation of sampling kernels in TVM.
+* Shanli Xing: implementation of min-p sampling kernels in CUDA.
+* Tianqi Chen: proposed the idea of rejection sampling for top-p.
+
 ## Footnotes
-[^1]: FlashInfer provides both "Top-K First" and "Joint" filtering options, with the latter applying Top-K and Top-P simultaneously at each round. More on the [doc](https://docs.flashinfer.ai/generated/flashinfer.sampling.top_k_top_p_sampling_from_probs.html).
+[^1]: FlashInfer provides both "Top-K First" and "Joint" filtering options, with the latter applying Top-K and Top-P simultaneously at each round. More on the [doc](https://docs.flashinfer.ai/generated/flashinfer.sampling.top_k_top_p_sampling_from_probs.html).
+