upd

yzh119 · yzh119 · commit aeb48d7122b8 · 2025-03-11T06:33:49.000-07:00
diff --git a/_posts/2025-03-10-sampling.md b/_posts/2025-03-10-sampling.md
@@ -3,7 +3,7 @@ layout: post
 title:  "Sorting-Free GPU Kernels for LLM Sampling"
 date:  2025-03-10
 comments: true
-author: Shanli Xing (UW), Zihao Ye (UW), Bohan Hou (CMU), Luis Ceze (UW), Tianqi Chen (CMU)
+author: Shanli Xing (UW), Zihao Ye (UW, NVIDIA), Bohan Hou (CMU), Luis Ceze (UW, NVIDIA), Tianqi Chen (CMU, NVIDIA)
 ---
 
 ## Background
@@ -180,5 +180,14 @@ While the algorithm is elegant in theory, implementing it efficiently in a GPU k
 
 For the complete implementation details, including how we address these challenges, please refer to the [source code](https://github.com/flashinfer-ai/flashinfer/blob/main/include/flashinfer/sampling.cuh).
 
+## Acknowledgement
+
+This blog post was written by [Shanli Xing](https://xsl.ing/). We thank the FlashInfer team for their contributions to the `flashinfer.sampling` module:
+* Zihao Ye: implementation of sampling kernels in CUDA.
+* Bohan Hou: implementation of sampling kernels in TVM.
+* Shanli Xing: implementation of min-p sampling kernels in CUDA.
+* Tianqi Chen: proposed the idea of rejection sampling for top-p.
+
 ## Footnotes
-[^1]: FlashInfer provides both "Top-K First" and "Joint" filtering options, with the latter applying Top-K and Top-P simultaneously at each round. More on the [doc](https://docs.flashinfer.ai/generated/flashinfer.sampling.top_k_top_p_sampling_from_probs.html).
+[^1]: FlashInfer provides both "Top-K First" and "Joint" filtering options, with the latter applying Top-K and Top-P simultaneously at each round. More on the [doc](https://docs.flashinfer.ai/generated/flashinfer.sampling.top_k_top_p_sampling_from_probs.html).
+