Now we need to retrieve the value data and perform dot multiplication
@@ -499,3 +499,14 @@ for (int i = 0; i < NUM_ROWS_PER_THREAD; i++) {
 Finally, we need to iterate over different assigned head positions
 and write out the corresponding accumulated result based on the
 `out_ptr`.
+
+## Citation
+
+```bibtex
+@inproceedings{kwon2023efficient,
+title={Efficient Memory Management for Large Language Model Serving with PagedAttention},
+author={Woosuk Kwon and Zhuohan Li and Siyuan Zhuang and Ying Sheng and Lianmin Zheng and Cody Hao Yu and Joseph E. Gonzalez and Hao Zhang and Ion Stoica},
+booktitle={Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles},
docs/design/plugin_system.md
1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# vLLM's Plugin System
+# Plugin System
 
 The community frequently requests the ability to extend vLLM with custom features. To facilitate this, vLLM includes a plugin system that allows users to add custom features without modifying the vLLM codebase. This document explains how plugins work in vLLM and how to create a plugin for vLLM.
docs/design/torch_compile.md
1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# vLLM's `torch.compile` integration
+# `torch.compile` integration
 
 In vLLM's V1 architecture, `torch.compile` is enabled by default and is a critical part of the framework. This document gives a simple walk-through example to show how to understand the `torch.compile` usage.