Commit e4b4dc2
[Kernels] Replace CPU Function Calls with GPU Kernel Invocations (#697)
1. Replace CPU function calls for the following tasks with GPU kernel
invocations:
- Apply logit bias
- Apply penalties to logits
- Compute softmax with temperature (sampling will be replaced in a
future PR)
2. Fixed a bug where the repetition penalty in the generation config was not
being used
- Added repetition penalty to the CompletionCreateParamsBase and
ChatCompletionRequestBase interfaces
- Updated its definition in GenerationConfig and added a reference to it in
engine.ts
3. Added a field to the CompletionCreateParamsBase and
ChatCompletionRequestBase interfaces to enable logging of the time taken by
individual steps
4. Added sanity checks for individual steps in sampleTokenFromLogits
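For reference, the per-token work that item 1 moves onto the GPU can be sketched on the CPU as below. This is a minimal sketch, not the PR's kernels: the function names are hypothetical, and the penalty formulas assume the common OpenAI convention for presence/frequency penalties and the CTRL/Hugging Face convention for the repetition penalty (divide positive logits, multiply negative ones).

```typescript
// Hypothetical CPU reference for the logit processing moved to GPU kernels.
// Penalty conventions (OpenAI presence/frequency, CTRL-style repetition) are assumptions.

// Adds a per-token bias (tokenId -> bias) to the logits in place.
function applyLogitBias(logits: Float32Array, bias: Record<number, number>): void {
  for (const [id, b] of Object.entries(bias)) {
    const i = Number(id);
    if (i >= 0 && i < logits.length) logits[i] += b;
  }
}

// Penalizes tokens that already appeared; counts maps tokenId -> occurrences so far.
function applyPenalties(
  logits: Float32Array,
  counts: Map<number, number>,
  presencePenalty: number,
  frequencyPenalty: number,
  repetitionPenalty: number,
): void {
  counts.forEach((count, id) => {
    if (count <= 0 || id < 0 || id >= logits.length) return;
    let z = logits[id];
    // Repetition penalty: shrink positive logits, push negative ones further down.
    z = z > 0 ? z / repetitionPenalty : z * repetitionPenalty;
    // Frequency penalty scales with occurrences; presence penalty is applied once.
    z -= count * frequencyPenalty + presencePenalty;
    logits[id] = z;
  });
}

// Numerically stable softmax of logits / temperature (temperature must be > 0;
// greedy decoding at temperature 0 would be handled separately).
function softmaxWithTemperature(logits: Float32Array, temperature: number): Float32Array {
  if (temperature <= 0) throw new Error("temperature must be positive");
  let max = -Infinity;
  for (let i = 0; i < logits.length; i++) if (logits[i] > max) max = logits[i];
  const probs = new Float32Array(logits.length);
  let sum = 0;
  for (let i = 0; i < logits.length; i++) {
    probs[i] = Math.exp((logits[i] - max) / temperature);
    sum += probs[i];
  }
  for (let i = 0; i < probs.length; i++) probs[i] /= sum;
  return probs;
}
```

Each step is an element-wise pass or a reduction over a vocabulary-sized array, which is what makes it a natural fit for GPU kernels; the order in which the penalties are combined here is itself an assumption.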
Performance Comparison: compared decode performance for "canonical" flows,
averaged across 20 runs, with the following setup:
- No logit_bias
- No logitProcessor
- Applied penalties
- With and without logprobs
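The "averaged across 20 runs" figures reduce each run to a per-token latency and report throughput as its reciprocal. A sketch under the assumption that each run records its decode time and output-token count; the `DecodeRun` shape and `summarizeRuns` are hypothetical, and the PR may aggregate differently.

```typescript
// Hypothetical record of one benchmark run.
interface DecodeRun {
  decodeSeconds: number; // wall-clock time spent decoding
  outputTokens: number; // number of tokens generated
}

// Mean of per-run seconds/token, with throughput reported as its reciprocal.
function summarizeRuns(runs: DecodeRun[]): { secondsPerToken: number; tokensPerSecond: number } {
  if (runs.length === 0) throw new Error("no runs to summarize");
  let total = 0;
  for (const r of runs) total += r.decodeSeconds / r.outputTokens;
  const secondsPerToken = total / runs.length;
  return { secondsPerToken, tokensPerSecond: 1 / secondsPerToken };
}
```

Averaging per-run latency and then taking the reciprocal is not the same as averaging per-run throughput; the sketch uses 1 / (mean s/token), which is how each latency and tokens/s pair in the results relates.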
1. Before PR performance (without logprobs): ~0.064s per output token
(~15.63 decode tokens/s)
2. After PR performance (without logprobs): ~0.066s per output token
(~15.15 decode tokens/s)
3. Before PR performance (with logprobs): ~0.052s per output token
(~19.23 decode tokens/s)
4. After PR performance (with logprobs): ~0.048s per output token
(~20.83 decode tokens/s)
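The tokens/s figures are reciprocals of the per-token latencies, and the relative change shows the two flows moving in opposite directions: roughly 3% slower without logprobs and roughly 8% faster with them. A small check of that arithmetic, using only the numbers reported above:

```typescript
// Per-token latencies reported above, in seconds per output token.
const results = {
  withoutLogprobs: { before: 0.064, after: 0.066 },
  withLogprobs: { before: 0.052, after: 0.048 },
};

// Throughput is the reciprocal of per-token latency.
const tokensPerSecond = (secondsPerToken: number): number => 1 / secondsPerToken;

// Relative latency change in percent; negative means the PR made decoding faster.
const latencyChangePct = (before: number, after: number): number =>
  ((after - before) / before) * 100;

for (const [flow, { before, after }] of Object.entries(results)) {
  console.log(
    `${flow}: ${tokensPerSecond(before).toFixed(2)} -> ${tokensPerSecond(after).toFixed(2)} tok/s, ` +
      `latency change ${latencyChangePct(before, after).toFixed(1)}%`,
  );
}
```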
Additional Notes:
- Need to profile performance of sampleTopPFromLogits vs
sampleTopPFromProb on CPU to determine why performance with logprobs is
better
- Application of logit_bias is much faster on GPU than CPU
- There are additional overheads outside of the sampleTokenFromLogits
function that make the performance improvement less pronounced (the
total time spent in sampleTokenFromLogits is ~0.0117s before the PR and
~0.0076s after the PR)

1 parent d8b25fe commit e4b4dc2
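The sampleTokenFromLogits totals in the notes (~0.0117s to ~0.0076s, roughly a 35% reduction) come from timing individual steps, which is presumably what the logging field from item 3 enables. A minimal sketch of per-step accumulation; `StepTimer` is a hypothetical helper, not part of the PR, and it only times synchronous work. Because GPU work is submitted asynchronously, a real measurement would need to wait for the result (for example, after reading the output back) before stopping the clock.

```typescript
// Hypothetical helper that accumulates wall-clock time per named step.
class StepTimer {
  private totals = new Map<string, number>(); // step name -> total milliseconds

  // Runs fn, adds its elapsed time to the step's running total, and returns fn's result.
  time<T>(step: string, fn: () => T): T {
    const start = performance.now();
    try {
      return fn();
    } finally {
      const elapsed = performance.now() - start;
      this.totals.set(step, (this.totals.get(step) ?? 0) + elapsed);
    }
  }

  // Total seconds recorded for a step so far (0 if the step never ran).
  seconds(step: string): number {
    return (this.totals.get(step) ?? 0) / 1000;
  }
}
```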
File tree
16 files changed (+752, -75 lines)
- examples/get-started-latency-breakdown
  - src
- src
  - openai_api_protocols
- tests
- scripts/sanity_checks
Per-file diffs (code not shown): includes one file with 23 additions & 0 deletions and one with 135 additions & 0 deletions.