Commit e0b124a
Add ability to force BOS id for mBART (#22)
* Merge with main (#1)
* Update beam_search_topk_kernels.cu
fix: fix bug of beam search
* fix: change int of some kernels to int64_t to prevent overflow
* fix: gpt tensor shapes inconsistency (NVIDIA#505)
Signed-off-by: AkiyamaYummy <842720660@qq.com>
* Update gpt_guide.md (NVIDIA#529)
* fix: fix bug of gpt buffer and gpt gemm overflow
* Update T5DecodingWeight.cc
fix: fix loading bug of t5
* [Enhancement]add pytorch backend support for gptneox (NVIDIA#550)
* add pytorch backend support for gptneox
Signed-off-by: AkiyamaYummy <842720660@qq.com>
* fix early stopping invalid
* 1) Some unused parameters and logic have been removed. 2) Revisions that would affect pipeline parallelism have been reverted. 3) The code has been made capable of direct validation on TabbyML/NeoX-1.3B.
Signed-off-by: AkiyamaYummy <842720660@qq.com>
* Change the names of classes, removing 'parallel' from their names
Signed-off-by: AkiyamaYummy <842720660@qq.com>
* Format the code.
Signed-off-by: AkiyamaYummy <842720660@qq.com>
* Only print results when rank is 0.
Signed-off-by: AkiyamaYummy <842720660@qq.com>
* Add dist.init_process_group().
Signed-off-by: AkiyamaYummy <842720660@qq.com>
* update docs
Signed-off-by: AkiyamaYummy <842720660@qq.com>
---------
Signed-off-by: AkiyamaYummy <842720660@qq.com>
* Update cublasMMWrapper.cc
Fix the CUBLAS_VERSION checking of cublasMMWrapper
* Update cublasMMWrapper.cc
* fix overflow in softmax_kernel when processing long seqlen and big batch_size (NVIDIA#524)
* Update unfused_attention_kernels.cu
fix bug of softmax kernel
* [Enhancement]create huggingface_gptneox_convert.py (NVIDIA#569)
* create huggingface_gptneox_convert.py
Signed-off-by: AkiyamaYummy <842720660@qq.com>
* adjust HF's multi bin files
Signed-off-by: AkiyamaYummy <842720660@qq.com>
* update gptneox_guide.md
Signed-off-by: AkiyamaYummy <842720660@qq.com>
---------
Signed-off-by: AkiyamaYummy <842720660@qq.com>
* perf(bloom): improve performance of huggingface_bloom_convert.py, reducing time cost and memory usage (NVIDIA#568)
Co-authored-by: r.yang <r.yang@tianrang-inc.com>
* Fix/gpt early stop (NVIDIA#584)
* fix: fix bug of early stopping of gpt
* [bugfix] Fix 2-shot All Reduce correctness issue (indexing bug). (NVIDIA#672)
FasterTransformer 2-shot all reduce is implemented as a reduce-scatter + all-gather. There is an indexing bug in the all-gather step. Prior to this change, 2-shot all reduce was only producing correct results on device 0. Now, all devices have the correct results.
* fix: swap tensor bug (NVIDIA#683)
* Support size_per_head=112 (NVIDIA#660)
* fix multi-gpu build
* add support for size_per_head=112 for gpt decoder
* remove mpi_cxx from multi-gpu build for now (NVIDIA#705)
---------
Signed-off-by: AkiyamaYummy <842720660@qq.com>
Co-authored-by: byshiue <bhsueh@nvidia.com>
Co-authored-by: _yummy_ <842720660@qq.com>
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: zhangxin81 <115389973+zhangxin81@users.noreply.github.com>
Co-authored-by: 杨睿 <595403043@qq.com>
Co-authored-by: r.yang <r.yang@tianrang-inc.com>
Co-authored-by: Rahul Kindi <rkindi@users.noreply.github.com>
Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Daya Khudia <37562707+dskhudia@users.noreply.github.com>
Co-authored-by: Dean Wyatte <2512762+dwyatte@users.noreply.github.com>
* commit
---------
Signed-off-by: AkiyamaYummy <842720660@qq.com>
Co-authored-by: Asim Shankar <asim.shankar@snowflake.com>
Co-authored-by: byshiue <bhsueh@nvidia.com>
Co-authored-by: _yummy_ <842720660@qq.com>
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: zhangxin81 <115389973+zhangxin81@users.noreply.github.com>
Co-authored-by: 杨睿 <595403043@qq.com>
Co-authored-by: r.yang <r.yang@tianrang-inc.com>
Co-authored-by: Rahul Kindi <rkindi@users.noreply.github.com>
Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Daya Khudia <37562707+dskhudia@users.noreply.github.com>
Co-authored-by: Dean Wyatte <2512762+dwyatte@users.noreply.github.com>
1 parent 3336e68 commit e0b124a
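The commit message above describes 2-shot all reduce as a reduce-scatter followed by an all-gather, with the fixed bug sitting in the all-gather indexing. The following is a minimal pure-Python simulation of that two-step structure, not the FasterTransformer kernel: the function name `two_shot_all_reduce`, the list-of-lists device model, and the requirement that the buffer length divide evenly by the device count are all assumptions made for illustration. It only shows which chunk each device should read from which peer so that every device, not just device 0, ends with the full sum.

```python
from typing import List


def two_shot_all_reduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a 2-shot all-reduce (sum) over len(buffers) devices.

    Hypothetical helper for illustration; buffers[r] is the input of device r.
    """
    world = len(buffers)
    n = len(buffers[0])
    assert all(len(b) == n for b in buffers), "devices need equal-length buffers"
    assert n % world == 0, "sketch assumes length divisible by device count"
    chunk = n // world

    # Shot 1, reduce-scatter: device `rank` sums chunk `rank` over all devices.
    owned = []
    for rank in range(world):
        lo = rank * chunk
        owned.append([sum(buf[lo + i] for buf in buffers) for i in range(chunk)])

    # Shot 2, all-gather: every device reads chunk `src` from device `src`
    # and writes it at offset src * chunk in its own output buffer.
    results = []
    for _rank in range(world):
        out = [0.0] * n
        for src in range(world):
            lo = src * chunk
            out[lo:lo + chunk] = owned[src]
        results.append(out)
    return results


if __name__ == "__main__":
    for device_result in two_shot_all_reduce([[1, 2, 3, 4], [10, 20, 30, 40]]):
        print(device_result)
```

The key invariant is that the write offset in shot 2 depends on the source device, not on the reading device; a correct all-gather leaves identical buffers on every rank.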
File tree
4 files changed, +68 -2 lines changed
- src/fastertransformer
  - kernels
  - models/bart

File 1 of 4 (path not captured)
@@ -64,6 +64,34 @@ 28 lines added at new lines 67-94; the line contents were not captured.

File 2 of 4 (path not captured)
@@ -33,6 +33,13 @@ 7 lines added at new lines 36-42; the line contents were not captured.

File 3 of 4 (path not captured)
@@ -116,6 +116,7 @@ 1 line added at new line 119
@@ -182,6 +183,7 @@ 1 line added at new line 186
@@ -343,6 +345,7 @@ 1 line added at new line 348
@@ -382,6 +385,7 @@ 1 line added at new line 388
@@ -792,6 +796,32 @@ 26 lines added at new lines 799-824
The line contents were not captured.

File 4 of 4 (path not captured)
@@ -94,8 +94,9 @@ 2 lines removed (original lines 97-98) and 3 lines added (new lines 97-99); the line contents were not captured.
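
The commit title refers to forcing a BOS id for mBART, but the changed lines themselves are not available here. As a rough sketch of what forcing a token at the first decoding step usually means, the Python below masks the logits so that only one token id can be selected. The function name `force_bos_logits`, the `step` convention (step 0 is the first generated token), and the use of a negative id to disable the feature are hypothetical choices for illustration and are not taken from this commit.

```python
import math
from typing import List


def force_bos_logits(logits: List[float], step: int, forced_bos_id: int) -> List[float]:
    """Return logits in which only forced_bos_id is selectable at step 0.

    Hypothetical helper: a negative forced_bos_id disables forcing, and
    every step after the first is returned unchanged.
    """
    if step != 0 or forced_bos_id < 0:
        return list(logits)
    if forced_bos_id >= len(logits):
        raise ValueError("forced_bos_id is outside the vocabulary")
    # Mask every token to -inf, then give the forced token a finite score.
    masked = [-math.inf] * len(logits)
    masked[forced_bos_id] = 0.0
    return masked


if __name__ == "__main__":
    scores = force_bos_logits([0.1, 2.0, -1.0, 0.5], step=0, forced_bos_id=3)
    print(scores.index(max(scores)))
```

With this masking, greedy search, beam search, and sampling all pick the forced id at the first step, which is why multilingual models such as mBART can use it to select the target language token.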