Update on the development branch #1726
kaiyux announced in Announcements
Hi,
The TensorRT-LLM team is pleased to announce that we are pushing an update to the development branch (and the Triton backend) on June 4, 2024.
This update includes:
- examples/grok/README.md
- examples/llama/README.md
- qkv_bias shape issue for Qwen1.5-32B (#1589, "When converting the qwen 110b gptq checkpoint, the shape of qkv_bias is not divisible by 3"), thanks to the contribution from @Tlntin in "fix up qkv.bias error when use qwen1.5-32b-gptq-int4" (#1637).
- fpA_intB, thanks to the contribution from @JamesTheZ in "Fix the error of Ada traits for fpA_intB." (#1583).
- examples/qwenvl/requirements.txt, thanks to the contribution from @ngoanpv in "Update requirements.txt" (#1248).
- lora_manager, thanks to the contribution from @TheCodeWrangler in "Fixed rslora scaling in lora_manager" (#1669).
- benchmarks/cpp/README.md for "gptManagerBenchmark seems to go into a dead loop with GPU usage 0%" (#1562) and "Cannot process new request: [TensorRT-LLM][ERROR] Assertion failed: LoRA task 0 not found in cache. Please send LoRA weights with request (/home/jenkins/agent/workspace/LLM/main/L0_PostMerge/llm/cpp/tensorrt_llm/batch_manager/peftCacheManager.cpp:182)" (#1552).

Thanks,
The TensorRT-LLM Engineering Team