Feature request
Currently, the exllama kernel in TGI does not support act-order models when the weights are sharded across multiple GPUs.
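To illustrate why act-order and sharding interact badly (a hypothetical sketch, not TGI or exllama code): in a GPTQ act-order checkpoint, `g_idx` maps each weight column to its quantization group in activation-sorted order rather than contiguous blocks, so a naive contiguous column split per GPU forces every shard to hold the scales/zeros of almost every group.

```python
# Hypothetical illustration (not TGI/exllama code): with act-order,
# g_idx assigns columns to quantization groups in a permuted order,
# so contiguous column sharding scatters groups across all shards.

def shard_groups(g_idx, num_shards):
    """Return the set of quantization groups each shard touches when
    columns are split naively into contiguous chunks."""
    n = len(g_idx)
    chunk = n // num_shards
    return [set(g_idx[i * chunk:(i + 1) * chunk]) for i in range(num_shards)]

# Without act-order: groups are contiguous; each shard needs few groups.
no_act = [c // 4 for c in range(16)]   # 16 columns, groups 0..3 in order
# With act-order: columns are permuted (toy permutation for illustration).
act = [3, 0, 2, 1, 0, 3, 1, 2, 2, 0, 3, 1, 1, 2, 0, 3]

print(shard_groups(no_act, 2))  # each shard sees only 2 of the 4 groups
print(shard_groups(act, 2))     # each shard sees all 4 groups
```

The toy `g_idx` values above are made up; real checkpoints store `g_idx` per layer, but the sharding problem is the same.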
Motivation
I tested commit 3b013cd53c7d413cf99ca04c7c28dd5c95117c0d of exllama with this command:

python example_chatbot.py -d <model path> -un "Jeff" -p prompt_chatbort.txt -gs 10,12
This successfully loaded the model across the two GPUs.
Chatbot result:

nvidia-smi result:

Your contribution
turboderp/exllama#276