Adapt to the latest version of exllama or exllamav2? #2

@oat-mirror

Feature request

Currently, the exllama version used by TGI does not support act-order with sharded GPUs.

Motivation

I tested commit 3b013cd53c7d413cf99ca04c7c28dd5c95117c0d of exllama with the following command:

python example_chatbot.py -d <model path> -un "Jeff" -p prompt_chatbort.txt  -gs 10,12

This loaded the model across both GPUs.
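
For reference, here is a minimal sketch of how a split string such as the `10,12` passed to `-gs` can be read as per-GPU memory budgets. It assumes the value is a comma-separated list of VRAM amounts in GB, one per device; `parse_gpu_split` is a hypothetical helper written for illustration and is not part of exllama or TGI.

```python
from typing import List


def parse_gpu_split(spec: str) -> List[float]:
    """Hypothetical helper: parse a split string like "10,12" into per-GPU budgets.

    Assumption: each entry is the amount of VRAM (in GB) to use on the
    device at that position, so "10,12" means GPU 0 gets 10 and GPU 1 gets 12.
    """
    # Skip empty entries so a trailing comma does not break parsing.
    budgets = [float(part) for part in spec.split(",") if part.strip()]
    if not budgets:
        raise ValueError("expected at least one per-GPU value")
    if any(b <= 0 for b in budgets):
        raise ValueError("per-GPU values must be positive")
    return budgets


if __name__ == "__main__":
    for device, gb in enumerate(parse_gpu_split("10,12")):
        print(f"GPU {device}: up to {gb} GB for model layers")
```

With two entries in the list, the model layers are spread over two devices, which matches the two-GPU run described above.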

chatbot result:
image

nvidia-smi result:
image

Your contribution

turboderp/exllama#276

Labels: enhancement
