generated from amazon-archives/__template_Apache-2.0
Description
Hello,
I've been looking into hosting an LLM on AWS infrastructure, mainly Flan T5 XXL. My question is below.
Inquiry: What is the recommended container for hosting Flan T5 XXL?
Context: I've hosted Flan T5 XXL using both the TGI container and the DJL-FasterTransformer container. With the same prompt, TGI takes around 5-6 seconds, whereas the DJL-FasterTransformer container takes 0.5-1.5 seconds. The DJL-FasterTransformer container has tensor-parallel-degree set to 4, and SM_NUM_GPUS for TGI was set to 4. Both were hosted on ml.g5.12xlarge.
- Are there recommended configs for the TGI container that I might be missing?
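
For reference, the TGI setup described above could be sketched roughly as follows. This is only an illustration of the environment passed to the container, not a recommended configuration: the helper name `tgi_env` is hypothetical, the model ID `google/flan-t5-xxl` and the token limits are assumed values, and `HF_MODEL_ID`, `MAX_INPUT_LENGTH`, and `MAX_TOTAL_TOKENS` are variables I believe the TGI container reads but that were not part of my original setup description.

```python
import json


def tgi_env(model_id: str, num_gpus: int,
            max_input_length: int = 1024,
            max_total_tokens: int = 2048) -> dict:
    """Build the environment dict for the TGI container (hypothetical helper).

    Container environment variables are passed as strings, so every
    value is converted explicitly.
    """
    if max_total_tokens <= max_input_length:
        # The total budget has to leave room for generated tokens.
        raise ValueError("max_total_tokens must exceed max_input_length")
    return {
        "HF_MODEL_ID": model_id,              # assumed Hub model ID
        "SM_NUM_GPUS": str(num_gpus),         # GPUs to shard the model across
        "MAX_INPUT_LENGTH": str(max_input_length),   # assumed value
        "MAX_TOTAL_TOKENS": str(max_total_tokens),   # assumed value
    }


# ml.g5.12xlarge has 4 GPUs, matching tensor-parallel-degree=4 on the DJL side.
env = tgi_env("google/flan-t5-xxl", num_gpus=4)
print(json.dumps(env, indent=2))
```

A dict like this would typically be handed to the SageMaker model object as its `env` argument when deploying; the deployment call itself is omitted here because it needs AWS credentials and an execution role.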