
Latency and Throughput Inquiry #20

@ctandrewtran

Description

Hello-

I've been looking into hosting an LLM on AWS infrastructure, mainly Flan-T5 XXL. My question is below.

Inquiry: What is the recommended container for hosting Flan-T5 XXL?
Context: I've hosted Flan-T5 XXL using both the TGI container and the DJL-FasterTransformer container. Using the same prompt, TGI takes around 5-6 seconds whereas the DJL-FasterTransformer container takes 0.5-1.5 seconds. The DJL-FasterTransformer container has tensor_parallel_degree set to 4, and SM_NUM_GPUS for TGI was set to 4. Both were hosted on ml.g5.12xlarge.

  • Are there recommended configs for the TGI Container that I might be missing?
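For context, here is a minimal sketch of the TGI deployment path I'm describing, using the SageMaker Python SDK. The role ARN, token limits, and timeout are placeholder assumptions, not my exact settings:

```python
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # placeholder, use your own execution role

# Resolve the HuggingFace LLM (TGI) container image for the current region.
image_uri = get_huggingface_llm_image_uri("huggingface")

model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "google/flan-t5-xxl",
        "SM_NUM_GPUS": "4",          # shard across the 4 A10G GPUs on ml.g5.12xlarge
        "MAX_INPUT_LENGTH": "1024",  # assumed limits, tuned to my prompts
        "MAX_TOTAL_TOKENS": "1512",
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    container_startup_health_check_timeout=600,  # allow time for the weights to download
)

print(predictor.predict({"inputs": "Translate to German: I love programming."}))
```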
