Scale testing for the EPP #1123

@kfswain

Description

With the llm-d sim server available, high-scale testing of the Inference Gateway is now possible. Before GA, it would be good to know the upper limits of the default EPP.

Test dimensions:

  • prompt length
  • raw QPS
  • a sample of the Pareto frontier of those two dimensions

Note: Also record the resource limits configured for the EPP container.
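One way to pick the "sample of the Pareto frontier" is to sweep a grid of (prompt length, QPS) pairs, keep the pairs the EPP sustained, and then discard every pair that is matched or beaten on both dimensions by another sustained pair. The sketch below shows only that last step. The function name `pareto_frontier`, the tuple layout, and what counts as "sustained" are assumptions for illustration, not something defined in this issue, and the numbers in the example are placeholders rather than measurements.

```python
from typing import List, Tuple

# (prompt_length_tokens, qps) for a run the EPP handled acceptably.
# What "acceptably" means (error rate, latency budget) is left to the tester.
Point = Tuple[int, float]


def pareto_frontier(sustained: List[Point]) -> List[Point]:
    """Return the sustained points that no other point dominates.

    A point is dominated when another point has prompt length >= and
    QPS >= it, with at least one of the two strictly greater.
    """
    frontier: List[Point] = []
    best_qps = float("-inf")
    # Walk from the longest prompts to the shortest. A shorter prompt only
    # stays on the frontier if it reached a strictly higher QPS than every
    # longer (or equally long) prompt seen so far.
    for length, qps in sorted(set(sustained), key=lambda p: (-p[0], -p[1])):
        if qps > best_qps:
            frontier.append((length, qps))
            best_qps = qps
    return sorted(frontier)


if __name__ == "__main__":
    # Placeholder values for illustration only, NOT measured results.
    example = [
        (128, 1000.0), (128, 500.0),
        (512, 400.0),
        (1024, 400.0), (1024, 50.0),
        (4096, 100.0),
    ]
    for length, qps in pareto_frontier(example):
        print(f"prompt_length={length} qps={qps}")
```

Storing the EPP container's resource limits next to each sustained point would keep frontiers from differently sized deployments from being compared by mistake.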

Metadata

Labels

triage/accepted: Indicates an issue or PR is ready to be actively worked on.
