Closed
Labels
triage/accepted: Indicates an issue or PR is ready to be actively worked on.
Description
With the llm-d sim server available, high-scale testing of the Inference Gateway is now possible. Before GA, it would be good to know some upper limits of the default EPP.
Dimensions of test:
- prompt length
- raw QPS
- a sample of the Pareto frontier of those two dimensions
Note: Also record the resource limits configured for the EPP container.
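
One possible way to report the Pareto frontier over prompt length and QPS is sketched below. Everything in it is an assumption for illustration: the `Run` record, its field names, the `sustained` pass/fail flag (the pass criterion itself is not defined in this issue), and the sample values, which are placeholders and not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One load-test run against the EPP (hypothetical record layout)."""
    prompt_tokens: int  # prompt length used for every request in the run
    qps: int            # request rate targeted by the load generator
    sustained: bool     # True if the run met the chosen pass criterion
    cpu_limit: str      # EPP container CPU limit, recorded as configured
    memory_limit: str   # EPP container memory limit, recorded as configured


def pareto_frontier(runs):
    """Return sustained runs that no other sustained run dominates.

    Run `o` dominates run `r` when it is at least as large in both
    prompt length and QPS, and strictly larger in at least one.
    """
    ok = [r for r in runs if r.sustained]
    frontier = []
    for r in ok:
        dominated = any(
            o.prompt_tokens >= r.prompt_tokens
            and o.qps >= r.qps
            and (o.prompt_tokens > r.prompt_tokens or o.qps > r.qps)
            for o in ok
        )
        if not dominated:
            frontier.append(r)
    return sorted(frontier, key=lambda r: r.prompt_tokens)


# Placeholder values only, chosen to exercise the function.
example_runs = [
    Run(128, 1000, True, "4", "8Gi"),
    Run(128, 2000, True, "4", "8Gi"),
    Run(128, 4000, False, "4", "8Gi"),
    Run(1024, 1000, True, "4", "8Gi"),
    Run(1024, 2000, False, "4", "8Gi"),
    Run(8192, 500, True, "4", "8Gi"),
    Run(8192, 1000, False, "4", "8Gi"),
]

for run in pareto_frontier(example_runs):
    print(run.prompt_tokens, run.qps, run.cpu_limit, run.memory_limit)
```

Keeping the resource limits on each record means every frontier point can be read together with the EPP configuration that produced it.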