Open
Labels: enhancement (New feature or request), internal (filed by core contributor or associate)
Description
Is your feature request related to a problem? Please describe.
When generating a long prompt like the one below:
guidellm benchmark --target=$URL --model=$MODEL --rate-type=concurrent --rate=50 --max-requests=50 --output-path=~/guidellm_results.json --processor=$MODEL --data='{"prompt_tokens":8000, "output_tokens":5000}'
guidellm gets stuck generating the random data for about a minute, with no output after the last log line:
(app-root) bash-5.1$ guidellm benchmark --target=$URL --model=$MODEL --rate-type=concurrent --rate=50 --max-requests=50 --output-path=~/guidellm_results.json --processor=$MODEL --data='{"prompt_tokens":8000, "output_tokens":5000}'
Creating backend...
Backend openai_http connected to http://10.16.1.195:8000 for model meta-llama/Meta-Llama-3-8B-Instruct.
Creating request loader...
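For scale, 50 requests at 8,000 prompt tokens each means roughly 400,000 tokens have to be sampled and decoded before the first request is sent. A minimal sketch of what that step likely costs, assuming guidellm builds its synthetic prompts by decoding random token IDs (the loader's actual mechanism is an assumption here, and any non-gated tokenizer can stand in for the Llama one):

```python
import random
import time

from transformers import AutoTokenizer

MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"  # gated repo; any tokenizer works for the timing
NUM_REQUESTS = 50
PROMPT_TOKENS = 8000

tokenizer = AutoTokenizer.from_pretrained(MODEL)

start = time.perf_counter()
prompts = []
for _ in range(NUM_REQUESTS):
    # Assumption: synthetic prompts are produced by decoding random token IDs.
    ids = [random.randrange(tokenizer.vocab_size) for _ in range(PROMPT_TOKENS)]
    prompts.append(tokenizer.decode(ids, skip_special_tokens=True))

print(f"Generated {NUM_REQUESTS} prompts in {time.perf_counter() - start:.1f}s")
```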
Describe the solution you'd like
There should be some feedback during this period about what is happening; without it, I initially thought there was a bug in vLLM or in guidellm.
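One lightweight way to provide that feedback would be a live status line or spinner around the slow step. A minimal sketch of the idea using rich (whether guidellm uses rich internally is an assumption, and slow_build below is a stand-in for the real generation step, not guidellm's actual code):

```python
import time

from rich.console import Console

console = Console()

def slow_build() -> list[str]:
    """Stand-in for the minute-long synthetic-data generation."""
    time.sleep(5)
    return ["prompt"] * 50

# The spinner animates while slow_build runs, so the user sees activity
# instead of a silent pause after "Creating request loader...".
with console.status("Creating request loader (generating synthetic data)..."):
    data = slow_build()

console.print(f"Generated {len(data)} requests.")
```

Even a plain log line such as "Generating 50 synthetic requests, this may take a minute..." printed before the step would be enough to avoid the confusion.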
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Add any other context or screenshots about the feature request here.
ivanbaldo