Labels
enhancement, good first issue, help wanted, server/webui
Description
Note: This issue was copied from ggml-org#6607
Original Author: @phymbert
Original Issue Number: ggml-org#6607
Created: 2024-04-11T10:23:54Z
Context
At the moment we implement a FIFO approach to batching prompt tokens, so if a large prompt needs to be processed it blocks all other slots.
Proposal: implement fair batch usage of prompt processing across all pending slots.
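A minimal sketch of the idea, not the server's actual code: instead of draining one slot's prompt first (FIFO), each batch's token budget is split round-robin across all slots that still have pending prompt tokens. The `slot` struct and `fill_batch_fair` helper are hypothetical names for illustration.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical slot state: how many prompt tokens are still waiting to be processed.
struct slot {
    int id;
    int n_prompt_remaining;
};

// Fill one batch of up to n_batch tokens, taking tokens from each pending
// slot in round-robin fashion instead of draining the first slot (FIFO).
static void fill_batch_fair(std::vector<slot> & slots, int n_batch) {
    int  n_used   = 0;
    bool progress = true;
    while (n_used < n_batch && progress) {
        progress = false;
        for (auto & s : slots) {
            if (s.n_prompt_remaining > 0 && n_used < n_batch) {
                // take one token from this slot for the current batch
                s.n_prompt_remaining--;
                n_used++;
                progress = true;
            }
        }
    }
    printf("batch filled with %d tokens\n", n_used);
}

int main() {
    // one large prompt and two small ones: with FIFO the small ones would wait
    std::vector<slot> slots = { {0, 4096}, {1, 32}, {2, 64} };
    fill_batch_fair(slots, 512);
    for (const auto & s : slots) {
        printf("slot %d: %d prompt tokens remaining\n", s.id, s.n_prompt_remaining);
    }
    return 0;
}
```

With this kind of scheduling the two small prompts finish within the first batch or two, while the large prompt keeps making progress, rather than the small ones waiting behind all 4096 tokens.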
References: