Description
Note: This issue was copied from ggml-org#6586
Original Author: @stduhpf
Original Issue Number: ggml-org#6586
Created: 2024-04-10T11:35:31Z
Feature Description
It would be nice to have an endpoint on the server example to fetch information about the progress of an ongoing prompt processing. It could return something like this:

```json
{
  "processing": [true|false],
  "prompt_length": [number of uncached tokens of the last prompt],
  "remaining": [number of tokens yet to be processed]
}
```
Motivation
For longer prompts, or when the processing speed is very slow, it would be nice to get a clue about the advancement of the prompt processing. This would possibly also be useful for other projects, not just the server.
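As a sketch of how a client might consume such a response (the endpoint and its field names are only the proposal above, not an existing API), a progress fraction could be derived like this:

```python
import json

def prompt_progress(payload: str) -> float:
    """Return the fraction of the prompt already processed, given a
    response matching the proposed schema (hypothetical field names)."""
    info = json.loads(payload)
    if not info["processing"]:
        return 1.0  # nothing in flight: treat as fully processed
    done = info["prompt_length"] - info["remaining"]
    return done / info["prompt_length"]

# Example response matching the proposed schema:
sample = '{"processing": true, "prompt_length": 2048, "remaining": 512}'
print(f"{prompt_progress(sample):.0%}")  # 75%
```

A client could poll such an endpoint periodically and feed the fraction into a progress bar while waiting for the completion to start streaming.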
Possible Implementation
I haven't yet looked too deep in the current server implementation, so I can't really tell how this would work, but I imagine it would require some deeper changes in the backend too.
I did add a similar feature to a very old project based on an ancient version of llama.cpp, a year ago: stduhpf/fastLLaMa@1ebd5ba This is now very much outdated, but this feature was nice to have.