What would you like to be added:
Log probabilities support for llm-d-inference-sim.
llm-d-inference-sim currently lacks support for log probabilities (logprobs) in completion responses, which limits its usability in integration-testing scenarios with evaluation frameworks and tools that require this API feature.
Why is this needed:
- Model Evaluations
As an example, the lm-evaluation-harness framework uses logprobs extensively for benchmark tasks:
- Multiple-choice tasks (MMLU, HellaSwag, ARC, etc.): These tasks require logprobs to calculate which answer choice has the highest likelihood. Without logprobs, these evaluations cannot run.
- Perplexity measurements: Many benchmarks compute perplexity by summing log probabilities across tokens.
- Loglikelihood scoring: Tasks need to compare the probabilities of different continuations to select the most likely one (see the scoring sketch after this list).
- API Coverage
vLLM's OpenAI-compatible API includes logprobs support as specified in the OpenAI API specification:
- Text completions: the /v1/completions endpoint supports the logprobs parameter (integer 0-5)
- Chat completions: the /v1/chat/completions endpoint supports the logprobs boolean and top_logprobs integer parameters
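To make the dependency concrete, here is a minimal sketch in Go of the scoring logic described above. It is illustrative only, not lm-evaluation-harness code, and the logprob values are made up:

```go
// Illustrative sketch: how a multiple-choice evaluation picks an answer
// once the API returns per-token logprobs.
package main

import (
	"fmt"
	"math"
)

// sequenceLogprob sums per-token log probabilities, giving the
// loglikelihood of a whole continuation.
func sequenceLogprob(tokenLogprobs []float64) float64 {
	total := 0.0
	for _, lp := range tokenLogprobs {
		total += lp
	}
	return total
}

func main() {
	// Hypothetical per-token logprobs for two answer choices to the same prompt.
	choices := map[string][]float64{
		"A": {-0.2, -1.1},
		"B": {-2.5, -0.4},
	}
	best, bestScore := "", math.Inf(-1)
	for name, lps := range choices {
		if s := sequenceLogprob(lps); s > bestScore {
			best, bestScore = name, s
		}
	}
	fmt.Printf("selected %s (loglikelihood %.2f)\n", best, bestScore)

	// Perplexity over N tokens is exp(-sum(logprobs)/N).
	lps := choices[best]
	fmt.Printf("perplexity: %.2f\n", math.Exp(-sequenceLogprob(lps)/float64(len(lps))))
}
```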
API Format:
Text Completion Request (OpenAI format)

```json
{
  "model": "Qwen/Qwen2-0.5B",
  "prompt": "The capital of France is",
  "max_tokens": 5,
  "logprobs": 2
}
```

Text Completion Response (with logprobs)

```json
{
  "id": "cmpl-abc123",
  "object": "text_completion",
  "created": 1234567890,
  "model": "Qwen/Qwen2-0.5B",
  "choices": [
    {
      "text": " Paris",
      "index": 0,
      "logprobs": {
        "tokens": [" Paris"],
        "token_logprobs": [-0.15],
        "top_logprobs": [
          {
            " Paris": -0.15,
            " paris": -2.8
          }
        ],
        "text_offset": [0]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 1,
    "total_tokens": 6
  }
}
```
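For illustration, a request like the one above could be exercised from a Go integration test roughly as follows. This is a sketch: the base URL http://localhost:8000 is an assumption, and the struct fields simply mirror the schema shown here:

```go
// Sketch of an integration-test style client, assuming the simulator serves
// the OpenAI-compatible API at http://localhost:8000 (adjust as needed).
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

type completionRequest struct {
	Model     string `json:"model"`
	Prompt    string `json:"prompt"`
	MaxTokens int    `json:"max_tokens"`
	Logprobs  int    `json:"logprobs"`
}

type completionResponse struct {
	Choices []struct {
		Text     string `json:"text"`
		Logprobs *struct {
			Tokens        []string             `json:"tokens"`
			TokenLogprobs []float64            `json:"token_logprobs"`
			TopLogprobs   []map[string]float64 `json:"top_logprobs"`
			TextOffset    []int                `json:"text_offset"`
		} `json:"logprobs"`
	} `json:"choices"`
}

func main() {
	reqBody, err := json.Marshal(completionRequest{
		Model:     "Qwen/Qwen2-0.5B",
		Prompt:    "The capital of France is",
		MaxTokens: 5,
		Logprobs:  2,
	})
	if err != nil {
		log.Fatal(err)
	}
	resp, err := http.Post("http://localhost:8000/v1/completions",
		"application/json", bytes.NewReader(reqBody))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var out completionResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		log.Fatal(err)
	}
	// An evaluation client needs choices[i].logprobs to be non-nil when the
	// request set logprobs; that is the contract this issue asks the sim to honor.
	for _, c := range out.Choices {
		if c.Logprobs == nil {
			log.Fatal("logprobs missing from response")
		}
		fmt.Println(c.Text, c.Logprobs.TokenLogprobs)
	}
}
```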
Chat Completion Request (OpenAI format)

```json
{
  "model": "Qwen/Qwen2-0.5B",
  "messages": [
    {"role": "user", "content": "What is 2+2?"}
  ],
  "logprobs": true,
  "top_logprobs": 3
}
```
Chat Completion Response (with logprobs)

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "Qwen/Qwen2-0.5B",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "4"
      },
      "logprobs": {
        "content": [
          {
            "token": "4",
            "logprob": -0.08,
            "bytes": [52],
            "top_logprobs": [
              {"token": "4", "logprob": -0.08, "bytes": [52]},
              {"token": "four", "logprob": -3.2, "bytes": [102, 111, 117, 114]},
              {"token": "Four", "logprob": -4.1, "bytes": [70, 111, 117, 114]}
            ]
          }
        ]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 1,
    "total_tokens": 13
  }
}
```

The generated logprobs are synthetic and not mathematically valid. They are provided solely to enable llm-d-inference-sim to work with evaluation frameworks and tools that require logprobs in the API response. The focus is on API compatibility and integration testing, not on generating accurate probability distributions.
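As a sketch of what such synthetic generation could look like (illustrative only, not a proposed implementation; the types mirror the chat schema above, while the alternative-token naming and value ranges are placeholders):

```go
// Sketch: fabricating plausible-looking logprobs per generated token.
// The values are not a valid distribution; only the API shape matters.
package main

import (
	"fmt"
	"math/rand"
)

// Types mirroring the OpenAI chat logprobs schema shown above.
type TopLogprob struct {
	Token   string  `json:"token"`
	Logprob float64 `json:"logprob"`
	Bytes   []int   `json:"bytes"`
}

type ContentLogprob struct {
	Token       string       `json:"token"`
	Logprob     float64      `json:"logprob"`
	Bytes       []int        `json:"bytes"`
	TopLogprobs []TopLogprob `json:"top_logprobs"`
}

// tokenBytes returns the token's raw byte values, as the "bytes" field expects.
func tokenBytes(tok string) []int {
	bs := make([]int, len(tok))
	for i := 0; i < len(tok); i++ {
		bs[i] = int(tok[i])
	}
	return bs
}

// syntheticLogprob gives the chosen token a small negative logprob and each
// alternative an increasingly negative one, so rankings look sensible.
func syntheticLogprob(tok string, numTop int) ContentLogprob {
	chosen := -rand.Float64() * 0.5 // e.g. in (-0.5, 0]
	entry := ContentLogprob{Token: tok, Logprob: chosen, Bytes: tokenBytes(tok)}
	for i := 0; i < numTop; i++ {
		alt, lp := tok, chosen
		if i > 0 {
			alt = fmt.Sprintf("%s_alt%d", tok, i) // placeholder alternative token
			lp = chosen - float64(i)*2 - rand.Float64()
		}
		entry.TopLogprobs = append(entry.TopLogprobs, TopLogprob{
			Token: alt, Logprob: lp, Bytes: tokenBytes(alt),
		})
	}
	return entry
}

func main() {
	fmt.Printf("%+v\n", syntheticLogprob("4", 3))
}
```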