What would you like to be added:
Log probabilities support for llm-d-inference-sim.
llm-d-inference-sim currently lacks support for log probabilities (logprobs) in completion responses, which limits its usability in integration-testing scenarios with evaluation frameworks and tools that require this API feature.
Why is this needed:
- Model Evaluations
As an example, the lm-evaluation-harness framework uses logprobs extensively for benchmark tasks:
- Multiple-choice tasks (MMLU, HellaSwag, ARC, etc.): These tasks require logprobs to calculate which answer choice has the highest likelihood. Without logprobs, these evaluations cannot run.
- Perplexity measurements: Many benchmarks compute perplexity by summing log probabilities across tokens.
- Loglikelihood scoring: Tasks need to compare the probabilities of different continuations to select the most likely one (see the scoring sketch after this list).
- API Coverage
vLLM's OpenAI-compatible API includes logprobs support as specified in the OpenAI API specification:
- Text completions: the /v1/completions endpoint supports the logprobs parameter (integer 0-5)
- Chat completions: the /v1/chat/completions endpoint supports the logprobs boolean and top_logprobs integer parameters
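To make the dependency concrete, here is a minimal sketch in Go of the scoring logic described above. It is illustrative only, not lm-evaluation-harness code, and the logprob values are made up:

```go
// Illustrative sketch: how a multiple-choice evaluation picks an answer
// once the API returns per-token logprobs.
package main

import (
	"fmt"
	"math"
)

// sequenceLogprob sums per-token log probabilities, giving the
// loglikelihood of a whole continuation.
func sequenceLogprob(tokenLogprobs []float64) float64 {
	total := 0.0
	for _, lp := range tokenLogprobs {
		total += lp
	}
	return total
}

func main() {
	// Hypothetical per-token logprobs for two answer choices to the same prompt.
	choices := map[string][]float64{
		"A": {-0.2, -1.1},
		"B": {-2.5, -0.4},
	}
	best, bestScore := "", math.Inf(-1)
	for name, lps := range choices {
		if s := sequenceLogprob(lps); s > bestScore {
			best, bestScore = name, s
		}
	}
	fmt.Printf("selected %s (loglikelihood %.2f)\n", best, bestScore)

	// Perplexity over N tokens is exp(-sum(logprobs)/N).
	lps := choices[best]
	fmt.Printf("perplexity: %.2f\n", math.Exp(-sequenceLogprob(lps)/float64(len(lps))))
}
```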
API Format:
Text Completion Request (OpenAI format)

```json
{
  "model": "Qwen/Qwen2-0.5B",
  "prompt": "The capital of France is",
  "max_tokens": 5,
  "logprobs": 2
}
```

Text Completion Response (with logprobs)

```json
{
  "id": "cmpl-abc123",
  "object": "text_completion",
  "created": 1234567890,
  "model": "Qwen/Qwen2-0.5B",
  "choices": [
    {
      "text": " Paris",
      "index": 0,
      "logprobs": {
        "tokens": [" Paris"],
        "token_logprobs": [-0.15],
        "top_logprobs": [
          {
            " Paris": -0.15,
            " paris": -2.8
          }
        ],
        "text_offset": [0]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 1,
    "total_tokens": 6
  }
}
```
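For illustration, a request like the one above could be exercised from a Go integration test roughly as follows. This is a sketch: the base URL http://localhost:8000 is an assumption, and the struct fields simply mirror the schema shown here:

```go
// Sketch of an integration-test style client, assuming the simulator serves
// the OpenAI-compatible API at http://localhost:8000 (adjust as needed).
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

type completionRequest struct {
	Model     string `json:"model"`
	Prompt    string `json:"prompt"`
	MaxTokens int    `json:"max_tokens"`
	Logprobs  int    `json:"logprobs"`
}

type completionResponse struct {
	Choices []struct {
		Text     string `json:"text"`
		Logprobs *struct {
			Tokens        []string             `json:"tokens"`
			TokenLogprobs []float64            `json:"token_logprobs"`
			TopLogprobs   []map[string]float64 `json:"top_logprobs"`
			TextOffset    []int                `json:"text_offset"`
		} `json:"logprobs"`
	} `json:"choices"`
}

func main() {
	reqBody, err := json.Marshal(completionRequest{
		Model:     "Qwen/Qwen2-0.5B",
		Prompt:    "The capital of France is",
		MaxTokens: 5,
		Logprobs:  2,
	})
	if err != nil {
		log.Fatal(err)
	}
	resp, err := http.Post("http://localhost:8000/v1/completions",
		"application/json", bytes.NewReader(reqBody))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var out completionResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		log.Fatal(err)
	}
	// An evaluation client needs choices[i].logprobs to be non-nil when the
	// request set logprobs; that is the contract this issue asks the sim to honor.
	for _, c := range out.Choices {
		if c.Logprobs == nil {
			log.Fatal("logprobs missing from response")
		}
		fmt.Println(c.Text, c.Logprobs.TokenLogprobs)
	}
}
```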
Chat Completion Request (OpenAI format)

```json
{
  "model": "Qwen/Qwen2-0.5B",
  "messages": [
    {"role": "user", "content": "What is 2+2?"}
  ],
  "logprobs": true,
  "top_logprobs": 3
}
```
Chat Completion Response (with logprobs)

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "Qwen/Qwen2-0.5B",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "4"
      },
      "logprobs": {
        "content": [
          {
            "token": "4",
            "logprob": -0.08,
            "bytes": [52],
            "top_logprobs": [
              {"token": "4", "logprob": -0.08, "bytes": [52]},
              {"token": "four", "logprob": -3.2, "bytes": [102, 111, 117, 114]},
              {"token": "Four", "logprob": -4.1, "bytes": [70, 111, 117, 114]}
            ]
          }
        ]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 1,
    "total_tokens": 13
  }
}
```

The generated logprobs are synthetic and not mathematically valid. They are provided solely to enable llm-d-inference-sim to work with evaluation frameworks and tools that require logprobs in the API response. The focus is on API compatibility and integration testing, not on generating accurate probability distributions.
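As a sketch of what such synthetic generation could look like (illustrative only, not a proposed implementation; the types mirror the chat schema above, while the alternative-token naming and value ranges are placeholders):

```go
// Sketch: fabricating plausible-looking logprobs per generated token.
// The values are not a valid distribution; only the API shape matters.
package main

import (
	"fmt"
	"math/rand"
)

// Types mirroring the OpenAI chat logprobs schema shown above.
type TopLogprob struct {
	Token   string  `json:"token"`
	Logprob float64 `json:"logprob"`
	Bytes   []int   `json:"bytes"`
}

type ContentLogprob struct {
	Token       string       `json:"token"`
	Logprob     float64      `json:"logprob"`
	Bytes       []int        `json:"bytes"`
	TopLogprobs []TopLogprob `json:"top_logprobs"`
}

// tokenBytes returns the token's raw byte values, as the "bytes" field expects.
func tokenBytes(tok string) []int {
	bs := make([]int, len(tok))
	for i := 0; i < len(tok); i++ {
		bs[i] = int(tok[i])
	}
	return bs
}

// syntheticLogprob gives the chosen token a small negative logprob and each
// alternative an increasingly negative one, so rankings look sensible.
func syntheticLogprob(tok string, numTop int) ContentLogprob {
	chosen := -rand.Float64() * 0.5 // e.g. in (-0.5, 0]
	entry := ContentLogprob{Token: tok, Logprob: chosen, Bytes: tokenBytes(tok)}
	for i := 0; i < numTop; i++ {
		alt, lp := tok, chosen
		if i > 0 {
			alt = fmt.Sprintf("%s_alt%d", tok, i) // placeholder alternative token
			lp = chosen - float64(i)*2 - rand.Float64()
		}
		entry.TopLogprobs = append(entry.TopLogprobs, TopLogprob{
			Token: alt, Logprob: lp, Bytes: tokenBytes(alt),
		})
	}
	return entry
}

func main() {
	fmt.Printf("%+v\n", syntheticLogprob("4", 3))
}
```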