feat: Log probabilities support #213

@ruivieira

Description

What would you like to be added:

Log probabilities support for llm-d-inference-sim.

llm-d-inference-sim currently lacks support for log probabilities (logprobs) in completion responses, which limits its usefulness for integration testing with evaluation frameworks and tools that require this API feature.

Why is this needed:

  1. Model Evaluations

As an example, the lm-evaluation-harness framework uses logprobs extensively for benchmark tasks:

  • Multiple-choice tasks (MMLU, HellaSwag, ARC, etc.): These tasks require logprobs to calculate which answer choice has the highest likelihood. Without logprobs, these evaluations cannot run.
  • Perplexity measurements: Many benchmarks compute perplexity by summing log probabilities across tokens.
  • Loglikelihood scoring: Tasks need to compare the probability of different continuations to select the most likely one.
  2. API Coverage

vLLM's OpenAI-compatible API includes logprobs support as specified in the OpenAI API specification:

  • Text completions: the /v1/completions endpoint supports the logprobs parameter (integer 0-5)
  • Chat completions: the /v1/chat/completions endpoint supports the logprobs boolean and top_logprobs integer parameters
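To illustrate the evaluation use cases above: a harness scores a candidate continuation by summing its per-token logprobs (loglikelihood), and derives perplexity from the mean. A minimal sketch with made-up values (the real values would come from the simulator's completion response):

```python
import math

# Hypothetical token_logprobs for one candidate continuation,
# as returned in the response's logprobs.token_logprobs field.
token_logprobs = [-0.15, -1.2, -0.4]

# Loglikelihood scoring: sum of per-token log probabilities.
# Multiple-choice tasks pick the choice with the highest sum.
loglikelihood = sum(token_logprobs)

# Perplexity: exp of the negative mean log probability.
perplexity = math.exp(-loglikelihood / len(token_logprobs))

print(loglikelihood)
print(perplexity)
```

This is why the fields must be present in the response even if the values themselves are synthetic: the harness only needs numbers it can sum and compare.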

API Format:

Text Completion Request (OpenAI format)

{
  "model": "Qwen/Qwen2-0.5B",
  "prompt": "The capital of France is",
  "max_tokens": 5,
  "logprobs": 2
}

Text Completion Response (with logprobs)

{
  "id": "cmpl-abc123",
  "object": "text_completion",
  "created": 1234567890,
  "model": "Qwen/Qwen2-0.5B",
  "choices": [
    {
      "text": " Paris",
      "index": 0,
      "logprobs": {
        "tokens": [" Paris"],
        "token_logprobs": [-0.15],
        "top_logprobs": [
          {
            " Paris": -0.15,
            " paris": -2.8
          }
        ],
        "text_offset": [0]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 1,
    "total_tokens": 6
  }
}

Chat Completion Request (OpenAI format)

{
  "model": "Qwen/Qwen2-0.5B",
  "messages": [
    {"role": "user", "content": "What is 2+2?"}
  ],
  "logprobs": true,
  "top_logprobs": 3
}

Chat Completion Response (with logprobs)

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "Qwen/Qwen2-0.5B",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "4"
      },
      "logprobs": {
        "content": [
          {
            "token": "4",
            "logprob": -0.08,
            "bytes": [52],
            "top_logprobs": [
              {"token": "4", "logprob": -0.08, "bytes": [52]},
              {"token": "four", "logprob": -3.2, "bytes": [102, 111, 117, 114]},
              {"token": "Four", "logprob": -4.1, "bytes": [70, 111, 117, 114]}
            ]
          }
        ]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 1,
    "total_tokens": 13
  }
}
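For reference, the bytes field in each chat logprobs entry is the UTF-8 byte encoding of the token, so clients can decode it back to the token string. A quick check against the hypothetical response above:

```python
# One content entry from the chat completion logprobs above.
entry = {
    "token": "4",
    "logprob": -0.08,
    "bytes": [52],
    "top_logprobs": [
        {"token": "4", "logprob": -0.08, "bytes": [52]},
        {"token": "four", "logprob": -3.2, "bytes": [102, 111, 117, 114]},
        {"token": "Four", "logprob": -4.1, "bytes": [70, 111, 117, 114]},
    ],
}

# bytes holds the UTF-8 encoding of the token; decoding recovers it.
assert bytes(entry["bytes"]).decode("utf-8") == entry["token"]

# The highest-logprob alternative should match the sampled token.
best = max(entry["top_logprobs"], key=lambda alt: alt["logprob"])
print(best["token"])  # 4
```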

The generated logprobs are synthetic and not mathematically valid. They are provided solely to enable llm-d-inference-sim to work with evaluation frameworks and tools that require logprobs in the API response. The focus is on API compatibility and integration testing, not on producing accurate probability distributions.
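One possible shape for such a synthetic generator, sketched in Python rather than the simulator's actual Go code (the function name and value ranges are assumptions, not part of this proposal): assign each sampled token an arbitrary negative logprob and fill the remaining top_logprobs slots with progressively less likely dummy values, while keeping the OpenAI text-completion response structure.

```python
import random

def synthetic_logprobs(tokens, top_n=2, seed=0):
    """Build an OpenAI-style text-completion logprobs object with
    synthetic (not mathematically valid) values for the given tokens."""
    rng = random.Random(seed)  # seeded for reproducible test runs
    token_logprobs, top_logprobs, text_offset = [], [], []
    offset = 0
    for tok in tokens:
        lp = -rng.uniform(0.05, 0.5)  # arbitrary "likely" logprob
        token_logprobs.append(lp)
        # Alternatives get progressively lower dummy logprobs.
        alts = {tok: lp}
        for i in range(1, top_n):
            alts[f"alt_{i}_{tok}"] = lp - 2.0 * i
        top_logprobs.append(alts)
        text_offset.append(offset)
        offset += len(tok)
    return {
        "tokens": list(tokens),
        "token_logprobs": token_logprobs,
        "top_logprobs": top_logprobs,
        "text_offset": text_offset,
    }

out = synthetic_logprobs([" Paris"], top_n=2)
print(out["tokens"], out["text_offset"])
```

Because evaluation harnesses only sum and compare these values, any internally consistent negative numbers are enough to exercise the API contract.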
