feat: Log probabilities support (with cache) #215
Conversation
Signed-off-by: Rui Vieira <[email protected]>
```go
// Check cache stats
hits, misses, hitRate := processor.GetCacheStats()
if hits != 1 {
	t.Errorf("Expected 1 cache hit, got %d", hits)
```
In all tests in this project we use the Expect function from onsi/gomega, e.g. Expect(err).NotTo(HaveOccurred()), Expect(totalBlocks).To(Equal(unusedBlocks)), etc.
Can you please change this to be consistent with the other tests?
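A sketch of the same check in the Gomega style the reviewer mentions; GetCacheStats and the expected hit count come from the hunk above, while the extra assertions on misses and hitRate are illustrative only:

```go
// Assumes the test file dot-imports Gomega, as the other tests do:
//   . "github.com/onsi/gomega"
hits, misses, hitRate := processor.GetCacheStats()
Expect(hits).To(Equal(int64(1)))
Expect(misses).To(BeNumerically(">=", int64(0)))
Expect(hitRate).To(BeNumerically(">", 0.0))
```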
```go
if i == len(genTokens)-1 && (finishReason == dataset.LengthFinishReason || finishReason == dataset.ToolsFinishReason) {
	finishReasonToSend = &finishReason
}
```
This part does not change during the loop; it could be computed once before the loop starts.
Good point! Moved it outside the loop.
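For illustration, a sketch of the hoisting being discussed; the surrounding loop is hypothetical, and only the hoisted condition mirrors the hunk above:

```go
// Computed once: this condition does not depend on the loop index.
sendFinishReason := finishReason == dataset.LengthFinishReason || finishReason == dataset.ToolsFinishReason

for i := range genTokens {
	var finishReasonToSend *string
	if i == len(genTokens)-1 && sendFinishReason {
		finishReasonToSend = &finishReason
	}
	// ... create and send the chunk using finishReasonToSend ...
	_ = finishReasonToSend
}
```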
@ruivieira Thank you for your contribution!
Signed-off-by: Rui Vieira <[email protected]>
```diff
 // API response, for chat completion. It sets either role, or token, or tool call info in the message.
 func (s *VllmSimulator) createChatCompletionChunk(context *streamingContext, token string, tool *openaiserverapi.ToolCall,
-	role string, finishReason *string) openaiserverapi.CompletionRespChunk {
+	role string, finishReason *string, includeLogprobs bool, topK int) openaiserverapi.CompletionRespChunk {
```
The boolean that defines whether logprobs should be returned and the top-k value are constant during request processing; we store values of this kind in streamingContext.
If the token is empty there is no need to create the logprob-related fields; otherwise the relevant part of the response will be created.
```go
nPromptTokens:       usageData.PromptTokens,
nCachedPromptTokens: reqCtx.CompletionReq.GetNumberOfCachedPromptTokens(),
requestID:           req.GetRequestID(),
request:             req,
```
Please add shouldIncludeLogprobs and topK (maybe rename it to reflect its context) to streamingContext instead of the request.
(And thanks for adding the missing request ID.)
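A sketch of what this could look like; the existing streamingContext fields are elided, and the field names simply follow the ones used in this comment:

```go
type streamingContext struct {
	// ... existing fields (request id, token counts, etc.) ...

	// Constant for the whole request, so they are resolved once when the
	// context is built instead of being passed to every chunk-creation call.
	shouldIncludeLogprobs bool
	topK                  int
}
```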
```diff
 func (s *VllmSimulator) sendResponse(reqCtx *openaiserverapi.CompletionReqCtx, respTokens []string, toolCalls []openaiserverapi.ToolCall,
 	modelName string, finishReason string, usageData *openaiserverapi.Usage) {
-	resp := s.createCompletionResponse(reqCtx.IsChatCompletion, respTokens, toolCalls, &finishReason, usageData, modelName,
+	resp := s.createCompletionResponse(reqCtx.CompletionReq, reqCtx.IsChatCompletion, respTokens, toolCalls, &finishReason, usageData, modelName,
```
Please pass shouldIncludeLogprobs and topK instead of request
```diff
 // modelName - display name returned to the client and used in metrics. It is either the first alias
 // from --served-model-name (for a base-model request) or the LoRA adapter name (for a LoRA request).
-func (s *VllmSimulator) createCompletionResponse(isChatCompletion bool, respTokens []string, toolCalls []openaiserverapi.ToolCall,
+func (s *VllmSimulator) createCompletionResponse(req openaiserverapi.CompletionRequest, isChatCompletion bool, respTokens []string, toolCalls []openaiserverapi.ToolCall,
```
Please pass shouldIncludeLogprobs and topK instead of request
```go
type LogprobData struct {
	MainLogprob float64                                 `json:"main_logprob"`
	TopLogprobs []openaiserverapi.ChatCompletionLogProb `json:"top_logprobs"`
}
```
Why do the struct and the fields have to be public? And why are the json tags needed?
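A sketch of the unexported alternative, assuming the struct is only used inside the package and is never marshalled to JSON directly:

```go
// logprobData is internal to the processor, so the type and its fields can be
// unexported and the json tags dropped.
type logprobData struct {
	mainLogprob float64
	topLogprobs []openaiserverapi.ChatCompletionLogProb
}
```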
```go
hits2, misses2, _ := processor.GetCacheStats()
Expect(hits2).To(BeNumerically(">=", 0))
Expect(misses2).To(BeNumerically(">=", 3))
}
```
Could you please add tests for the simulator with logprobs:
completions and chat completions requests with logprobs, with and without streaming, and check the responses?
```go
cacheMutex sync.RWMutex

// cacheHits and cacheMisses for metrics
cacheHits int64
```
cacheHits and cacheMisses should be updated under the protection of the mutex, or be atomic counters (from the sync/atomic package).
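For example, with counters from sync/atomic (a sketch; only the counters and a hypothetical recordLookup helper are shown):

```go
import "sync/atomic"

type LogprobsProcessor struct {
	// ... other fields ...
	cacheHits   atomic.Int64
	cacheMisses atomic.Int64
}

// recordLookup is a hypothetical helper; the point is that Add is safe
// to call without holding cacheMutex.
func (lp *LogprobsProcessor) recordLookup(hit bool) {
	if hit {
		lp.cacheHits.Add(1)
	} else {
		lp.cacheMisses.Add(1)
	}
}
```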
```go
}

// generateDeterministicLogprobs creates logprobs with deterministic values based on token content
// This follows vLLM's approach of consistent logprobs for the same token in similar contexts
```
@ruivieira can you please explain how context is taken into consideration here? As I understand it, a token should give the same logprob only in the same/similar context. What does context mean here? Isn't it the previous tokens? They are not passed to this function.
```go
for k := range lp.tokenCache {
	delete(lp.tokenCache, k)
	break
}
```
Iteration over a map is not in insertion order; if we want to remove the oldest item, we need to store the insertion time or the latest usage time.
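One way to do this, sketched with hypothetical types and names: keep a last-used timestamp per cache entry and evict the entry with the oldest one.

```go
import "time"

type cacheEntry struct {
	value    any       // the cached logprob data
	lastUsed time.Time // set on insert and refreshed on every cache hit
}

// evictOldest removes the least recently used entry; the caller must hold the write lock.
func evictOldest(cache map[string]cacheEntry) {
	var oldestKey string
	var oldest time.Time
	found := false
	for k, e := range cache {
		if !found || e.lastUsed.Before(oldest) {
			oldestKey, oldest, found = k, e.lastUsed, true
		}
	}
	if found {
		delete(cache, oldestKey)
	}
}
```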
```go
}

// ClearCache clears the logprobs cache
func (lp *LogprobsProcessor) ClearCache() {
```
Not used
```go
}

// GetCacheStats returns cache performance statistics
func (lp *LogprobsProcessor) GetCacheStats() (hits, misses int64, hitRate float64) {
```
Consider not adding a function that is used only in tests to this type; we could put it in an additional file, or choose not to use it in the test.
@ruivieira Please see some more comments I added. General question:
If the purpose is only to support the API without concern for the actual returned content, why not generate random responses (both tokens and probabilities) and keep the implementation minimal, without involving cache, hash calculations, or other overhead?
@mayabar That's a good point. My original implementation did take that approach, but ultimately I pushed this one mainly for reproducibility and consistent (even if simulated) results. I agree that, purely for API coverage, generating random responses is enough and avoids the overhead. I'll open a new PR with a new implementation and, if preferred, we can close this one?
@ruivieira I think continuing with this PR is the better way; you already have a lot of relevant code here. You can just remove all the cache-related code from the logprobs_processor and use random response generation.
Let's continue with your new PR #221.
Replaced by #221.
This PR is a proposal to add log probabilities (logprobs) support to llm-d-inference-sim following vLLM's architecture.
vLLM-Style LogprobsProcessor
Added a new LogprobsProcessor that follows vLLM's implementation (the same tokens always produce the same logprobs).

API coverage

- /v1/completions: logprobs parameter (0-5)
- /v1/chat/completions: logprobs boolean + top_logprobs integer

Refer to #213
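For reference, a sketch of the request parameters listed above, built as JSON payloads; the model name, prompt, and message content are placeholders, and the field names follow the OpenAI-compatible API shape described in this PR:

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// /v1/completions: logprobs is an integer in the range 0-5.
	completions := map[string]any{
		"model":    "example-model", // placeholder
		"prompt":   "Hello",
		"logprobs": 3,
	}

	// /v1/chat/completions: logprobs is a boolean, top_logprobs an integer.
	chat := map[string]any{
		"model":        "example-model", // placeholder
		"messages":     []map[string]string{{"role": "user", "content": "Hello"}},
		"logprobs":     true,
		"top_logprobs": 3,
	}

	for _, body := range []map[string]any{completions, chat} {
		b, _ := json.MarshalIndent(body, "", "  ")
		fmt.Println(string(b))
	}
}
```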