Conversation

@danbev (Member) commented Sep 25, 2025

This commit adds a 'normalized' field to the llama_token_data_array
struct to indicate whether the probabilities have been computed and
normalized from the logits.

The motivation for this change is to avoid redundant normalization
calls in the sampling code, as the softmax calculation can be
expensive depending on the size of the llama_token_data array.

Samplers that modify logits or filter tokens (change the size) must set
normalized to false to invalidate cached probabilities. Samplers that
compute probabilities set it to true after normalization.

Refs: #9294 (review)

@github-actions bot added labels: testing (Everything test related), examples, server (Sep 25, 2025)
@MaggotHATE (Contributor) commented

Changing the order of samplers will cause confusion and a mismatch between llama.cpp and other LLM implementations, especially for such an essential sampler as temperature. I'd suggest keeping the order and changing the temperature sampler instead, if possible.

@danbev (Member, Author) commented Sep 29, 2025

Looking into this more, I don't think merging the two fields will work. We need to preserve the original logits because:

  1. When a sampler changes the array size (filters tokens), we need to renormalize the remaining probabilities
  2. At that point, the single score/value field already contains probabilities, not logits
  3. We can't accurately reconstruct logits from probabilities to recompute softmax

But perhaps we can still improve this by adding a flag to track normalization state:

struct llama_token_data_array {
    ...
    bool normalized;  // true if .p contains valid probabilities from current .logit values
};

This would allow samplers to:

  • Check the flag before calling softmax to avoid redundant recalculation
  • Understand the current state, i.e. whether the .p values are currently valid

Sampler responsibilities:

  • If a sampler modifies .logit values, it must set normalized = false
  • If a sampler changes the size of the token data array while normalized is true, it must call softmax again before returning (since the remaining probabilities no longer sum to 1.0)
  • If a sampler needs probabilities, it can check the normalized flag and only call softmax if it is false (see the sketch below)
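
A minimal sketch of how a sampler chain could follow these rules. The types and helpers below are simplified, hypothetical stand-ins, not the actual llama.cpp definitions:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Simplified stand-ins for llama_token_data / llama_token_data_array.
struct token_data { float logit; float p; };
struct token_data_array {
    std::vector<token_data> data;
    bool normalized = false; // true if .p holds valid probabilities for the current .logit values
};

// Compute probabilities from the logits and cache them in .p.
static void softmax(token_data_array & arr) {
    float max_logit = -INFINITY;
    for (const auto & t : arr.data) max_logit = std::max(max_logit, t.logit);
    float sum = 0.0f;
    for (auto & t : arr.data) { t.p = std::exp(t.logit - max_logit); sum += t.p; }
    for (auto & t : arr.data) t.p /= sum;
    arr.normalized = true;
}

// A sampler that modifies logits (e.g. temperature) must invalidate the cache.
static void apply_temperature(token_data_array & arr, float temp) {
    for (auto & t : arr.data) t.logit /= temp;
    arr.normalized = false;
}

// A sampler that needs probabilities recomputes them only if the cache is
// stale, and renormalizes if it shrinks the array.
static void apply_top_p(token_data_array & arr, float top_p) {
    if (!arr.normalized) softmax(arr);
    std::sort(arr.data.begin(), arr.data.end(),
              [](const token_data & a, const token_data & b) { return a.p > b.p; });
    float cum = 0.0f;
    size_t keep = arr.data.size();
    for (size_t i = 0; i < arr.data.size(); ++i) {
        cum += arr.data[i].p;
        if (cum >= top_p) { keep = i + 1; break; }
    }
    if (keep < arr.data.size()) {
        arr.data.resize(keep);
        softmax(arr); // size changed: the remaining .p values no longer sum to 1.0
    }
}
```

In this sketch, a probability-based sampler that follows a non-truncating one can reuse the cached .p values instead of recomputing them from the logits.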

So this should address the concern in the original comment:

Having two values per token is very confusing to me because some samplers operate on one or other, and this can lead to situations where a sampler modifies the probabilities, and the next one calls softmax which discards all the changes to the probabilities and computes them from the logits again.

If the second sampler checked the normalized field, it would not overwrite the probabilities set by the previous sampler. And if the second sampler updates the probabilities, or modifies the size of the token data array, it would recalculate the softmax and set the normalized field to true.

@MaggotHATE (Contributor) commented

I always thought that the end goal of reworking sampling was to standardize on logits only (which makes sense) and use probabilities as temporary/QoL values. Maybe I've forgotten something, but probabilities should always depend on logits, no?

Resorting changes positions (so logits + probabilities should be updated), and truncating changes the set of candidates (so logits + probabilities should be updated too). So far these are the two end results of all sampling algorithms. Am I missing something?
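
(To illustrate the truncation case with made-up numbers: if the normalized probabilities are {0.5, 0.3, 0.2} and a truncating sampler drops the last candidate, the survivors sum to 0.8 and must be rescaled to {0.625, 0.375}.)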

@danbev (Member, Author) commented Sep 29, 2025

@MaggotHATE Thanks for your comments on this!

Maybe I've forgotten something, but probabilities should always depend on logits, no?

I think my initial take on this task was simply wrong: if we only have one field, it should have been the logits.

So if I understood your comment above correctly, each sampler that needs probabilities could compute the softmax from the logits locally in its llama_sampler_apply function. We would then keep only the logit field in llama_token_data and remove the p field entirely?
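
A rough sketch of that logit-only alternative, with hypothetical simplified types (not llama.cpp's actual API):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// With only a logit per token, any sampler that needs probabilities would
// compute them locally and discard them when done (assumes logits is non-empty).
static std::vector<float> local_probs(const std::vector<float> & logits) {
    float max_logit = logits[0];
    for (float l : logits) max_logit = std::max(max_logit, l);

    std::vector<float> probs(logits.size());
    float sum = 0.0f;
    for (size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - max_logit); // subtract max for numerical stability
        sum += probs[i];
    }
    for (float & p : probs) p /= sum;
    return probs; // every sampler in the chain that needs probabilities repeats this pass
}
```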

However, softmax is a somewhat expensive operation, so having a flag (like normalized) to track whether valid probabilities are already available in .p could avoid redundant softmax calls. This would make .p essentially a cached/QoL value rather than independent state.

This is my first real dive into the samplers, so I may be missing some context or design considerations. Does the approach above align with what you were thinking, or is there a different direction I should be considering?

@MaggotHATE (Contributor) commented

However, softmax is a somewhat expensive operation, so having a flag (like normalized) to track whether valid probabilities are already available in .p could avoid redundant softmax calls. This would make .p essentially a cached/QoL value rather than independent state.

Yep, that's what I'd forgotten: the rework of sampling started because of the expensive softmax (at least partly for that reason).

I think the state approach is better than nothing: unnecessary calls will be eliminated, speeding up inference (especially for large chains of samplers). I'm not sure whether there's a better approach to the logits/probabilities structure, but maybe others will have ideas.

For a moment I thought about keeping logits const and working with probabilities only, but that would be a nightmare to optimize.

@danbev force-pushed the llama-token-data-one-value branch from 3271c6d to 17855ff (October 7, 2025 14:03)
@danbev changed the title from "llama : merge logit and p fields in llama_token_data" to "llama : add normalized field to llama_token_data_array struct" (Oct 7, 2025)
@danbev marked this pull request as ready for review (October 8, 2025 05:59)
@danbev requested a review from ggerganov as a code owner (October 8, 2025 05:59)
@ggerganov (Member) commented

I am not sure this change solves the original concern, because we still have the two values p and logit present. We would now have one more piece of state to keep track of: the normalized flag.

However, I'm not sure what would be a better approach.

@MaggotHATE (Contributor) commented

I am not sure this change solves the original concern, because we still have the two values p and logit present. We would now have one more piece of state to keep track of: the normalized flag.

However, I'm not sure what would be a better approach.

Is it even possible to standardize on one value?

  1. If logit is the one we keep, redistribution becomes obscured, because probabilities never exist as readable values. Even if probabilities are added as display-only values (on demand), it will require rewriting multiple samplers (top_p, typical, temp_ext, xtc, both mirostats, infill, maybe more).
  2. If p is the one we keep, the entire sampling chain becomes "detached" from the original logit values, and I'm not sure that's even possible.

@danbev (Member, Author) commented Oct 9, 2025

I am not sure this change solves the original concern, because we still have the two values p and logit present. We would now have one more piece of state to keep track of: the normalized flag.

I was not able to find a good way to merge these into a single field. This commit tries to address the possibility of samplers overwriting calculated probabilities, mentioned in #9294 (review). But we can always leave this as it is for now and reopen it at a later point if needed.
