llama : add normalized field to llama_token_data_array struct #16241
base: master
Conversation
Changing the order of samplers will cause confusion and a mismatch between llama.cpp and other LLM implementations, especially for such an essential sampler as temperature. I'd suggest keeping the order and changing the temperature sampler instead, if possible.
Looking into this more, I don't think merging the two fields will work. We need to preserve the original logits because:
But perhaps we can still improve this by adding a flag to track normalization state:

```cpp
struct llama_token_data_array {
    ...
    bool normalized; // true if .p contains valid probabilities from current .logit values
};
```

This would allow samplers to skip recomputing the softmax when valid probabilities are already cached.

Sampler responsibilities:
- Samplers that modify logits or filter tokens (change the size of the array) set normalized to false to invalidate cached probabilities.
- Samplers that compute probabilities set normalized to true after normalizing.

So this should address the concern in the original comment:
If the second sampler checked the normalized flag, it would know whether the .p values are still valid for the current logits.
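As a concrete illustration of the flag-checking idea, here is a minimal C++ sketch (using hypothetical standalone types, not the actual llama.cpp structs or API): a helper recomputes probabilities from the logits only when the cache is stale, and a logit-modifying sampler invalidates the cache.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Simplified stand-ins for llama_token_data / llama_token_data_array.
struct token_data { int id; float logit; float p; };

struct token_data_array {
    std::vector<token_data> data;
    bool normalized; // true if .p contains valid probabilities for the current logits
};

// Recompute the softmax only when the cached probabilities are stale.
void ensure_normalized(token_data_array & arr) {
    if (arr.normalized) return; // skip the expensive softmax
    float max_l = arr.data[0].logit;
    for (const auto & t : arr.data) max_l = std::max(max_l, t.logit);
    float sum = 0.0f;
    for (auto & t : arr.data) { t.p = std::exp(t.logit - max_l); sum += t.p; }
    for (auto & t : arr.data) t.p /= sum;
    arr.normalized = true;
}

// A sampler that rescales logits must invalidate the cached probabilities.
void apply_temperature(token_data_array & arr, float temp) {
    for (auto & t : arr.data) t.logit /= temp;
    arr.normalized = false; // .p no longer matches the new logits
}
```

With this pattern, a chain of probability-consuming samplers pays for at most one softmax between logit modifications.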
I always thought that the end goal in reworking sampling was to standardize on logits only (which makes sense) and use probabilities as temporary/QoL values. Maybe I've forgotten something, but probabilities should always depend on logits, no? Resorting changes positions (so logits + probabilities should be updated), and truncating changes the set of candidates (so logits + probabilities should be updated too). So far these are the two end results of all sampling algorithms. Am I missing something?
@MaggotHATE Thanks for your comments on this!
I think I was just wrong in my initial take on this task, and the single field should just have been the logits if we only have one field. So if I understood your comment above, each sampler that needs probabilities could just compute the softmax from the logits in the sampler's implementation.

However, softmax is a somewhat expensive operation, so having a flag (like normalized) to track whether valid probabilities are already available in .p could avoid redundant softmax calls. This would make .p essentially a cached/QoL value rather than independent state.

This is my first real dive into the samplers, so I may be missing some context or design considerations. Does the approach above align with what you were thinking, or is there a different direction I should be considering?
Yep, that's what I've forgotten: the rework on sampling started because of the expensive softmax (at least partly for that reason). I think the state approach is better than nothing: unnecessary calls will be eliminated, speeding up inference (especially for large chains of samplers). I'm not sure if there's a better approach to the logits/probabilities structure, but maybe others will have ideas. For a moment I thought about keeping logits const and working with probabilities only, but that would be a nightmare to optimize.
This commit adds a 'normalized' field to the llama_token_data_array struct to indicate whether the probabilities have been computed and normalized from the logits.

The motivation for this change is to avoid redundant normalization calls in the sampling code, as the softmax calculation can be expensive depending on the size of the llama_token_data array.

Samplers that modify logits or filter tokens (change the size) must set normalized to false to invalidate cached probabilities. Samplers that compute probabilities set it to true after normalization.
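To make the invalidation rule concrete, here is a sketch (using hypothetical standalone types, not the actual llama.cpp code) of a truncating sampler in the spirit of top-k: because it changes the size of the candidate set, it must clear the flag.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Simplified stand-ins for llama_token_data / llama_token_data_array.
struct token_data { int id; float logit; float p; };

struct token_data_array {
    std::vector<token_data> data;
    bool normalized; // true if .p contains valid probabilities for the current logits
};

// Keep the k highest-logit candidates. Probabilities computed over the old
// set no longer sum to 1 over the new set, so the cache must be cleared.
void apply_top_k(token_data_array & arr, size_t k) {
    if (k >= arr.data.size()) return; // nothing removed, cache stays valid
    std::partial_sort(arr.data.begin(), arr.data.begin() + k, arr.data.end(),
                      [](const token_data & a, const token_data & b) {
                          return a.logit > b.logit;
                      });
    arr.data.resize(k);
    arr.normalized = false; // candidate set changed: invalidate cached .p
}
```

Note the early return: if no tokens are removed, the cached probabilities remain valid and the flag is left untouched.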
Force-pushed from 3271c6d to 17855ff.
I am not sure this change solves the original concern, because we still have the two values (.logit and .p). However, I'm not sure what would be a better approach.
Is it even possible to standardize on one value?
I was not able to find a good way to merge this into a single field. This commit tries to address the issue/possibility of samplers overwriting calculated probabilities mentioned in #9294 (review). But we can always leave this as it is for now and reopen at a later point if needed.
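For context, the kind of hazard being addressed can be sketched like this (using hypothetical standalone types, not the actual llama.cpp code): without a normalization flag, a sampler that rescales logits can silently leave stale .p values behind for the next sampler in the chain.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Simplified stand-in for llama_token_data.
struct token_data { int id; float logit; float p; };

// Fill .p with the softmax of the current logits.
void softmax(std::vector<token_data> & data) {
    float max_l = data[0].logit;
    for (const auto & t : data) max_l = std::max(max_l, t.logit);
    float sum = 0.0f;
    for (auto & t : data) { t.p = std::exp(t.logit - max_l); sum += t.p; }
    for (auto & t : data) t.p /= sum;
}

// Check whether cached .p values still correspond to the current logits.
// Without a flag, this is the only way to detect staleness, and it costs
// a full softmax; a normalized flag answers the same question for free.
bool probs_match_logits(const std::vector<token_data> & data) {
    std::vector<token_data> fresh = data;
    softmax(fresh);
    for (size_t i = 0; i < data.size(); ++i) {
        if (std::fabs(data[i].p - fresh[i].p) > 1e-6f) return false;
    }
    return true;
}
```

A normalized flag makes this mismatch explicit instead of requiring callers to guess whether .p is trustworthy.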