Skip to content

Conversation

@struct
Copy link
Contributor

@struct struct commented Jul 22, 2025

The current implementation iterates the vector of tensors comparing each name using strcmp. This operation gets slower with each new tensor added. We can improve the performance of this using an std::unordered_map which has a better average lookup time as the tensor vector grows in size.

Tested with:

$ build/bin/llama-bench 
| model                          |       size |     params | backend    | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
| llama 1B Q4_0                  | 606.54 MiB |     1.10 B | Metal,BLAS |      12 |           pp512 |       5694.21 ± 8.22 |
| llama 1B Q4_0                  | 606.54 MiB |     1.10 B | Metal,BLAS |      12 |           tg128 |        339.54 ± 4.73 |

build: 38d3af1b (5955)
$ build/bin/llama-gguf models/7B/ggml-model-q4_0.gguf r n
$ build/bin/test-gguf 
...
108/108 tests passed
OK
 bash ./ci/run.sh ./tmp/results ./tmp/mnt

@struct struct requested a review from JohannesGaessler as a code owner July 22, 2025 15:15
@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Jul 22, 2025
@struct
Copy link
Contributor Author

struct commented Jul 22, 2025

This is a trivial change involving a temporary local unordered_map. I can't help but wonder why gguf_context doesn't hold similar data structures to improve lookups in gguf_get_key or gguf_get_tensor? Is there an explicit preference for these simple O(N) for loops?

#include <cstdlib>
#include <cstring>
#include <map>
#include <unordered_set>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remnant from earlier iteration?

ok = false;
break;
}
auto [it, result] = tensor_names.emplace(info.t.name, i);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto [it, result] = tensor_names.emplace(info.t.name, i);
auto [it, result] = tensor_names.emplace(info.t.name, i);

@struct struct closed this Jul 22, 2025
@struct struct force-pushed the tensor_name_duplicate_lookup branch from 1302318 to 38d3af1 Compare July 22, 2025 17:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ggml changes relating to the ggml tensor library for machine learning

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants