
Conversation

ggerganov (Member) commented Oct 19, 2025

ref #16646 (comment)

This seems to resolve the excessive compute buffer size, but I'm not yet sure that it's a good solution.

An alternative solution that seems more generic is to reuse a parent's buffer inplace only when the allocation sizes match:

diff --git a/ggml/src/ggml-alloc.c b/ggml/src/ggml-alloc.c
index 929bc4488..da2234c66 100644
--- a/ggml/src/ggml-alloc.c
+++ b/ggml/src/ggml-alloc.c
@@ -632,6 +632,20 @@ static void ggml_gallocr_allocate_node(ggml_gallocr_t galloc, struct ggml_tensor
                 }
 
                 struct hash_node * p_hn = ggml_gallocr_hash_get(galloc, parent);
+
+                {
+                    ggml_backend_buffer_type_t p_buft = galloc->bufts[p_hn->buffer_id];
+                    const size_t p_size = ggml_backend_buft_get_alloc_size(p_buft, parent);
+
+                    ggml_backend_buffer_type_t buft = galloc->bufts[buffer_id];
+                    const size_t size = ggml_backend_buft_get_alloc_size(buft, node);
+
+                    if (size != p_size) {
+                        AT_PRINTF("not reusing parent %s for %s as alloc sizes are different\n", parent->name, node->name);
+                        continue;
+                    }
+                }
+
                 if (p_hn->n_children == 1 && p_hn->n_views == 0) {
                     if (ggml_is_view(parent)) {
                         struct ggml_tensor * view_src = parent->view_src;

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Apple Metal labels Oct 19, 2025
slaren (Member) commented Oct 20, 2025

Were you able to figure out what causes this behavior in the allocator?

ggerganov (Member Author) commented

AFAICT the logic for merging blocks in ggml_dyn_tallocr_free_tensor does not get activated. This seems to happen when we have tensors of size N being placed inplace in buffers of size N + extra, though I don't yet have a very good understanding of the exact path through which this happens.
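
To make the failure mode concrete, here is a minimal toy sketch of a free list with adjacent-block coalescing. This is not the actual ggml-alloc implementation; all names, sizes, and the free_range helper are made up for illustration. The point is that when only N bytes of an N + extra block are ever returned to the free list, the tail is stranded and the remaining free ranges can no longer coalesce into a contiguous block of size N + extra.

    // Toy free list with adjacent-block coalescing -- an illustration only,
    // not the real ggml_dyn_tallocr code; all names and sizes are made up.
    #include <stdio.h>
    #include <stddef.h>

    #define MAX_BLOCKS 64

    struct block { size_t offset, size; };

    static struct block free_blocks[MAX_BLOCKS];
    static int n_free = 0;

    // return [offset, offset + size) to the free list, coalescing with an
    // adjacent free block when the ranges touch
    static void free_range(size_t offset, size_t size) {
        for (int i = 0; i < n_free; i++) {
            if (free_blocks[i].offset + free_blocks[i].size == offset) {
                free_blocks[i].size += size;          // extend forwards
                return;
            }
            if (offset + size == free_blocks[i].offset) {
                free_blocks[i].offset = offset;       // extend backwards
                free_blocks[i].size  += size;
                return;
            }
        }
        free_blocks[n_free++] = (struct block) { offset, size };
    }

    int main(void) {
        const size_t N = 1024, extra = 64;

        // the parent tensor occupied [0, N + extra), but the inplace child only
        // records N bytes, so only [0, N) ever comes back to the free list
        free_range(0, N);

        // the next tensor lived at [N + extra, 2*N + extra); when it is freed it
        // cannot coalesce with [0, N) because the tail [N, N + extra) was never
        // returned, so no contiguous block of size N + extra ever forms
        free_range(N + extra, N);

        for (int i = 0; i < n_free; i++) {
            printf("free block %d: offset=%zu size=%zu\n",
                   i, free_blocks[i].offset, free_blocks[i].size);
        }
        return 0;
    }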

slaren (Member) commented Oct 20, 2025

> This seems to happen when we have tensors of size N being placed inplace in buffers of size N + extra.

Um yeah, I think it might be a bug that causes the extra memory from the reused tensor to not be freed. I will take a look.

ggerganov (Member Author) commented

> Um yeah, I think it might be a bug that causes the extra memory from the reused tensor to not be freed. I will take a look.

Yes, and in the next layers, when we try to allocate a tensor of size N + extra, the allocator cannot find a matching free block because of this, so it allocates a new block of size N + extra. As a result, I observe these blocks accumulating with each layer.
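
A rough back-of-the-envelope sketch of that accumulation, using made-up sizes rather than measurements from an actual model:

    // Hypothetical illustration of the per-layer accumulation: each layer asks
    // for an N + extra block, but only N bytes of the previous layer's block
    // were ever freed, so every layer grows the buffer by another N + extra.
    #include <stdio.h>
    #include <stddef.h>

    int main(void) {
        const size_t N = 1024 * 1024, extra = 64 * 1024;  // made-up sizes
        const int n_layers = 32;

        size_t peak = 0;
        for (int layer = 0; layer < n_layers; layer++) {
            peak += N + extra;  // no free block of size N + extra is ever found
        }
        printf("peak buffer: %zu bytes instead of roughly %zu\n", peak, N + extra);
        return 0;
    }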

slaren (Member) commented Oct 20, 2025

#16679 should fix it.

ggerganov closed this Oct 20, 2025