
Conversation

ggerganov (Member) commented Oct 19, 2025

ref #16646 (comment)

This seems to resolve the excessive compute buffer size, but I'm not yet sure that it's a good solution.

An alternative solution that seems more generic is to reuse a parent's buffer inplace only when the allocation sizes match:

diff --git a/ggml/src/ggml-alloc.c b/ggml/src/ggml-alloc.c
index 929bc4488..da2234c66 100644
--- a/ggml/src/ggml-alloc.c
+++ b/ggml/src/ggml-alloc.c
@@ -632,6 +632,20 @@ static void ggml_gallocr_allocate_node(ggml_gallocr_t galloc, struct ggml_tensor
                 }
 
                 struct hash_node * p_hn = ggml_gallocr_hash_get(galloc, parent);
+
+                {
+                    ggml_backend_buffer_type_t p_buft = galloc->bufts[p_hn->buffer_id];
+                    const size_t p_size = ggml_backend_buft_get_alloc_size(p_buft, parent);
+
+                    ggml_backend_buffer_type_t buft = galloc->bufts[buffer_id];
+                    const size_t size = ggml_backend_buft_get_alloc_size(buft, node);
+
+                    if (size != p_size) {
+                        AT_PRINTF("not reusing parent %s for %s as alloc sizes are different\n", parent->name, node->name);
+                        continue;
+                    }
+                }
+
                 if (p_hn->n_children == 1 && p_hn->n_views == 0) {
                     if (ggml_is_view(parent)) {
                         struct ggml_tensor * view_src = parent->view_src;

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Apple Metal labels Oct 19, 2025
slaren (Member) commented Oct 20, 2025

Were you able to figure out what causes this behavior in the allocator?

ggerganov (Member Author) commented

AFAICT the logic for merging blocks in ggml_dyn_tallocr_free_tensor does not get activated. This seems to happen when we have tensors of size N being placed inplace in buffers of size N + extra, though I don't yet have a very good understanding of the exact path through which this happens.
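
To make the failure mode concrete, here is a minimal toy sketch of a free list with adjacent-block coalescing. This is not the actual ggml-alloc implementation; all names, sizes, and the free_range helper are made up for illustration. The point is that when only N bytes of an N + extra block are ever returned to the free list, the tail is stranded and the remaining free ranges can no longer coalesce into a contiguous block of size N + extra.

    // Toy free list with adjacent-block coalescing -- an illustration only,
    // not the real ggml_dyn_tallocr code; all names and sizes are made up.
    #include <stdio.h>
    #include <stddef.h>

    #define MAX_BLOCKS 64

    struct block { size_t offset, size; };

    static struct block free_blocks[MAX_BLOCKS];
    static int n_free = 0;

    // return [offset, offset + size) to the free list, coalescing with an
    // adjacent free block when the ranges touch
    static void free_range(size_t offset, size_t size) {
        for (int i = 0; i < n_free; i++) {
            if (free_blocks[i].offset + free_blocks[i].size == offset) {
                free_blocks[i].size += size;          // extend forwards
                return;
            }
            if (offset + size == free_blocks[i].offset) {
                free_blocks[i].offset = offset;       // extend backwards
                free_blocks[i].size  += size;
                return;
            }
        }
        free_blocks[n_free++] = (struct block) { offset, size };
    }

    int main(void) {
        const size_t N = 1024, extra = 64;

        // the parent tensor occupied [0, N + extra), but the inplace child only
        // records N bytes, so only [0, N) ever comes back to the free list
        free_range(0, N);

        // the next tensor lived at [N + extra, 2*N + extra); when it is freed it
        // cannot coalesce with [0, N) because the tail [N, N + extra) was never
        // returned, so no contiguous block of size N + extra ever forms
        free_range(N + extra, N);

        for (int i = 0; i < n_free; i++) {
            printf("free block %d: offset=%zu size=%zu\n",
                   i, free_blocks[i].offset, free_blocks[i].size);
        }
        return 0;
    }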

slaren (Member) commented Oct 20, 2025

> This seems to happen when we have tensors of size N being placed inplace in buffers of size N + extra.

Um yeah, I think it might be a bug that causes the extra memory from the reused tensor to not be freed. I will take a look.

ggerganov (Member Author) commented

> Um yeah, I think it might be a bug that causes the extra memory from the reused tensor to not be freed. I will take a look.

Yes, and in the next layers, when we try to allocate a tensor of size N + extra, the allocator cannot find a matching free block because of this, so it allocates a new block of size N + extra. As a result, I observe these blocks accumulating with each layer.
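
A rough back-of-the-envelope sketch of that accumulation, using made-up sizes rather than measurements from an actual model:

    // Hypothetical illustration of the per-layer accumulation: each layer asks
    // for an N + extra block, but only N bytes of the previous layer's block
    // were ever freed, so every layer grows the buffer by another N + extra.
    #include <stdio.h>
    #include <stddef.h>

    int main(void) {
        const size_t N = 1024 * 1024, extra = 64 * 1024;  // made-up sizes
        const int n_layers = 32;

        size_t peak = 0;
        for (int layer = 0; layer < n_layers; layer++) {
            peak += N + extra;  // no free block of size N + extra is ever found
        }
        printf("peak buffer: %zu bytes instead of roughly %zu\n", peak, N + extra);
        return 0;
    }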

slaren (Member) commented Oct 20, 2025

#16679 should fix it.

ggerganov closed this Oct 20, 2025