Skip to content

Conversation

dscho
Copy link
Member

@dscho dscho commented Mar 6, 2025

This fixes the problems described in https://lore.kernel.org/git/[email protected]/.

Cc: Patrick Steinhardt [email protected]

The reftable library goes out of its way to use its own set of allocator
functions that can be configured using `reftable_set_alloc()`. However,
Git does not configure this.

That is not typically a problem, except when Git uses a custom allocator
via some definitions in `git-compat-util.h`, as is the case in Git for
Windows (which switched away from the long-unmaintained nedmalloc to
mimalloc).

Then, it is quite possible that Git assigns a `strbuf` (allocated via
the custom allocator) to, say, the `refname` field of a
`reftable_log_record` in `write_transaction_table()`, and later on asks
the reftable library function `reftable_log_record_release()` to release
it, but that function was compiled without using `git-compat-util.h` and
hence calls regular `free()` (i.e. _not_ the custom allocator's own
function).

This has been a problem for a long time and it was a matter of some sort
of "luck" that 1) reftables are not commonly used on Windows, and 2)
mimalloc can often ignore gracefully when it is asked to release memory
that it has not allocated.

However, a recent update to `seen` brought this problem to the
forefront, letting t1460 fail in Git for Windows, with symptoms much in
the same way as the problem I had to address in d02c37c
(t-reftable-basics: allow for `malloc` to be `#define`d, 2025-01-08)
where exit code 127 was also produced in lieu of
`STATUS_HEAP_CORRUPTION` (C0000374) because exit codes are only 7 bits
wide.

It was not possible to figure out what change in particular caused these
new failures within a reasonable time frame, as there are too many
changes in `seen` that conflict with Git for Windows' patches, I had to
stop the investigation after spending four hours on it fruitlessly.

To verify that this patch fixes the issue, I avoided using mimalloc and
temporarily patched in a "custom allocator" that would more reliably
point out problems, like this:

  diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
  index 68f3829..9421d630b9f5 100644
  --- a/refs/reftable-backend.c
  +++ b/refs/reftable-backend.c
  @@ -353,6 +353,69 @@ static int reftable_be_fsync(int fd)
   	return fsync_component(FSYNC_COMPONENT_REFERENCE, fd);
   }

  +#define DEBUG_REFTABLE_ALLOC
  +#ifdef DEBUG_REFTABLE_ALLOC
  +#include "khash.h"
  +
  +static inline khint_t __ac_X31_hash_ptr(void *ptr)
  +{
  +	union {
  +		void *ptr;
  +		char s[sizeof(void *)];
  +	} u;
  +	size_t i;
  +	khint_t h;
  +
  +	u.ptr = ptr;
  +	h = (khint_t)*u.s;
  +	for (i = 0; i < sizeof(void *); i++)
  +		h = (h << 5) - h + (khint_t)u.s[i];
  +	return h;
  +}
  +
  +#define kh_ptr_hash_func(key) __ac_X31_hash_ptr(key)
  +#define kh_ptr_hash_equal(a, b) ((a) == (b))
  +
  +KHASH_INIT(ptr, void *, int, 0, kh_ptr_hash_func, kh_ptr_hash_equal)
  +
  +static kh_ptr_t *my_malloced;
  +
  +static void *my_malloc(size_t sz)
  +{
  +	int dummy;
  +	void *ptr = malloc(sz);
  +	if (ptr)
  +		kh_put_ptr(my_malloced, ptr, &dummy);
  +	return ptr;
  +}
  +
  +static void *my_realloc(void *ptr, size_t sz)
  +{
  +	int dummy;
  +	if (ptr) {
  +		khiter_t pos = kh_get_ptr(my_malloced, ptr);
  +		if (pos >= kh_end(my_malloced))
  +			die("Was not my_malloc()ed: %p", ptr);
  +		kh_del_ptr(my_malloced, pos);
  +	}
  +	ptr = realloc(ptr, sz);
  +	if (ptr)
  +		kh_put_ptr(my_malloced, ptr, &dummy);
  +	return ptr;
  +}
  +
  +static void my_free(void *ptr)
  +{
  +	if (ptr) {
  +		khiter_t pos = kh_get_ptr(my_malloced, ptr);
  +		if (pos >= kh_end(my_malloced))
  +			die("Was not my_malloc()ed: %p", ptr);
  +		kh_del_ptr(my_malloced, pos);
  +	}
  +	free(ptr);
  +}
  +#endif
  +
   static struct ref_store *reftable_be_init(struct repository *repo,
   					  const char *gitdir,
   					  unsigned int store_flags)
  @@ -362,6 +425,11 @@ static struct ref_store *reftable_be_init(struct repository *repo,
   	int is_worktree;
   	mode_t mask;

  +#ifdef DEBUG_REFTABLE_ALLOC
  +	my_malloced = kh_init_ptr();
  +	reftable_set_alloc(my_malloc, my_realloc, my_free);
  +#endif
  +
   	mask = umask(0);
   	umask(mask);

I briefly considered contributing this "custom allocator" patch, too,
but it is unwieldy (for example, it would not work at all when compiling
with mimalloc support) and it would only waste space (or even time, if a
compile flag was introduced and exercised as part of the CI builds).
Given that it is highly unlikely that Git will lose the new
`reftable_set_alloc()` call by mistake, I rejected that idea as simply
too wasteful.

Signed-off-by: Johannes Schindelin <[email protected]>
@dscho dscho self-assigned this Mar 6, 2025
@dscho
Copy link
Member Author

dscho commented Mar 10, 2025

@pks for the record, this is the cumbersome approach that I talked about, the one that I think needs to be abandoned because it piles patch upon patch just to avoid moving the mimalloc stuff from git-compat-util.h to posix.h.

@dscho dscho closed this Mar 10, 2025
@dscho dscho deleted the reftable-vs-custom-allocators branch March 11, 2025 17:13
@pks
Copy link

pks commented Mar 17, 2025

Hey @dscho, I think you're pinging the wrong guy.

@dscho
Copy link
Member Author

dscho commented Mar 17, 2025

Oops, @pks-t my comment was for you...

@pks-t
Copy link

pks-t commented Mar 19, 2025

Yup, unfortunately @pks was taken already as a name 😅

@dscho Thanks for investigating this one. In any case, I'm fine with not pursuing this for now, but may eventually revisit it.

What does this mean for the ps/reftable-sans-compat-util topic that has triggered this investigation? Can that topic be moved forward with my proposed fix to move the mimalloc definitions around? Is there anything else I can help you with in this context? I'd really like to get the topic unblocked so that I can continue to iterate on the reftable library.

@dscho
Copy link
Member Author

dscho commented Mar 20, 2025

What does this mean for the ps/reftable-sans-compat-util topic that has triggered this investigation? Can that topic be moved forward with my proposed fix to move the mimalloc definitions around?

Just move forward with this topic. I'll deal with the fall-out, as usual.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants