Skip to content

Commit 7b081d2

Browse files
ttaylorrgitster
authored andcommitted
hash.h: introduce unsafe_hash_algo()
In 253ed9e (hash.h: scaffolding for _unsafe hashing variants, 2024-09-26), we introduced "unsafe" variants of the SHA-1 hashing functions by introducing new functions like "unsafe_init_fn()" and so on. This approach has a major shortcoming that callers must remember to consistently use one variant or the other. Failing to consistently use (or not use) the unsafe variants can lead to crashes at best, or subtle memory corruption issues at worst. In the hashfile API, this isn't difficult to achieve, but verifying that all callers consistently use the unsafe variants is somewhat of a chore given how spread out all of the callers are. In the sha1 and sha1-unsafe test helpers, all of the calls to various hash functions are guarded by an "if (unsafe)" conditional, which is repetitive and cumbersome. Address these issues by introducing a new pattern whereby one 'git_hash_algo' can return a pointer to another 'git_hash_algo' that represents the unsafe version of itself. So instead of having something like: if (unsafe) the_hash_algo->init_fn(...); the_hash_algo->update_fn(...); the_hash_algo->final_fn(...); else the_hash_algo->unsafe_init_fn(...); the_hash_algo->unsafe_update_fn(...); the_hash_algo->unsafe_final_fn(...); we can instead write: struct git_hash_algo *algop = the_hash_algo; if (unsafe) algop = unsafe_hash_algo(algop); algop->init_fn(...); algop->update_fn(...); algop->final_fn(...); This removes the existing shortcoming by no longer forcing the caller to "remember" which variant of the hash functions it wants to call, only to hold onto a 'struct git_hash_algo' pointer that is initialized once. Similarly, while there currently is still a way to "mix" safe and unsafe functions, this too will go away after subsequent commits remove all direct calls to the unsafe_ variants. Note that hash_algo_by_ptr() needs an adjustment to allow passing in the unsafe variant of a hash function. All other query functions on the hash_algos array will continue to return the safe variants of any function. Suggested-by: Jeff King <[email protected]> Signed-off-by: Taylor Blau <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent 5fcc683 commit 7b081d2

File tree

2 files changed

+38
-1
lines changed

2 files changed

+38
-1
lines changed

hash.h

Lines changed: 12 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -305,6 +305,9 @@ struct git_hash_algo {
305305

306306
/* The all-zeros OID. */
307307
const struct object_id *null_oid;
308+
309+
/* The unsafe variant of this hash function, if one exists. */
310+
const struct git_hash_algo *unsafe;
308311
};
309312
extern const struct git_hash_algo hash_algos[GIT_HASH_NALGOS];
310313

@@ -320,9 +323,17 @@ int hash_algo_by_length(int len);
320323
/* Identical, except for a pointer to struct git_hash_algo. */
321324
static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
322325
{
323-
return p - hash_algos;
326+
size_t i;
327+
for (i = 0; i < GIT_HASH_NALGOS; i++) {
328+
const struct git_hash_algo *algop = &hash_algos[i];
329+
if (p == algop)
330+
return i;
331+
}
332+
return GIT_HASH_UNKNOWN;
324333
}
325334

335+
const struct git_hash_algo *unsafe_hash_algo(const struct git_hash_algo *algop);
336+
326337
const struct object_id *null_oid(void);
327338

328339
static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2, const struct git_hash_algo *algop)

object-file.c

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -202,6 +202,22 @@ static void git_hash_unknown_final_oid(struct object_id *oid UNUSED,
202202
BUG("trying to finalize unknown hash");
203203
}
204204

205+
static const struct git_hash_algo sha1_unsafe_algo = {
206+
.name = "sha1",
207+
.format_id = GIT_SHA1_FORMAT_ID,
208+
.rawsz = GIT_SHA1_RAWSZ,
209+
.hexsz = GIT_SHA1_HEXSZ,
210+
.blksz = GIT_SHA1_BLKSZ,
211+
.init_fn = git_hash_sha1_init_unsafe,
212+
.clone_fn = git_hash_sha1_clone_unsafe,
213+
.update_fn = git_hash_sha1_update_unsafe,
214+
.final_fn = git_hash_sha1_final_unsafe,
215+
.final_oid_fn = git_hash_sha1_final_oid_unsafe,
216+
.empty_tree = &empty_tree_oid,
217+
.empty_blob = &empty_blob_oid,
218+
.null_oid = &null_oid_sha1,
219+
};
220+
205221
const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
206222
{
207223
.name = NULL,
@@ -239,6 +255,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
239255
.unsafe_update_fn = git_hash_sha1_update_unsafe,
240256
.unsafe_final_fn = git_hash_sha1_final_unsafe,
241257
.unsafe_final_oid_fn = git_hash_sha1_final_oid_unsafe,
258+
.unsafe = &sha1_unsafe_algo,
242259
.empty_tree = &empty_tree_oid,
243260
.empty_blob = &empty_blob_oid,
244261
.null_oid = &null_oid_sha1,
@@ -305,6 +322,15 @@ int hash_algo_by_length(int len)
305322
return GIT_HASH_UNKNOWN;
306323
}
307324

325+
const struct git_hash_algo *unsafe_hash_algo(const struct git_hash_algo *algop)
326+
{
327+
/* If we have a faster "unsafe" implementation, use that. */
328+
if (algop->unsafe)
329+
return algop->unsafe;
330+
/* Otherwise use the default one. */
331+
return algop;
332+
}
333+
308334
/*
309335
* This is meant to hold a *small* number of objects that you would
310336
* want repo_read_object_file() to be able to return, but yet you do not want

0 commit comments

Comments
 (0)