Commit 5159cd9

Merge branch 'bc/sha1-256-interop-02' into seen
The code to maintain mapping between object names in multiple hash
functions is being added, written in Rust.

* bc/sha1-256-interop-02:
  object-file-convert: always make sure object ID algo is valid
  rust: add a small wrapper around the hashfile code
  rust: add a new binary object map format
  rust: add functionality to hash an object
  rust: add a build.rs script for tests
  hash: expose hash context functions to Rust
  write-or-die: add an fsync component for the object map
  csum-file: define hashwrite's count as a uint32_t
  rust: add additional helpers for ObjectID
  hash: add a function to look up hash algo structs
  rust: add a hash algorithm abstraction
  rust: add a ObjectID struct
  hash: use uint32_t for object_id algorithm
  conversion: don't crash when no destination algo
  repository: require Rust support for interoperability
2 parents 3a02d65 + 68aace5 commit 5159cd9

24 files changed, +1722 -74 lines changed

Documentation/gitformat-loose.adoc

Lines changed: 78 additions & 0 deletions
@@ -10,6 +10,7 @@ SYNOPSIS
 --------
 [verse]
 $GIT_DIR/objects/[0-9a-f][0-9a-f]/*
+$GIT_DIR/objects/object-map/map-*.map
 
 DESCRIPTION
 -----------
@@ -48,6 +49,83 @@ stored under
 Similarly, a blob containing the contents `abc` would have the uncompressed
 data of `blob 3\0abc`.
 
+== Loose object mapping
+
+When the `compatObjectFormat` option is used, Git needs to store a mapping
+between the repository's main algorithm and the compatibility algorithm for
+loose objects as well as some auxiliary information.
+
+The mapping consists of a set of files under `$GIT_DIR/objects/object-map`
+ending in `.map`. The portion of the filename before the extension is that of
+the main hash checksum (that is, the one specified in
+`extensions.objectformat`) in hex format.
+
+`git gc` will repack existing entries into one file, removing any unnecessary
+objects, such as obsolete shallow entries or loose objects that have been
+packed.
+
+The file format is as follows. All values are in network byte order and all
+4-byte and 8-byte values must be 4-byte aligned in the file, so the NUL padding
+may be required in some cases. Git always uses the smallest number of NUL
+bytes (including zero) that is required for the padding in order to make
+writing files deterministic.
+
+- A header appears at the beginning and consists of the following:
+* A 4-byte mapping signature: `LMAP`
+* 4-byte version number: 1
+* 4-byte length of the header section (including reserved entries but
+excluding any NUL padding).
+* 4-byte number of objects declared in this map file.
+* 4-byte number of object formats declared in this map file.
+* For each object format:
+** 4-byte format identifier (e.g., `sha1` for SHA-1)
+** 4-byte length in bytes of shortened object names (that is, prefixes of
+the full object names). This is the shortest possible length needed to
+make names in the shortened object name table unambiguous.
+** 8-byte integer, recording where tables relating to this format
+are stored in this index file, as an offset from the beginning.
+* 8-byte offset to the trailer from the beginning of this file.
+* The remainder of the header section is reserved for future use.
+Readers must ignore unrecognized data here.
+- Zero or more NUL bytes. These are used to improve the alignment of the
+4-byte quantities below.
+- Tables for the first object format:
+* A sorted table of shortened object names. These are prefixes of the names
+of all objects in this file, packed together to reduce the cache footprint
+of the binary search for a specific object name.
+* A sorted table of full object names.
+* A table of 4-byte metadata values.
+- Zero or more NUL bytes.
+- Tables for subsequent object formats:
+* A sorted table of shortened object names. These are prefixes of the names
+of all objects in this file, packed together without offset values to
+reduce the cache footprint of the binary search for a specific object name.
+* A table of full object names in the order specified by the first object format.
+* A table of 4-byte values mapping object name order to the order of the
+first object format. For an object in the table of sorted shortened object
+names, the value at the corresponding index in this table is the index in
+the previous table for that same object.
+* Zero or more NUL bytes.
+- The trailer consists of the following:
+* Hash checksum of all of the above using the main hash.
+
+The lower six bits of each metadata table contain a type field indicating the
+reason that this object is stored:
+
+0::
+Reserved.
+1::
+This object is stored as a loose object in the repository.
+2::
+This object is a shallow entry. The mapping refers to a shallow value
+returned by a remote server.
+3::
+This object is a submodule entry. The mapping refers to the commit stored
+representing a submodule.
+
+Other data may be stored in this field in the future. Bits that are not used
+must be zero.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
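The header layout described in the documentation hunk above can be illustrated with a small reader. This is only a sketch under stated assumptions, not the code added in `src/loose.rs`: the names `MapHeader`, `FormatEntry`, `read_be`, `padding_for` and `parse_header` are hypothetical, and the sketch assumes the header length is counted from the start of the file.

```rust
/// One per-format entry in the header (16 bytes on disk).
#[derive(Debug, PartialEq)]
pub struct FormatEntry {
    pub format_id: [u8; 4], // e.g. b"sha1"
    pub short_len: u32,     // length in bytes of shortened object names
    pub table_offset: u64,  // offset of this format's tables from file start
}

/// The header fields that the format description names explicitly.
#[derive(Debug, PartialEq)]
pub struct MapHeader {
    pub version: u32,
    pub header_len: u32,
    pub num_objects: u32,
    pub formats: Vec<FormatEntry>,
    pub trailer_offset: u64,
}

/// Read `n` bytes at `pos` as a big-endian (network order) integer.
fn read_be(buf: &[u8], pos: usize, n: usize) -> Option<u64> {
    let bytes = buf.get(pos..pos.checked_add(n)?)?;
    Some(bytes.iter().fold(0u64, |acc, &b| (acc << 8) | u64::from(b)))
}

/// Smallest number of NUL bytes that brings `len` to a 4-byte boundary.
pub fn padding_for(len: usize) -> usize {
    (4 - len % 4) % 4
}

/// Parse the header; returns `None` on a bad signature, an unknown
/// version, or truncated input.
pub fn parse_header(buf: &[u8]) -> Option<MapHeader> {
    if buf.get(..4)? != &b"LMAP"[..] {
        return None;
    }
    let version = read_be(buf, 4, 4)? as u32;
    if version != 1 {
        return None;
    }
    let header_len = read_be(buf, 8, 4)? as u32;
    let num_objects = read_be(buf, 12, 4)? as u32;
    let num_formats = read_be(buf, 16, 4)?;

    let mut pos = 20; // end of the five fixed 4-byte fields
    let mut formats = Vec::new();
    for _ in 0..num_formats {
        let id = buf.get(pos..pos + 4)?;
        formats.push(FormatEntry {
            format_id: [id[0], id[1], id[2], id[3]],
            short_len: read_be(buf, pos + 4, 4)? as u32,
            table_offset: read_be(buf, pos + 8, 8)?,
        });
        pos += 16;
    }
    let trailer_offset = read_be(buf, pos, 8)?;
    // Assumption: the header length counts from the start of the file, so it
    // must cover every field read above. Extra bytes are reserved and ignored.
    if (header_len as usize) < pos + 8 {
        return None;
    }
    Some(MapHeader { version, header_len, num_objects, formats, trailer_offset })
}
```

With two declared formats the known fields occupy 20 + 2 * 16 + 8 = 60 bytes, which is already 4-byte aligned, so `padding_for(60)` is zero. A header that used reserved bytes and ended at, say, offset 61 would need three NUL bytes before the first table.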

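The per-format tables and the metadata type field from the same documentation hunk can be sketched in memory as follows. This is a hedged illustration, not the on-disk reader from the commit: `CompatTables`, `first_format_index`, `EntryType`, `entry_type` and `unused_bits_clear` are hypothetical names, and the tables are assumed to have been loaded into vectors already.

```rust
/// Reason an object is recorded, taken from the low six bits of a
/// metadata value.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum EntryType {
    Reserved,
    Loose,
    Shallow,
    Submodule,
    /// A value that the format description does not assign yet.
    Unassigned(u8),
}

/// Decode the type field (lower six bits) of a 4-byte metadata value.
pub fn entry_type(metadata: u32) -> EntryType {
    match (metadata & 0x3f) as u8 {
        0 => EntryType::Reserved,
        1 => EntryType::Loose,
        2 => EntryType::Shallow,
        3 => EntryType::Submodule,
        other => EntryType::Unassigned(other),
    }
}

/// True when every bit above the six-bit type field is zero.
pub fn unused_bits_clear(metadata: u32) -> bool {
    metadata & !0x3f == 0
}

/// In-memory view of the tables for a format other than the first one.
pub struct CompatTables {
    /// Sorted shortened object names.
    pub short_names: Vec<Vec<u8>>,
    /// Full object names, in the order of the first format's tables.
    pub full_names: Vec<Vec<u8>>,
    /// For entry `i` of `short_names`, the index into `full_names`.
    pub order: Vec<u32>,
}

/// Look up `name` and return its index in first-format order, which is
/// also the index of the matching name and metadata value in the first
/// format's tables.
pub fn first_format_index(t: &CompatTables, name: &[u8], short_len: usize) -> Option<usize> {
    let prefix = name.get(..short_len)?;
    // Binary search over the compact prefix table.
    let i = t
        .short_names
        .binary_search_by(|probe| probe.as_slice().cmp(prefix))
        .ok()?;
    let j = *t.order.get(i)? as usize;
    // The prefix only narrows the search; confirm against the full name.
    if t.full_names.get(j)?.as_slice() == name {
        Some(j)
    } else {
        None
    }
}
```

Searching the short prefix table first keeps the binary search on a small, densely packed array, and the order table then gives the position shared by all formats, so translating a name needs only one further indexed read.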
Makefile

Lines changed: 4 additions & 1 deletion
@@ -1537,7 +1537,10 @@ CLAR_TEST_OBJS += $(UNIT_TEST_DIR)/unit-test.o
 
 UNIT_TEST_OBJS += $(UNIT_TEST_DIR)/test-lib.o
 
+RUST_SOURCES += src/csum_file.rs
+RUST_SOURCES += src/hash.rs
 RUST_SOURCES += src/lib.rs
+RUST_SOURCES += src/loose.rs
 RUST_SOURCES += src/varint.rs
 
 GIT-VERSION-FILE: FORCE
@@ -2972,7 +2975,7 @@ scalar$X: scalar.o GIT-LDFLAGS $(GITLIBS)
 $(LIB_FILE): $(LIB_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
-$(RUST_LIB): Cargo.toml $(RUST_SOURCES)
+$(RUST_LIB): Cargo.toml $(RUST_SOURCES) $(LIB_FILE)
 	$(QUIET_CARGO)cargo build $(CARGO_ARGS)
 
 .PHONY: rust

build.rs

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+// This program is free software; you can redistribute it and/or modify
+// it under the terms of the GNU General Public License as published by
+// the Free Software Foundation: version 2 of the License, dated June 1991.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License along
+// with this program; if not, see <https://www.gnu.org/licenses/>.
+
+fn main() {
+    println!("cargo:rustc-link-search=.");
+    println!("cargo:rustc-link-lib=git");
+    println!("cargo:rustc-link-lib=z");
+}

csum-file.c

Lines changed: 1 addition & 1 deletion
@@ -110,7 +110,7 @@ void discard_hashfile(struct hashfile *f)
 	free_hashfile(f);
 }
 
-void hashwrite(struct hashfile *f, const void *buf, unsigned int count)
+void hashwrite(struct hashfile *f, const void *buf, uint32_t count)
 {
 	while (count) {
 		unsigned left = f->buffer_len - f->offset;

csum-file.h

Lines changed: 1 addition & 1 deletion
@@ -63,7 +63,7 @@ void free_hashfile(struct hashfile *f);
  */
 int finalize_hashfile(struct hashfile *, unsigned char *, enum fsync_component, unsigned int);
 void discard_hashfile(struct hashfile *);
-void hashwrite(struct hashfile *, const void *, unsigned int);
+void hashwrite(struct hashfile *, const void *, uint32_t);
 void hashflush(struct hashfile *f);
 void crc32_begin(struct hashfile *);
 uint32_t crc32_end(struct hashfile *);

hash.c

Lines changed: 45 additions & 3 deletions
@@ -241,7 +241,49 @@ const char *empty_tree_oid_hex(const struct git_hash_algo *algop)
 	return oid_to_hex_r(buf, algop->empty_tree);
 }
 
-int hash_algo_by_name(const char *name)
+const struct git_hash_algo *hash_algo_ptr_by_number(uint32_t algo)
+{
+	if (algo >= GIT_HASH_NALGOS)
+		return NULL;
+	return &hash_algos[algo];
+}
+
+struct git_hash_ctx *git_hash_alloc(void)
+{
+	return xmalloc(sizeof(struct git_hash_ctx));
+}
+
+void git_hash_free(struct git_hash_ctx *ctx)
+{
+	free(ctx);
+}
+
+void git_hash_init(struct git_hash_ctx *ctx, const struct git_hash_algo *algop)
+{
+	algop->init_fn(ctx);
+}
+
+void git_hash_clone(struct git_hash_ctx *dst, const struct git_hash_ctx *src)
+{
+	src->algop->clone_fn(dst, src);
+}
+
+void git_hash_update(struct git_hash_ctx *ctx, const void *in, size_t len)
+{
+	ctx->algop->update_fn(ctx, in, len);
+}
+
+void git_hash_final(unsigned char *hash, struct git_hash_ctx *ctx)
+{
+	ctx->algop->final_fn(hash, ctx);
+}
+
+void git_hash_final_oid(struct object_id *oid, struct git_hash_ctx *ctx)
+{
+	ctx->algop->final_oid_fn(oid, ctx);
+}
+
+uint32_t hash_algo_by_name(const char *name)
 {
 	if (!name)
 		return GIT_HASH_UNKNOWN;
@@ -251,15 +293,15 @@ int hash_algo_by_name(const char *name)
 	return GIT_HASH_UNKNOWN;
 }
 
-int hash_algo_by_id(uint32_t format_id)
+uint32_t hash_algo_by_id(uint32_t format_id)
 {
 	for (size_t i = 1; i < GIT_HASH_NALGOS; i++)
 		if (format_id == hash_algos[i].format_id)
 			return i;
 	return GIT_HASH_UNKNOWN;
 }
 
-int hash_algo_by_length(size_t len)
+uint32_t hash_algo_by_length(size_t len)
 {
 	for (size_t i = 1; i < GIT_HASH_NALGOS; i++)
 		if (len == hash_algos[i].rawsz)
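For readers more at home in Rust, the bounds-checked lookup that `hash_algo_ptr_by_number` performs, and the format-ID search in `hash_algo_by_id`, could be mirrored roughly as below. This is a sketch with a mock table, not the abstraction added in `src/hash.rs`: `HashAlgo`, `HASH_ALGOS`, `algo_ptr_by_number` and `algo_by_id` are hypothetical names, the `"unknown"` name for slot 0 is a placeholder, and the format IDs are the ASCII values of `sha1` and `s256`.

```rust
/// Simplified stand-in for Git's `struct git_hash_algo`.
pub struct HashAlgo {
    pub name: &'static str,
    pub format_id: u32,
    pub rawsz: usize,
}

/// Mirrors `GIT_HASH_UNKNOWN`: index 0 means "no algorithm".
pub const HASH_UNKNOWN: u32 = 0;

/// Mock of `hash_algos[]`; slot 0 is the "unknown" placeholder entry.
pub static HASH_ALGOS: [HashAlgo; 3] = [
    HashAlgo { name: "unknown", format_id: 0, rawsz: 0 },
    HashAlgo { name: "sha1", format_id: 0x7368_6131, rawsz: 20 },
    HashAlgo { name: "sha256", format_id: 0x7332_3536, rawsz: 32 },
];

/// Like `hash_algo_ptr_by_number`: an out-of-range number yields `None`
/// instead of a NULL pointer.
pub fn algo_ptr_by_number(algo: u32) -> Option<&'static HashAlgo> {
    HASH_ALGOS.get(algo as usize)
}

/// Like `hash_algo_by_id`: the search starts at index 1, so the
/// placeholder entry can never match.
pub fn algo_by_id(format_id: u32) -> u32 {
    for (i, a) in HASH_ALGOS.iter().enumerate().skip(1) {
        if a.format_id == format_id {
            return i as u32;
        }
    }
    HASH_UNKNOWN
}
```

Using an unsigned index, as the switch from `int` to `uint32_t` does on the C side, means a single upper-bound comparison is enough to reject every invalid value.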

hash.h

Lines changed: 13 additions & 25 deletions
@@ -211,7 +211,7 @@ static inline void git_SHA256_Clone(git_SHA256_CTX *dst, const git_SHA256_CTX *s
 
 struct object_id {
 	unsigned char hash[GIT_MAX_RAWSZ];
-	int algo; /* XXX requires 4-byte alignment */
+	uint32_t algo; /* XXX requires 4-byte alignment */
 };
 
 #define GET_OID_QUIETLY 01
@@ -320,37 +320,25 @@ struct git_hash_algo {
 };
 extern const struct git_hash_algo hash_algos[GIT_HASH_NALGOS];
 
-static inline void git_hash_clone(struct git_hash_ctx *dst, const struct git_hash_ctx *src)
-{
-	src->algop->clone_fn(dst, src);
-}
-
-static inline void git_hash_update(struct git_hash_ctx *ctx, const void *in, size_t len)
-{
-	ctx->algop->update_fn(ctx, in, len);
-}
-
-static inline void git_hash_final(unsigned char *hash, struct git_hash_ctx *ctx)
-{
-	ctx->algop->final_fn(hash, ctx);
-}
-
-static inline void git_hash_final_oid(struct object_id *oid, struct git_hash_ctx *ctx)
-{
-	ctx->algop->final_oid_fn(oid, ctx);
-}
-
+void git_hash_init(struct git_hash_ctx *ctx, const struct git_hash_algo *algop);
+void git_hash_clone(struct git_hash_ctx *dst, const struct git_hash_ctx *src);
+void git_hash_update(struct git_hash_ctx *ctx, const void *in, size_t len);
+void git_hash_final(unsigned char *hash, struct git_hash_ctx *ctx);
+void git_hash_final_oid(struct object_id *oid, struct git_hash_ctx *ctx);
+const struct git_hash_algo *hash_algo_ptr_by_number(uint32_t algo);
+struct git_hash_ctx *git_hash_alloc(void);
+void git_hash_free(struct git_hash_ctx *ctx);
 /*
  * Return a GIT_HASH_* constant based on the name. Returns GIT_HASH_UNKNOWN if
  * the name doesn't match a known algorithm.
  */
-int hash_algo_by_name(const char *name);
+uint32_t hash_algo_by_name(const char *name);
 /* Identical, except based on the format ID. */
-int hash_algo_by_id(uint32_t format_id);
+uint32_t hash_algo_by_id(uint32_t format_id);
 /* Identical, except based on the length. */
-int hash_algo_by_length(size_t len);
+uint32_t hash_algo_by_length(size_t len);
 /* Identical, except for a pointer to struct git_hash_algo. */
-static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
+static inline uint32_t hash_algo_by_ptr(const struct git_hash_algo *p)
 {
 	size_t i;
 	for (i = 0; i < GIT_HASH_NALGOS; i++) {

object-file-convert.c

Lines changed: 11 additions & 3 deletions
@@ -13,17 +13,25 @@
 #include "gpg-interface.h"
 #include "object-file-convert.h"
 
-int repo_oid_to_algop(struct repository *repo, const struct object_id *src,
+int repo_oid_to_algop(struct repository *repo, const struct object_id *srcoid,
 		      const struct git_hash_algo *to, struct object_id *dest)
 {
 	/*
 	 * If the source algorithm is not set, then we're using the
 	 * default hash algorithm for that object.
 	 */
 	const struct git_hash_algo *from =
-		src->algo ? &hash_algos[src->algo] : repo->hash_algo;
+		srcoid->algo ? &hash_algos[srcoid->algo] : repo->hash_algo;
+	struct object_id temp;
+	const struct object_id *src = srcoid;
 
-	if (from == to) {
+	if (!srcoid->algo) {
+		oidcpy(&temp, srcoid);
+		temp.algo = hash_algo_by_ptr(repo->hash_algo);
+		src = &temp;
+	}
+
+	if (from == to || !to) {
 		if (src != dest)
 			oidcpy(dest, src);
 		return 0;
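The control flow this hunk introduces (fill in an unset algorithm from the repository default, then copy when there is nothing to convert) can be modelled in a few lines of Rust. This is a sketch only: `ObjectId`, `Outcome` and `oid_to_algo` are hypothetical, algorithms are plain numbers with 0 meaning "not set", and the `NeedsMapping` branch stands in for the remainder of `repo_oid_to_algop`, which the hunk does not show.

```rust
/// Simplified object ID: raw hash bytes plus an algorithm number,
/// where 0 means "not set".
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ObjectId {
    pub hash: [u8; 32],
    pub algo: u32,
}

/// What the early part of the conversion decides.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// Source and destination agree, or there is no destination:
    /// the (normalized) ID is copied through unchanged.
    Copied(ObjectId),
    /// A real conversion is required; placeholder for the mapping
    /// lookup that follows in the C function.
    NeedsMapping(ObjectId),
}

pub fn oid_to_algo(src: &ObjectId, repo_algo: u32, to: Option<u32>) -> Outcome {
    // An unset source algorithm means the repository's default.
    let from = if src.algo != 0 { src.algo } else { repo_algo };

    // Normalize a copy so the result always carries a valid algorithm.
    let mut oid = *src;
    if oid.algo == 0 {
        oid.algo = repo_algo;
    }

    match to {
        // No destination algorithm: copy instead of crashing.
        None => Outcome::Copied(oid),
        Some(t) if t == from => Outcome::Copied(oid),
        Some(_) => Outcome::NeedsMapping(oid),
    }
}
```

Normalizing into a temporary before the copy is what guarantees that the destination never ends up with an algorithm of 0, even when the caller passed in an ID whose algorithm was left unset.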

oidtree.c

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ struct oidtree_iter_data {
 	oidtree_iter fn;
 	void *arg;
 	size_t *last_nibble_at;
-	int algo;
+	uint32_t algo;
 	uint8_t last_byte;
 };
 

repository.c

Lines changed: 9 additions & 3 deletions
@@ -3,6 +3,7 @@
 #include "repository.h"
 #include "odb.h"
 #include "config.h"
+#include "gettext.h"
 #include "object.h"
 #include "lockfile.h"
 #include "path.h"
@@ -38,7 +39,7 @@ struct repository *the_repository = &the_repo;
 static void set_default_hash_algo(struct repository *repo)
 {
 	const char *hash_name;
-	int algo;
+	uint32_t algo;
 
 	hash_name = getenv("GIT_TEST_DEFAULT_HASH_ALGO");
 	if (!hash_name)
@@ -178,18 +179,23 @@ void repo_set_gitdir(struct repository *repo,
 			repo->gitdir, "index");
 }
 
-void repo_set_hash_algo(struct repository *repo, int hash_algo)
+void repo_set_hash_algo(struct repository *repo, uint32_t hash_algo)
 {
 	repo->hash_algo = &hash_algos[hash_algo];
 }
 
-void repo_set_compat_hash_algo(struct repository *repo, int algo)
+void repo_set_compat_hash_algo(struct repository *repo MAYBE_UNUSED, uint32_t algo)
 {
+#ifdef WITH_RUST
 	if (hash_algo_by_ptr(repo->hash_algo) == algo)
 		BUG("hash_algo and compat_hash_algo match");
 	repo->compat_hash_algo = algo ? &hash_algos[algo] : NULL;
 	if (repo->compat_hash_algo)
 		repo_read_loose_object_map(repo);
+#else
+	if (algo)
+		die(_("compatibility hash algorithm support requires Rust"));
+#endif
 }
 
 void repo_set_ref_storage_format(struct repository *repo,