Skip to content

Commit ddaf1f6

Browse files
derrickstoleegitster
authored andcommitted
csum-file: make hashwrite() more readable
The hashwrite() method takes an input buffer and updates a hashfile's hash function while writing the data to a file. To avoid overuse of flushes, the hashfile has an internal buffer and most writes will use memcpy() to transfer data from the input 'buf' to the hashfile's buffer of size 8 * 1024 bytes. Logic introduced by a8032d1 (sha1write: don't copy full sized buffers, 2008-09-02) reduces the number of memcpy() calls when the input buffer is sufficiently longer than the hashfile's buffer, causing nr to be the length of the full buffer. In these cases, the input buffer is used directly in chunks equal to the hashfile's buffer size. This method caught my attention while investigating some performance issues, but it turns out that these performance issues were noise within the variance of the experiment. However, during this investigation, I inspected hashwrite() and misunderstood it, even after looking closely and trying to make it faster. This change simply reorganizes some parts of the loop within hashwrite() to make it clear that each batch either uses memcpy() to the hashfile's buffer or writes directly from the input buffer. The previous code relied on indirection through local variables and essentially inlined the implementation of hashflush() to reduce lines of code. Helped-by: Jeff King <[email protected]> Helped-by: Junio C Hamano <[email protected]> Signed-off-by: Derrick Stolee <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent a5828ae commit ddaf1f6

File tree

1 file changed

+18
-15
lines changed

1 file changed

+18
-15
lines changed

csum-file.c

Lines changed: 18 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -89,32 +89,35 @@ int finalize_hashfile(struct hashfile *f, unsigned char *result, unsigned int fl
8989
void hashwrite(struct hashfile *f, const void *buf, unsigned int count)
9090
{
9191
while (count) {
92-
unsigned offset = f->offset;
93-
unsigned left = sizeof(f->buffer) - offset;
92+
unsigned left = sizeof(f->buffer) - f->offset;
9493
unsigned nr = count > left ? left : count;
95-
const void *data;
9694

9795
if (f->do_crc)
9896
f->crc32 = crc32(f->crc32, buf, nr);
9997

10098
if (nr == sizeof(f->buffer)) {
101-
/* process full buffer directly without copy */
102-
data = buf;
99+
/*
100+
* Flush a full batch worth of data directly
101+
* from the input, skipping the memcpy() to
102+
* the hashfile's buffer. In this block,
103+
* f->offset is necessarily zero.
104+
*/
105+
the_hash_algo->update_fn(&f->ctx, buf, nr);
106+
flush(f, buf, nr);
103107
} else {
104-
memcpy(f->buffer + offset, buf, nr);
105-
data = f->buffer;
108+
/*
109+
* Copy to the hashfile's buffer, flushing only
110+
* if it became full.
111+
*/
112+
memcpy(f->buffer + f->offset, buf, nr);
113+
f->offset += nr;
114+
left -= nr;
115+
if (!left)
116+
hashflush(f);
106117
}
107118

108119
count -= nr;
109-
offset += nr;
110120
buf = (char *) buf + nr;
111-
left -= nr;
112-
if (!left) {
113-
the_hash_algo->update_fn(&f->ctx, data, offset);
114-
flush(f, data, offset);
115-
offset = 0;
116-
}
117-
f->offset = offset;
118121
}
119122
}
120123

0 commit comments

Comments
 (0)