-
Notifications
You must be signed in to change notification settings - Fork 5
Closed
Description
Hi, I was looking at the chunking provided by Zpaq in this crate (as part of comparing it to a zpaq implimentation I've written), and noticed that it doesn't appear to match up with the upstream zpaq code.
In particular, it appears that the hash update equation doesn't match up (though I might be missing some transform you've done).
const HM: Wrapping<u32> = Wrapping(123_456_791);
...
pub fn update(&mut self, byte: u8) -> bool {
if byte == self.o1[self.c1 as usize] {
self.h = self.h * HM + Wrapping(byte as u32 + 1);
} else {
self.h = self.h * HM * Wrapping(2) + Wrapping(byte as u32 + 1);
}
self.o1[self.c1 as usize] = byte;
self.c1 = byte;
self.h.0 < (1 << self.nbits)
}
if (c==o1[c1]) h=(h+c+1)*314159265u, ++hits;
else h=(h+c+1)*271828182u;
o1[c1]=c;
c1=c;
In particular, zpaq adds h
, c
, and 1
prior to multiplying (and uses different constants), but cdchunking adds c
and 1
after h
is mulitiplied.
I've done some slight modification to zpaq itself so that it prints chunk lengths, and I'm not seeing a correspondence between the lengths produced by zpaq and cdchunking.
diff --git a/zpaq.cpp b/zpaq.cpp
index f468a7b..84e1f5f 100644
--- a/zpaq.cpp
+++ b/zpaq.cpp
@@ -83,6 +83,7 @@ Possible options:
#endif
#ifdef unix
#define PTHREAD 1
+#include <inttypes.h>
#include <sys/param.h>
#include <sys/types.h>
#include <sys/stat.h>
@@ -2416,6 +2417,8 @@ int Jidac::add() {
assert(sz<=MAX_FRAGMENT);
total_done+=sz;
+ printf("## %" PRIu64 "\n", sz);
+
// Look for matching fragment
assert(uint64_t(sz)==sha1.usize());
memcpy(sha1result, sha1.result(), 20);
- Is this difference in algorithm an oversight?
- Or is it purposefully different?
- If it is purposefully different, is it following another already published source?
- If it is purposefully different, could the divergence be documented?