Skip to content

zpaq algorithm does not appear to match impl in zpaq upstream #6

@codyps

Description

@codyps

Hi, I was looking at the chunking provided by Zpaq in this crate (as part of comparing it to a zpaq implimentation I've written), and noticed that it doesn't appear to match up with the upstream zpaq code.

In particular, it appears that the hash update equation doesn't match up (though I might be missing some transform you've done).

cdchunking code:

const HM: Wrapping<u32> = Wrapping(123_456_791);
...
pub fn update(&mut self, byte: u8) -> bool {
        if byte == self.o1[self.c1 as usize] {
            self.h = self.h * HM + Wrapping(byte as u32 + 1);
        } else {
            self.h = self.h * HM * Wrapping(2) + Wrapping(byte as u32 + 1);
        }
        self.o1[self.c1 as usize] = byte;
        self.c1 = byte;

        self.h.0 < (1 << self.nbits)
}

zpaq code

            if (c==o1[c1]) h=(h+c+1)*314159265u, ++hits;
            else h=(h+c+1)*271828182u;
            o1[c1]=c;
            c1=c;

In particular, zpaq adds h, c, and 1 prior to multiplying (and uses different constants), but cdchunking adds c and 1 after h is mulitiplied.

I've done some slight modification to zpaq itself so that it prints chunk lengths, and I'm not seeing a correspondence between the lengths produced by zpaq and cdchunking.

diff --git a/zpaq.cpp b/zpaq.cpp
index f468a7b..84e1f5f 100644
--- a/zpaq.cpp
+++ b/zpaq.cpp
@@ -83,6 +83,7 @@ Possible options:
 #endif
 #ifdef unix
 #define PTHREAD 1
+#include <inttypes.h>
 #include <sys/param.h>
 #include <sys/types.h>
 #include <sys/stat.h>
@@ -2416,6 +2417,8 @@ int Jidac::add() {
         assert(sz<=MAX_FRAGMENT);
         total_done+=sz;
 
+       printf("## %" PRIu64 "\n", sz);
+
         // Look for matching fragment
         assert(uint64_t(sz)==sha1.usize());
         memcpy(sha1result, sha1.result(), 20);
  • Is this difference in algorithm an oversight?
  • Or is it purposefully different?
  • If it is purposefully different, is it following another already published source?
  • If it is purposefully different, could the divergence be documented?

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions