Compression techniques #2416
infogulch
started this conversation in
Feature Requests
Replies: 1 comment 4 replies
-
ZStandard Random Access
This project is archived now (but it's C++ anyway so). But the idea is sound: store frame and source text offset metadata, and you can seek into the middle of a large body of compressed text. |
Beta Was this translation helpful? Give feedback.
4 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
I found some things that could improve compression in certain contexts:
Burrows-Wheeler Transform (BWT)
The algorithm is a bit odd, but after the transform, characters that appear frequently together in the original string tend to cluster together in the transformed string. This locality can be exploited by run-length encoding or other compression techniques. It can also be used as an index for fast searching 1.
A naive implementation of the algorithm: 1. Generate all possible rotations of the input string. 2. Sort these rotations lexicographically (alphabetically). 3. Extract the last character of each sorted rotation to form the transformed string.
Somewhat surprisingly this algorithm is actually reversible (!?). It's also related to the FM-index. 2 3
Footnotes
https://web.archive.org/web/20160304013343/http://blog.avadis-ngs.com/2012/04/elegant-exact-string-match-using-bwt-2/ ↩
https://cbloomrants.blogspot.com/2021/03/faster-inverse-bwt.html ↩
https://www.alexbowe.com/fm-index/ ↩
Beta Was this translation helpful? Give feedback.
All reactions