Logpack is a specialized compressor for log files. On its own it provides only modest compression; it is best used in a two-stage scheme: run logpack first, then compress its output with a dictionary-based compressor (gzip, zstd). Used this way, it reaches compression ratios ~42% better than zstd alone (and does so faster and with < 1 MB of RAM).
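For example, a minimal two-stage pipeline might look like this (a sketch: `app.log` is a stand-in name, and `zstd` is installed separately; `--rm` makes zstd delete its input once the output is written):

```sh
logpack app.log                # stage 1: produces app.log.lp
zstd --rm app.log.lp           # stage 2: produces app.log.lp.zst

# to restore, undo the stages in reverse order
# (assumes app.log itself was removed after packing)
zstd -d --rm app.log.lp.zst    # restores app.log.lp
logpack -d app.log.lp          # restores app.log
```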
It is based on ideas presented in the paper "Fast and efficient log file compression" by Przemysław Skibiński and Jakub Swacha.
To compress file.log into file.log.lp, run the following in the terminal:
logpack file.log
A compression level between 1 and 9 can be selected; higher numbers result in better compression at slower speeds. For example:
logpack -8 file.log
To unpack the logpack archive file.log.lp, run:
logpack -d file.log.lp
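To pick a level for your own data, you can compare a few levels side by side. A quick sketch, assuming bash and a stand-in file `server.log`:

```sh
for lvl in 1 5 9; do
  rm -f server.log.lp            # clear the previous iteration's output
  time logpack -$lvl server.log  # pack at level $lvl
  ls -l server.log.lp            # compare output sizes across levels
done
```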
Logpack yields a decent compression ratio (a 2-10x size reduction) on anything that looks like a log file, e.g.:
[Thu Jun 09 06:07:04 2005] [notice] LDAP: Built with OpenLDAP LDAP SDK
[Thu Jun 09 06:07:04 2005] [notice] LDAP: SSL support unavailable
[Thu Jun 09 06:07:04 2005] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Thu Jun 09 06:07:05 2005] [notice] Digest: generating secret for digest authentication ...
[Thu Jun 09 06:07:05 2005] [notice] Digest: done
[Thu Jun 09 06:07:05 2005] [notice] LDAP: Built with OpenLDAP LDAP SDK
[Thu Jun 09 06:07:05 2005] [notice] LDAP: SSL support unavailable
[Thu Jun 09 06:07:05 2005] [error] env.createBean2(): Factory error creating channel.jni:jni ( channel.jni, jni)
[Thu Jun 09 06:07:05 2005] [error] config.update(): Can't create channel.jni:jni
[Thu Jun 09 06:07:05 2005] [error] env.createBean2(): Factory error creating vm: ( vm, )
[Thu Jun 09 06:07:05 2005] [error] config.update(): Can't create vm:
[Thu Jun 09 06:07:05 2005] [error] env.createBean2(): Factory error creating worker.jni:onStartup ( worker.jni, onStartup)
Basically, any repetitive text that contains space (' ') and newline (\n) bytes is a good fit. This makes it applicable to:
- Any contemporary text encoding (ASCII and derived code pages, UTF-8, UTF-16 regardless of endianness)
- Windows/Linux/MacOS line endings
- Lines of reasonable length (shorter than 65 kB)
Furthermore, logpack is designed to swallow arbitrary input without crashing, so an accidental oversized line or a binary file should not be a problem. Such data will grow in size, though[^1].
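You can observe this yourself by feeding it incompressible noise (a sketch: `junk.bin` is a made-up name, and the output path assumes the `file` → `file.lp` naming shown above):

```sh
head -c 1000000 /dev/urandom > junk.bin   # 1 MB of incompressible noise
logpack junk.bin                          # should not crash...
ls -l junk.bin junk.bin.lp                # ...but expect junk.bin.lp to be bigger
```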
Here are the results of compressing samples from the LogHub corpus[^2]. See the Benchmarks section below for how to run them.
| dataset | pack [MB/s] | unpack [MB/s] | compr. ratio [input/output] |
|---|---|---|---|
| android_v1 | 66.67 | 347.32 | 2.092 |
| android_v2 | 78.63 | 337.74 | 2.410 |
| apache | 245.17 | 249.22 | 9.021 |
| blue_gene_l | 319.91 | 345.67 | 5.259 |
| hadoop | 83.37 | 551.04 | 1.495 |
| hdfs_v1 | 114.34 | 348.95 | 3.814 |
| hdfs_v2 | 435.96 | 385.19 | 5.363 |
| health_app | 62.61 | 483.31 | 1.571 |
| hpc | 306.87 | 227.57 | 7.522 |
| linux | 168.08 | 247.68 | 5.740 |
| mac | 101.60 | 351.31 | 2.787 |
| open_stack | 171.36 | 433.91 | 3.535 |
| proxifier | 113.72 | 280.87 | 5.299 |
| spark | 145.60 | 247.81 | 7.681 |
| ssh | 202.06 | 261.07 | 8.640 |
| thunderbird | 124.81 | 271.41 | 4.826 |
| windows | 388.81 | 427.76 | 6.052 |
| zookeeper | 176.31 | 276.68 | 10.71 |
| dataset | pack [MB/s] | unpack [MB/s] | compr. ratio [input/output] |
|---|---|---|---|
| android_v1 | 25.33 | 335.98 | 2.408 |
| android_v2 | 32.49 | 330.09 | 2.704 |
| apache | 152.72 | 234.53 | 10.35 |
| blue_gene_l | 181.88 | 356.49 | 5.309 |
| hadoop | 39.90 | 343.00 | 8.183 |
| hdfs_v1 | 56.38 | 328.38 | 4.295 |
| hdfs_v2 | 395.68 | 378.28 | 5.637 |
| health_app | 18.93 | 493.63 | 1.610 |
| hpc | 236.71 | 224.24 | 7.626 |
| linux | 77.64 | 235.67 | 6.275 |
| mac | 40.67 | 336.59 | 3.343 |
| open_stack | 69.43 | 423.72 | 3.844 |
| proxifier | 49.94 | 262.33 | 6.373 |
| spark | 142.24 | 241.00 | 8.399 |
| ssh | 179.40 | 257.39 | 9.039 |
| thunderbird | 52.03 | 256.46 | 5.479 |
| windows | 226.41 | 421.41 | 6.164 |
| zookeeper | 161.30 | 261.91 | 11.99 |
| dataset | pack [MB/s] | unpack [MB/s] | compr. ratio [input/output] |
|---|---|---|---|
| android_v1 | 16.55 | 326.64 | 2.436 |
| android_v2 | 18.18 | 320.53 | 2.836 |
| apache | 31.87 | 222.97 | 11.43 |
| blue_gene_l | 20.55 | 326.14 | 5.795 |
| hadoop | 33.39 | 319.96 | 8.932 |
| hdfs_v1 | 17.54 | 307.47 | 4.969 |
| hdfs_v2 | 43.48 | 324.86 | 7.825 |
| health_app | 17.41 | 480.17 | 1.610 |
| hpc | 16.66 | 213.80 | 8.346 |
| linux | 17.33 | 235.57 | 6.471 |
| mac | 22.83 | 328.80 | 3.523 |
| open_stack | 39.55 | 424.49 | 3.865 |
| proxifier | 23.32 | 250.17 | 6.751 |
| spark | 19.01 | 217.27 | 9.907 |
| ssh | 20.13 | 249.95 | 9.627 |
| thunderbird | 23.62 | 251.71 | 5.608 |
| windows | 70.37 | 420.84 | 6.254 |
| zookeeper | 19.79 | 254.07 | 12.69 |
Compression ratio means input_size / output_size (e.g., a ratio of 4 means the output is a quarter the size of the input). All speeds are in MB/s (1 MB = 10^6 bytes). In the tables below, compr. improvement is the lp+zstd compression ratio divided by the zstd-only ratio.
| dataset | zstd pack speed [MB/s] | zstd compr. ratio | lp(4)+zstd pack speed [MB/s] | lp(4)+zstd compr. ratio | compr. improvement |
|---|---|---|---|---|---|
| android_v1 | 265.62 | 13.48 | 62.90 | 15.98 | 1.185 |
| android_v2 | 270.41 | 13.87 | 73.84 | 17.69 | 1.275 |
| apache | 371.79 | 21.95 | 209.14 | 31.76 | 1.447 |
| blue_gene_l | 232.00 | 9.170 | 212.71 | 17.50 | 1.908 |
| hadoop | 1739.27 | 178.8 | 83.79 | 165.0 | 0.9228 |
| hdfs_v1 | 256.56 | 12.57 | 100.63 | 15.43 | 1.227 |
| hdfs_v2 | 309.94 | 19.13 | 282.10 | 24.39 | 1.275 |
| health_app | 245.36 | 11.02 | 54.24 | 12.63 | 1.146 |
| hpc | 278.24 | 12.29 | 223.81 | 22.94 | 1.867 |
| linux | 281.24 | 13.13 | 137.19 | 19.54 | 1.488 |
| mac | 320.29 | 20.11 | 91.78 | 26.28 | 1.307 |
| open_stack | 256.87 | 12.27 | 135.63 | 15.35 | 1.251 |
| proxifier | 296.26 | 15.97 | 100.79 | 18.26 | 1.143 |
| spark | 309.13 | 16.65 | 136.14 | 23.86 | 1.433 |
| ssh | 299.59 | 17.51 | 176.40 | 33.98 | 1.941 |
| thunderbird | 314.19 | 17.64 | 118.56 | 31.39 | 1.780 |
| windows | 783.92 | 69.45 | 343.43 | 102.8 | 1.480 |
| zookeeper | 364.88 | 25.94 | 171.24 | 45.50 | 1.754 |
| CORPUS_TOTAL | ---- | 16.14 | ---- | 22.91 | 1.419 |
| dataset | zstd pack speed [MB/s] | zstd compr. ratio | lp(9)+zstd pack speed [MB/s] | lp(9)+zstd compr. ratio | compr. improvement |
|---|---|---|---|---|---|
| android_v1 | 265.62 | 13.48 | 17.85 | 15.71 | 1.166 |
| android_v2 | 270.41 | 13.87 | 19.58 | 17.47 | 1.259 |
| apache | 371.79 | 21.95 | 33.92 | 30.93 | 1.409 |
| blue_gene_l | 232.00 | 9.170 | 21.68 | 17.63 | 1.923 |
| hadoop | 1739.27 | 178.8 | 34.56 | 211.3 | 1.182 |
| hdfs_v1 | 256.56 | 12.57 | 20.00 | 16.05 | 1.277 |
| hdfs_v2 | 309.94 | 19.13 | 44.90 | 24.88 | 1.300 |
| health_app | 245.36 | 11.02 | 17.18 | 12.40 | 1.125 |
| hpc | 278.24 | 12.29 | 17.19 | 22.47 | 1.828 |
| linux | 281.24 | 13.13 | 18.29 | 19.28 | 1.469 |
| mac | 320.29 | 20.11 | 23.97 | 25.63 | 1.275 |
| open_stack | 256.87 | 12.27 | 40.80 | 15.43 | 1.257 |
| proxifier | 296.26 | 15.97 | 24.72 | 18.65 | 1.168 |
| spark | 309.13 | 16.65 | 21.09 | 25.19 | 1.512 |
| ssh | 299.59 | 17.51 | 21.57 | 34.23 | 1.955 |
| thunderbird | 314.19 | 17.64 | 25.39 | 30.22 | 1.713 |
| windows | 783.92 | 69.45 | 74.65 | 100.9 | 1.453 |
| zookeeper | 364.88 | 25.94 | 22.30 | 46.81 | 1.805 |
| CORPUS_TOTAL | ---- | 16.14 | ---- | 22.95 | 1.422 |
All the commands below are to be run from the repo root.
go build .
go test ./pack -v
Benchmarks
go test ./pack -v -run=ThisRegexMatchesNoTest -bench=Packing$
Pit it against zstd:
go test ./pack -v -run=ThisRegexMatchesNoTest -bench=Zstd$
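If you'd rather eyeball the two-stage ratio on a single file than run the Go benchmarks, here is a rough sketch (GNU `stat` as on Linux; `access.log` is a stand-in name):

```sh
orig=$(stat -c%s access.log)           # input size in bytes
logpack -4 access.log                  # stage 1: logpack at level 4
zstd -f access.log.lp                  # stage 2: zstd (-f overwrites old output)
packed=$(stat -c%s access.log.lp.zst)  # final size in bytes
echo "scale=3; $orig / $packed" | bc   # compression ratio, input/output
```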
If logpack does not cut it, you may be interested in LogpackPro. Here are a few of its highlights:
- Even better compression
- Streaming compression mode (compress logs in real time as they appear)
- Up to several times faster thanks to SIMD instructions and other optimizations
- Available as a standalone program and C static library
Interested? Send me a business inquiry on LinkedIn (link in my GitHub profile).
[^1]: Worst-case compression doubles the size for typical inputs. The extreme case of compressing a single non-ASCII byte produces 6 bytes; the best case is a ~56x size decrease.

[^2]: The results are based on <=10 MB samples from the [LogHub corpus](https://github.com/logpai/loghub/tree/master).