
@parkertimmins
Contributor

No description provided.

@parkertimmins
Contributor Author

parkertimmins commented Jul 8, 2025

Ran some benchmarks using the datasets from the paper: https://github.com/cwida/fsst/tree/master/paper/dbtext
The notable difference is that fsst compression times are slower. This is because I have (so far) been unable to write a decent Java SIMD implementation of fsst.

One caveat: I didn't include the symbol table in the fsst compressed size. The table is around 500 bytes, but since the smallest dataset is 133 KB and many are over a megabyte, this doesn't bias the results too much.

| dataset | comp factor (fsst) | comp factor (lz4_fast) | comp time (ms, fsst) | comp time (ms, lz4_fast) | decomp time (ms, fsst) | decomp time (ms, lz4_fast) |
|---|---|---|---|---|---|---|
| c_name | 4.07 | 3.19 | 4.50 | 3.95 | 0.44 | 1.96 |
| chinese | 1.68 | 1.39 | 5.35 | 3.18 | 0.36 | 1.66 |
| city | 1.95 | 1.36 | 2.45 | 0.58 | 0.05 | 0.30 |
| credentials | 2.09 | 1.45 | 2.37 | 0.77 | 0.07 | 0.34 |
| email | 2.05 | 1.53 | 8.69 | 6.73 | 0.80 | 4.06 |
| faust | 1.82 | 1.43 | 3.35 | 1.67 | 0.17 | 0.76 |
| firstname | 1.86 | 1.19 | 3.46 | 1.82 | 0.20 | 0.98 |
| genome | 2.89 | 1.42 | 3.50 | 3.51 | 0.35 | 1.28 |
| hamlet | 2.22 | 2.15 | 2.68 | 1.00 | 0.11 | 0.48 |
| hex | 1.85 | 1.04 | 3.45 | 2.26 | 0.46 | 1.20 |
| japanese | 2.00 | 1.70 | 2.62 | 0.95 | 0.10 | 0.42 |
| l_comment | 2.75 | 2.16 | 10.81 | 8.69 | 0.77 | 4.27 |
| lastname | 1.79 | 1.24 | 10.06 | 9.01 | 1.03 | 5.17 |
| location | 2.69 | 1.57 | 7.06 | 11.66 | 0.78 | 6.52 |
| movies | 1.59 | 1.17 | 9.03 | 7.39 | 1.06 | 4.07 |
| ps_comment | 3.27 | 2.58 | 8.45 | 5.56 | 0.68 | 3.26 |
| street | 2.18 | 1.66 | 2.19 | 0.61 | 0.05 | 0.28 |
| urls | 2.35 | 2.74 | 18.51 | 14.56 | 2.10 | 8.29 |
| urls2 | 1.97 | 1.70 | 7.62 | 4.44 | 0.69 | 2.50 |
| uuid | 2.33 | 1.50 | 11.61 | 9.54 | 1.35 | 5.38 |
| wiki | 1.56 | 1.25 | 10.13 | 7.72 | 1.15 | 4.28 |
| wikipedia | 1.81 | 1.43 | 14.03 | 13.44 | 1.28 | 6.79 |
| yago | 1.54 | 1.20 | 8.21 | 6.44 | 0.97 | 3.56 |
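
For reference, each row above is a per-dataset measurement of compression factor plus wall-clock compression/decompression time. A minimal sketch of that measurement is below; the `Codec` interface and its fsst / lz4_fast bindings are placeholders, not the PR's actual benchmark harness:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class CompressionBench {

    /** Placeholder for the compressor under test (fsst or lz4_fast). */
    interface Codec {
        byte[] compress(byte[] input);
        byte[] decompress(byte[] compressed, int originalLength);
    }

    static void run(String name, Codec codec, Path dataset) throws IOException {
        byte[] input = Files.readAllBytes(dataset);

        long t0 = System.nanoTime();
        byte[] compressed = codec.compress(input);
        long compressNanos = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        byte[] restored = codec.decompress(compressed, input.length);
        long decompressNanos = System.nanoTime() - t1;

        // Compression factor = original size / compressed size.
        // As noted above, the ~500-byte fsst symbol table is not counted here.
        double factor = (double) input.length / compressed.length;

        System.out.printf("%-12s factor=%.2f comp=%.2f ms decomp=%.2f ms%n",
                name, factor, compressNanos / 1e6, decompressNanos / 1e6);

        if (!Arrays.equals(input, restored)) {
            throw new AssertionError("round-trip mismatch for " + name);
        }
    }
}
```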

@parkertimmins
Contributor Author

parkertimmins commented Jul 8, 2025

The original benchmarks from the paper concatenated each file with itself until the input was 8 MB. I reran the benchmarks with that setup, and also converted the compression and decompression times to throughput so the results can be compared with the paper more easily.
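
The 8 MB setup and the throughput conversion amount to roughly the following sketch; the class and helper names are illustrative, not the actual benchmark code:

```java
import java.io.ByteArrayOutputStream;

final class EightMbSetup {
    // The paper's benchmarks run on ~8 MB inputs; mirror that target here.
    static final int TARGET_BYTES = 8 * 1024 * 1024;

    /** Concatenate the dataset with itself until it is at least 8 MB long. */
    static byte[] repeatToTarget(byte[] original) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(TARGET_BYTES);
        while (out.size() < TARGET_BYTES) {
            out.write(original, 0, original.length);
        }
        return out.toByteArray();
    }

    /** Convert an elapsed time over inputBytes into throughput in MB/s. */
    static double throughputMbPerSec(long inputBytes, long elapsedNanos) {
        return (inputBytes / 1e6) / (elapsedNanos / 1e9);
    }
}
```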

Though lz4 compression is still faster on many datasets, these results look more even.

| dataset | comp factor (fsst) | comp factor (lz4_fast) | comp (MB/s, fsst) | comp (MB/s, lz4_fast) | decomp (MB/s, fsst) | decomp (MB/s, lz4_fast) |
|---|---|---|---|---|---|---|
| c_name | 4.05 | 3.19 | 494 | 447 | 4457 | 843 |
| chinese | 1.68 | 1.39 | 210 | 224 | 2111 | 418 |
| city | 1.96 | 1.37 | 268 | 205 | 2487 | 406 |
| credentials | 2.13 | 1.47 | 220 | 174 | 2706 | 419 |
| email | 2.02 | 1.53 | 218 | 317 | 2588 | 511 |
| faust | 1.81 | 1.44 | 232 | 179 | 2261 | 385 |
| firstname | 1.85 | 1.19 | 266 | 243 | 2265 | 403 |
| genome | 2.87 | 1.42 | 372 | 271 | 3190 | 728 |
| hamlet | 2.22 | 2.16 | 284 | 253 | 2787 | 522 |
| hex | 1.85 | 1.04 | 551 | 353 | 2043 | 676 |
| japanese | 2.00 | 1.71 | 208 | 207 | 2481 | 443 |
| l_comment | 2.73 | 2.16 | 267 | 302 | 3413 | 581 |
| lastname | 1.81 | 1.24 | 277 | 260 | 2207 | 442 |
| location | 2.67 | 1.57 | 446 | 225 | 3404 | 385 |
| movies | 1.60 | 1.17 | 264 | 263 | 1886 | 478 |
| ps_comment | 3.36 | 2.58 | 337 | 405 | 3723 | 648 |
| street | 2.20 | 1.68 | 250 | 204 | 2762 | 453 |
| urls | 2.35 | 2.74 | 331 | 417 | 2865 | 715 |
| urls2 | 1.99 | 1.70 | 247 | 347 | 2451 | 618 |
| uuid | 2.34 | 1.50 | 329 | 369 | 2609 | 611 |
| wiki | 1.58 | 1.25 | 262 | 287 | 1922 | 508 |
| wikipedia | 1.81 | 1.43 | 217 | 202 | 2192 | 394 |
| yago | 1.56 | 1.20 | 267 | 281 | 1886 | 495 |
