---
title: Rust BufWriter and LZ4 Compression
categories:
- rust
discussions:
---
Recently, I've been working on a Rust project again.
It deals with bioinformatics data, which can be quite large,
so I got to play with profiling and optimizing the code.
I've done some of this in the past, but this time it was _actually_ useful.
In this post, I want to talk about a small optimization
in working with LZ4 compression
that made a big difference in runtime performance.

This tool mainly reads in a BAM file
(which contains aligned genome sequence data),
does some processing on it,
and outputs the results in various formats,
chosen by the user.
One of the formats is the internal data structure used by the tool,
which is convenient for debugging and testing.
Since this is Rust, all I had to do was add some `#[derive(Serialize, Deserialize)]` annotations,
choose a good format (I picked [MessagePack](https://msgpack.org/)),
and thanks to [serde](https://serde.rs/),
we have a data format.
Concretely, I made an enum with all the possible structures I want to output
(which includes header fields),
and I serialize and write each structure separately,
so that they are concatenated in the output file.
To read it back in,
I wrote a little helper function[^rmp-stream] that
keeps deserializing these enum values until it reaches the end of the file.
So far, so good.

## Compression with LZ4

However, the output file was quite large
-- it's pretty much everything I have in RAM.
I wanted to compress it,
but I also knew that compression is expensive,
and for my debug output I don't really need to squeeze every byte out of it.
I chose [LZ4](https://lz4.github.io/lz4/), via the `lz4` crate.
Its [`Encoder`](https://docs.rs/lz4/1.28.1/lz4/struct.Encoder.html)
implements `Write`,
so we can just wrap our writer in it and continue to use it as before:

```rust
let file = std::fs::File::create("output.msgpack.lz4")?;
let encoder = lz4::EncoderBuilder::new().level(4).build(file)?;
```

Pretty early in my Rust journey,
I learned that file I/O is not buffered by default,
so it's a good idea to wrap the `file` in a `BufWriter`:

```rust
let file = std::fs::File::create("output.msgpack.lz4")?;
let file_buffered = std::io::BufWriter::new(file);
let encoder = lz4::EncoderBuilder::new().level(4).build(file_buffered)?;
```

This then creates a chain like this:

```
MessagePack Serializer -> LZ4 Encoder -> BufWriter -> File
```

## Profiling

When profiling the code (with [samply](https://github.com/mstange/samply/)),
I noticed that the overhead from LZ4 was quite high.
Even after lowering the compression level to 0,
I wasn't happy.
This was slower than the BGZIP compression I use for BCF files!
And that is based on Deflate, which, while heavily optimized,
is not an algorithm that should play in the same league as LZ4.
What is going on here?

I saw that there were **many** stacks with calls to `LZ4F_compressUpdateImpl`.
Looking at [the implementation](https://github.com/lz4/lz4/blob/v1.10.0/lib/lz4frame.c#L977)
with the samples per line,
I see a lot of calls to `LZ4F_selectCompression`, `LZ4F_compressBound_internal`,
`memcpy` (if the temporary block buffer has space and LZ4 wants to buffer),
`LZ4F_makeBlock`, which writes the block header and checksum,
and finally `XXH32_update`, which computes the checksum for the block.
Why is this called so often, and why are so many blocks being made?

LZ4 is a block-based compression algorithm,
which means that it compresses data in chunks.
The chunks we are giving it are the serialized MessagePack data,
which are around 250 bytes each.
This means that for every 250-byte chunk,
we're calling into LZ4 and asking it to compress it.
And for every 250-byte chunk,
it does the entire round of checks, compression, and checksumming.

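To make the effect of write granularity concrete, here is a small, self-contained experiment (not from the tool itself): a hypothetical `CountingWriter` stands in for the encoder and counts how many `write` calls actually reach it, with and without a `BufWriter` in front.

```rust
use std::io::{BufWriter, Write};

// Hypothetical stand-in for the LZ4 encoder: counts how many
// `write` calls reach the underlying sink.
#[derive(Debug, Default)]
struct CountingWriter {
    calls: usize,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        self.calls += 1;
        Ok(buf.len()) // pretend the whole buffer was consumed
    }
    fn flush(&mut self) -> std::io::Result<()> {
        Ok(())
    }
}

// Write 10,000 records of ~250 bytes each, with or without a
// BufWriter in front, and report how many writes the sink saw.
fn count_writes(buffered: bool) -> usize {
    let record = [0u8; 250];
    if buffered {
        let mut w = BufWriter::new(CountingWriter::default());
        for _ in 0..10_000 {
            w.write_all(&record).unwrap();
        }
        w.into_inner().unwrap().calls
    } else {
        let mut w = CountingWriter::default();
        for _ in 0..10_000 {
            w.write_all(&record).unwrap();
        }
        w.calls
    }
}
```

Unbuffered, the sink sees one call per record; with `BufWriter`'s default 8 KiB buffer, the same data arrives in a few hundred much larger writes. That collapsing is exactly what spares the LZ4 frame machinery from running once per record.
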
## Swap the buffer

Knowing that LZ4 works with blocks internally,
I had the idea that I could swap the way I use the buffer:
instead of buffering writes to the file system,
I could buffer writes to the LZ4 encoder.

```rust
let file = std::fs::File::create("output.msgpack.lz4")?;
let encoder = lz4::EncoderBuilder::new().level(4).build(file)?;
let encoder_buffered = std::io::BufWriter::new(encoder);
```
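
This changes the chain to:

```
MessagePack Serializer -> BufWriter -> LZ4 Encoder -> File
```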

And indeed, this works!
In my initial benchmark, this made this part of the code 1.83 times faster.
An amazing result for basically just swapping two lines of code.

[^rmp-stream]: `serde_json` includes a [`StreamDeserializer`](https://docs.rs/serde_json/1.0.140/serde_json/struct.StreamDeserializer.html) but `rmp_serde` does not, so I wrote one myself. It's not as feature-complete (I think), but you can find it [here](https://github.com/3Hren/msgpack-rust/issues/317#issuecomment-3012814957).
