
Commit b5265c8

daverodgman authored and torvalds committed
lib/lzo: fix ambiguous encoding bug in lzo-rle
In some rare cases, for input data over 32 KB, lzo-rle could encode two different inputs to the same compressed representation, so that decompression is then ambiguous (i.e. data may be corrupted - although zram is not affected because it operates over 4 KB pages).

This modifies the compressor without changing the decompressor or the bitstream format, such that:

- there is no change to how data produced by the old compressor is decompressed
- an old decompressor will correctly decode data from the updated compressor
- performance and compression ratio are not affected
- we avoid introducing a new bitstream format

In testing over 12.8M real-world files totalling 903 GB, three files were affected by this bug. I also constructed 37M semi-random 64 KB files totalling 2.27 TB, and saw no affected files. Finally I tested over files constructed to contain each of the ~1024 possible bad input sequences; for all of these cases, updated lzo-rle worked correctly.

There is no significant impact to performance or compression ratio.

Signed-off-by: Dave Rodgman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Dave Rodgman <[email protected]>
Cc: Willy Tarreau <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Markus F.X.J. Oberhumer <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Chao Yu <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
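The condition that the Documentation/lzo.txt change below adds can be checked directly in terms of a copy's distance and length. The following C sketch uses hypothetical helper names (they are not kernel functions) and assumes, for the enumeration only, that M4 copy distances stay below 0xc000. Under that assumption the smallest forbidden distance is 0x803f = 32831 bytes, which is consistent with the bug only appearing for inputs over 32 KB, and there are 256 forbidden distances times 4 lengths = 1024 pairs, in line with the ~1024 bad input sequences mentioned in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a kernel API): true if a block copy with this
 * real distance and length is in the class that lzo.txt says a version-1
 * compressor must not emit. */
static bool lzo_rle_forbidden_copy(unsigned distance, unsigned length)
{
    return (distance & 0x803f) == 0x803f && length >= 261 && length <= 264;
}

/* Count forbidden (distance, length) pairs, assuming M4 distances are
 * below 0xc000, and record the smallest forbidden distance. Only lengths
 * 261..264 can qualify, so only those are enumerated. */
static unsigned count_forbidden_pairs(unsigned *min_distance)
{
    unsigned count = 0;

    assert(min_distance != NULL);
    *min_distance = 0;
    for (unsigned d = 0; d < 0xc000; d++) {
        for (unsigned len = 261; len <= 264; len++) {
            if (!lzo_rle_forbidden_copy(d, len))
                continue;
            if (count == 0)
                *min_distance = d; /* distances are visited in increasing order */
            count++;
        }
    }
    return count;
}
```

Bit 15 of the distance must be set, so every forbidden distance is at least 32768; with distances assumed below 0xc000, bit 14 must be clear and bits 6..13 are free, which gives the 256 distances.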
1 parent fce1aff commit b5265c8


2 files changed, +19 -2 lines changed


Documentation/lzo.txt

Lines changed: 6 additions & 2 deletions
@@ -159,11 +159,15 @@ Byte sequences
 distance = 16384 + (H << 14) + D
 state = S (copy S literals after this block)
 End of stream is reached if distance == 16384
+In version 1 only, to prevent ambiguity with the RLE case when
+((distance & 0x803f) == 0x803f) && (261 <= length <= 264), the
+compressor must not emit block copies where distance and length
+meet these conditions.
 
 In version 1 only, this instruction is also used to encode a run of
-zeros if distance = 0xbfff, i.e. H = 1 and the D bits are all 1.
+zeros if distance = 0xbfff, i.e. H = 1 and the D bits are all 1.
 In this case, it is followed by a fourth byte, X.
-run length = ((X << 3) | (0 0 0 0 0 L L L)) + 4.
+run length = ((X << 3) | (0 0 0 0 0 L L L)) + 4
 
 0 0 1 L L L L L (32..63)
 Copy of small block within 16kB distance (preferably less than 34B)
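The collision can be reproduced at the byte level from the two layouts in this hunk. The following C sketch is a simplified model, not the kernel's decoder: the helper names are hypothetical, and it assumes a version-1 decoder recognises a zero run from the opcode (0x18..0x1f, i.e. H = 1) plus the LE16 that immediately follows, before parsing any length-extension bytes. For a long M4 copy (L = 0) the byte right after the opcode is the length byte, length - 9, so lengths 261..264 put 0xfc..0xff there; the next byte is the low distance byte, whose top six bits are the low six D bits. In this model an exact collision also needs S = 3 in that byte; the documented rule does not depend on S, presumably because S is only filled in after the copy has been emitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the 4-byte version-1 zero run from lzo.txt.
 * Opcode 0 0 0 1 1 L L L, then an LE16 with all 14 D bits set
 * (distance 0xbfff) and state S in the low 2 bits, then X.
 * run length = ((X << 3) | LLL) + 4, so 4 <= run <= 2051. */
static void encode_zero_run(unsigned run, unsigned state, uint8_t out[4])
{
    assert(run >= 4 && run - 4 < (1u << 11) && state <= 3);
    unsigned n = run - 4;
    out[0] = (uint8_t)(0x18 | (n & 7)); /* H = 1, LLL = low 3 bits */
    out[1] = (uint8_t)(0xfc | state);   /* low 6 D bits, then S */
    out[2] = 0xff;                      /* high 8 D bits */
    out[3] = (uint8_t)(n >> 3);         /* X */
}

/* Hypothetical helper: an M4 copy whose length needs exactly one extension
 * byte (10 <= length <= 264): opcode with L = 0, byte (length - 9), then
 * the LE16 (D << 2) | S, where distance = 16384 + (H << 14) + D. */
static void encode_m4_long(unsigned distance, unsigned length, unsigned state,
                           uint8_t out[4])
{
    assert(distance > 0x4000 && distance < 0xc000);
    assert(length >= 10 && length <= 264 && state <= 3);
    unsigned off = distance - 0x4000;              /* (H << 14) | D */
    out[0] = (uint8_t)(0x10 | ((off >> 11) & 8)); /* 0 0 0 1 H 0 0 0 */
    out[1] = (uint8_t)(length - 9);                /* length = 2 + 7 + byte */
    out[2] = (uint8_t)(((off << 2) & 0xff) | state);
    out[3] = (uint8_t)((off >> 6) & 0xff);
}

/* Simplified model (an assumption, not the kernel decoder): a v1 decoder
 * treats opcode 0x18..0x1f followed by an LE16 with all D bits set as a
 * zero run, before looking for length-extension bytes. */
static bool looks_like_zero_run(const uint8_t *p)
{
    unsigned next = (unsigned)p[1] | ((unsigned)p[2] << 8);
    return (p[0] & 0xf8) == 0x18 && (next & 0xfffc) == 0xfffc;
}
```

A copy of 261 bytes at distance 0x803f with S = 3 encodes as 18 fc ff 00, which the model reads as a 4-byte zero run. With the length cut to 260 the length byte becomes 0xfb and the model no longer sees a zero run.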

lib/lzo/lzo1x_compress.c

Lines changed: 13 additions & 0 deletions
@@ -268,6 +268,19 @@ lzo1x_1_do_compress(const unsigned char *in, size_t in_len,
             *op++ = (M4_MARKER | ((m_off >> 11) & 8)
                     | (m_len - 2));
         else {
+            if (unlikely(((m_off & 0x403f) == 0x403f)
+                    && (m_len >= 261)
+                    && (m_len <= 264))
+                    && likely(bitstream_version)) {
+                // Under lzo-rle, block copies
+                // for 261 <= length <= 264 and
+                // (distance & 0x80f3) == 0x80f3
+                // can result in ambiguous
+                // output. Adjust length
+                // to 260 to prevent ambiguity.
+                ip -= m_len - 260;
+                m_len = 260;
+            }
             m_len -= M4_MAX_LEN;
             *op++ = (M4_MARKER | ((m_off >> 11) & 8));
             while (unlikely(m_len > 255)) {
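The added guard can be lifted out into a small helper for testing. This is a sketch with a hypothetical name and signature, not code from the kernel. As in the hunk, m_off is the offset after the 16384 bias has been removed (the surrounding lzo1x_1_do_compress subtracts 0x4000 before this branch, outside the lines shown), so (m_off & 0x403f) == 0x403f corresponds to the documented (distance & 0x803f) == 0x803f for distances below 0xc000; the 0x80f3 in the code comment appears to be a transposition of 0x803f.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper mirroring the added guard (not a kernel function).
 * m_off is the M4 offset with the 0x4000 bias already removed, i.e.
 * (H << 14) | D. If the copy is in the ambiguous class, shorten it to 260
 * and return how many input bytes to give back (the kernel code does this
 * with `ip -= m_len - 260`); otherwise return 0 and leave *m_len alone. */
static size_t clamp_ambiguous_m4(size_t m_off, size_t *m_len,
                                 bool bitstream_version)
{
    assert(m_len != NULL);
    if (bitstream_version && (m_off & 0x403f) == 0x403f &&
        *m_len >= 261 && *m_len <= 264) {
        size_t give_back = *m_len - 260;
        *m_len = 260;
        return give_back;
    }
    return 0;
}
```

Clamping to 260 makes the length-extension byte 260 - 9 = 251 (0xfb), whose top six bits are not all set, so the copy can no longer be read as a zero run. The one to four bytes given back are scanned again by the compressor, and a 260-byte copy is an ordinary M4 instruction, which is why an old decompressor still decodes the output correctly.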
