
Allow full unfiltering for partial data (enum-based approach)#664

Merged
197g merged 3 commits into image-rs:master from anforowicz:partial-decode-december-2025
Feb 14, 2026

Conversation

@anforowicz
Contributor

@anforowicz anforowicz commented Dec 10, 2025

Hello!

This PR fixes #639. This is an alternative approach to @197g's #640 - the two main differences are:

  • prev_start: usize is replaced with prev_row: PrevRow, where PrevRow is an enum with separate None, InPlace, and Scratch variants. I think this results in easier-to-read code than imposing a special meaning on certain offset values (e.g. self.prev_start == self.curr_start => no previous row; or having self.prev_start sometimes trail self.curr_start by rowlen-1 and sometimes point to mutable scratch space in the same buffer as curr_start).
  • There is a test and a fix for consistently detecting errors in the tail portion of the zlib stream (see the discussion at Allow full unfiltering for partial data #640 (comment)). This part may conflict with Have UnfilteringBuffer track number of remaining bytes in the frame #662, where @fintelia proposes to always ignore the tail zlib bytes (rather than, as in this PR, to always decode them to detect errors). I think I should wait until @fintelia's PR lands and then rebase on top of it (removing the changes in this PR that won't be necessary afterwards).

PTAL?

@anforowicz
Contributor Author

There are some CIFuzz failures, but they seem to be caused by infrastructure problems (E: Package 'python' has no installation candidate) rather than by this PR? FWIW I've been running cargo fuzz run buf_independent for the last hour or so on my local machine and it hasn't found any issues so far.

@197g
Member

197g commented Dec 10, 2025

Did you use the corpus from oss-fuzz? For what it's worth, the fuzz failure on the other PR is also somewhat spurious; it's a highly specific condition, and fuzzing locally never found the same fault for me either (even with 64 threads for a day, which is just peanuts).

@anforowicz
Contributor Author

Did you use the corpus from oss-fuzz?

I've just run cargo fuzz run buf_independent, and when that didn't find anything for a while, I added --jobs 60.

Would you have instructions for using oss-fuzz? (We may want to stash them into fuzz/README.md.)

For what it's worth the fuzz failure on the other PR is also somewhat spurious, it's a highly specific condition and fuzzing locally never found the same fault for me either (even after 64-threads for a day, it's just peanuts).

Without changing src/decoder/read_decoder.rs I was able to find the discrepancy between baseline_reader and intermittent_eofs_reader within 5-10 minutes of fuzzing locally.

@197g
Member

197g commented Dec 10, 2025

Would you have instructions for using oss-fuzz? (We may want to stash them into fuzz/README.md.)

Sorry, I've always relied on the API job for this and hoped you'd be closer to the source 😄. Some concrete failures appear on the mailing list, but I would like to know how to retrieve the full corpus myself.

@197g
Member

197g commented Dec 11, 2025

Update: the fuzzer results are now running and look at first sight like a genuine detection.

2025-12-11T16:06:58.4416156Z     #0 0x56378293a841 in __sanitizer_print_stack_trace /rustc/llvm/src/llvm-project/compiler-rt/lib/asan/asan_stack.cpp:87:3
2025-12-11T16:06:58.4430239Z     #1 0x563782a51ba8 in fuzzer::PrintStackTrace() /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerUtil.cpp:210:5
2025-12-11T16:06:58.4431699Z     #2 0x563782a3566b in fuzzer::Fuzzer::AlarmCallback() /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:304:5
2025-12-11T16:06:58.4432650Z     #3 0x7f9a7bd3d32f  (/lib/x86_64-linux-gnu/libc.so.6+0x4532f) (BuildId: 274eec488d230825a136fa9c4d85370fed7a0a5e)
2025-12-11T16:06:58.4433501Z     #4 0x5637829fef63 in fdeflate::decompress::Decompressor::read::h5122298fcea00f39 /rust/registry/src/index.crates.io-6f17d22bba15001f/fdeflate-0.3.7/src/decompress.rs
2025-12-11T16:06:58.4434526Z     #5 0x56378297e455 in png::decoder::zlib::UnfilterBuf::decompress::h9f4dbf88ef0f2ad9 /src/image-png/src/decoder/zlib.rs:199:43

The first assert I see for that method is assert!(output_position <= output.len());

Member

@197g 197g left a comment


Maybe the fuzzer's finding of a panic in zlib is fixed by a rebase on #662, although I don't understand how it is getting triggered in the first place. It'd be ironic if the enum variants introduced to track the state actually led to more possible state corruption.

Comment on lines +328 to +335
fn as_slice<'a>(&'a self, buf: &'a [u8]) -> &'a [u8] {
match self {
PrevRow::None => &[],
PrevRow::InPlace(prev_start) => &buf[*prev_start..],
PrevRow::Scratch(buf) => buf.as_slice(),
}
}
}
Member

@197g 197g Dec 10, 2025


micro optimization is the root of all evil

"Yet we should not pass up our opportunities in that critical 3%", to complete the quote. There's a bit of cost I can accept to accommodate best-effort handling of broken images and incremental encoding, but given that this isn't the use case for a lot of our other users, not too much. This shouldn't even really appear in the executed code path for any of our benchmarks that supply the full images in memory.

I'm also sorry to say it is far from micro on non-compressed streams, as far as I can tell. Granted, the majority is rather neutral, but this is all end-to-end, and zlib decompression masks a lot of other costs when it is involved.

Details
Benchmarking decode/tango-icon-address-book-new-16.png
Benchmarking decode/tango-icon-address-book-new-16.png: Warming up for 3.0000 s
Benchmarking decode/tango-icon-address-book-new-16.png: Collecting 10 samples in estimated 5.0001 s (1.1M iterations)
Benchmarking decode/tango-icon-address-book-new-16.png: Analyzing
decode/tango-icon-address-book-new-16.png
                        time:   [4.5412 µs 4.5419 µs 4.5426 µs]
                        thrpt:  [214.98 MiB/s 215.01 MiB/s 215.04 MiB/s]
                 change:
                        time:   [−1.6796% −1.6476% −1.6182%] (p = 0.00 < 0.05)
                        thrpt:  [+1.6449% +1.6752% +1.7083%]
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low mild
Benchmarking decode/tango-icon-address-book-new-32.png
Benchmarking decode/tango-icon-address-book-new-32.png: Warming up for 3.0000 s
Benchmarking decode/tango-icon-address-book-new-32.png: Collecting 10 samples in estimated 5.0002 s (652k iterations)
Benchmarking decode/tango-icon-address-book-new-32.png: Analyzing
decode/tango-icon-address-book-new-32.png
                        time:   [7.6328 µs 7.6330 µs 7.6333 µs]
                        thrpt:  [511.74 MiB/s 511.76 MiB/s 511.77 MiB/s]
                 change:
                        time:   [−3.8710% −3.8472% −3.8219%] (p = 0.00 < 0.05)
                        thrpt:  [+3.9738% +4.0011% +4.0269%]
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

Benchmarking generated-noncompressed-4k-idat/8x8.png
Benchmarking generated-noncompressed-4k-idat/8x8.png: Warming up for 3.0000 s
Benchmarking generated-noncompressed-4k-idat/8x8.png: Collecting 100 samples in estimated 5.0024 s (7.3M iterations)
Benchmarking generated-noncompressed-4k-idat/8x8.png: Analyzing
generated-noncompressed-4k-idat/8x8.png
                        time:   [687.55 ns 687.70 ns 687.86 ns]
                        thrpt:  [354.93 MiB/s 355.01 MiB/s 355.09 MiB/s]
                 change:
                        time:   [−6.9506% −6.8972% −6.8453%] (p = 0.00 < 0.05)
                        thrpt:  [+7.3483% +7.4082% +7.4698%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
Benchmarking generated-noncompressed-64k-idat/128x128.png
Benchmarking generated-noncompressed-64k-idat/128x128.png: Warming up for 3.0000 s
Benchmarking generated-noncompressed-64k-idat/128x128.png: Collecting 100 samples in estimated 5.0117 s (672k iterations)
Benchmarking generated-noncompressed-64k-idat/128x128.png: Analyzing
generated-noncompressed-64k-idat/128x128.png
                        time:   [7.4723 µs 7.4730 µs 7.4737 µs]
                        thrpt:  [8.1667 GiB/s 8.1674 GiB/s 8.1682 GiB/s]
                 change:
                        time:   [−3.8230% −3.7973% −3.7721%] (p = 0.00 < 0.05)
                        thrpt:  [+3.9199% +3.9472% +3.9750%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe
Benchmarking generated-noncompressed-64k-idat/2048x2048.png
Benchmarking generated-noncompressed-64k-idat/2048x2048.png: Warming up for 3.0000 s
Benchmarking generated-noncompressed-64k-idat/2048x2048.png: Collecting 10 samples in estimated 5.0242 s (3190 iterations)
Benchmarking generated-noncompressed-64k-idat/2048x2048.png: Analyzing
generated-noncompressed-64k-idat/2048x2048.png
                        time:   [1.5710 ms 1.5718 ms 1.5727 ms]
                        thrpt:  [9.9354 GiB/s 9.9410 GiB/s 9.9459 GiB/s]
                 change:
                        time:   [−12.291% −11.690% −11.025%] (p = 0.00 < 0.05)
                        thrpt:  [+12.391% +13.237% +14.013%]
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
Benchmarking generated-noncompressed-2g-idat/2048x2048.png
Benchmarking generated-noncompressed-2g-idat/2048x2048.png: Warming up for 3.0000 s
Benchmarking generated-noncompressed-2g-idat/2048x2048.png: Collecting 10 samples in estimated 5.0518 s (3080 iterations)
Benchmarking generated-noncompressed-2g-idat/2048x2048.png: Analyzing
generated-noncompressed-2g-idat/2048x2048.png
                        time:   [1.6381 ms 1.6393 ms 1.6410 ms]
                        thrpt:  [9.5217 GiB/s 9.5314 GiB/s 9.5382 GiB/s]
                 change:
                        time:   [−2.3397% −1.9448% −1.5494%] (p = 0.00 < 0.05)
                        thrpt:  [+1.5738% +1.9833% +2.3957%]
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe
Benchmarking row-by-row/128x128-4k-idat
Benchmarking row-by-row/128x128-4k-idat: Warming up for 3.0000 s
Benchmarking row-by-row/128x128-4k-idat: Collecting 100 samples in estimated 5.0212 s (606k iterations)
Benchmarking row-by-row/128x128-4k-idat: Analyzing
row-by-row/128x128-4k-idat
                        time:   [8.2700 µs 8.2706 µs 8.2712 µs]
                        thrpt:  [7.3792 GiB/s 7.3798 GiB/s 7.3803 GiB/s]
                 change:
                        time:   [−11.145% −11.125% −11.105%] (p = 0.00 < 0.05)
                        thrpt:  [+12.493% +12.518% +12.543%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

@anforowicz anforowicz force-pushed the partial-decode-december-2025 branch from 7742a9a to 0353ae7 on January 6, 2026 at 18:44
@anforowicz
Contributor Author

@197g, I think this PR is ready for another round of reviews - can you PTAL?

Functional impact

This PR fixes a functional/behavior regression introduced during 0.18.0-rc series and reported in #639. Chromium can’t update png beyond 0.18.0-rc without a fix.

Fuzzing

IIUC the original fuzzing problems (the ones mentioned in #640 (comment)) no longer repro thanks to #662 (h/t @fintelia). Neither GitHub CI nor my local fuzzing runs have found any trouble yet.

If you do have a fuzzing-produced testcase that results in test failures, then can you please share it? I see that you reported a result in #664 (comment) - I wonder if this is still reproducible (and if so, I would like to try reproing it and debugging it further).

Performance impact

I still consider the tricks discussed in #640 (comment) to be micro-optimizations, and I strongly prefer the more readable enum PrevRow { None, InPlace(usize), Scratch(Vec<u8>) }. I would be very surprised if the difference in benchmark results boiled down to using the enum and comparing the enum’s discriminant instead of the alternative/equivalent conditions based on pre-existing fields (e.g. using self.prev_start == self.current_start to mean PrevRow::None).

I also want to point out that using the current performance as the baseline is unfair. The performance improvements in 303b3e4 were realized partly thanks to the functional regression that commit unknowingly introduced (avoiding copying may be good for performance, but in this case it broke decoding of partial inputs). Such performance improvements shouldn’t be counted as part of the baseline (though I understand that untangling/separating those improvements may not be possible).

FWIW on my machine, the impact on real-world benchmarks seems to be a wash:

  • One real-world benchmark regressed:
    • decode/Lohengrin_-_Illustrated_Sporting_and_Dramatic_News.png by 2.62% - 3.19%
  • And two real-world benchmarks improved:
    • decode/lorem_ipsum_screenshot.png by 0.49% - 1.55%
    • decode/kodim23.png by 1.72% - 2.42%
Details
  • Improvements:
    • decode/lorem_ipsum_screenshot.png: −1.5568% runtime (P = 0.00)
      • 3 reruns: -0.4907% (within noise threshold, P = 0.02), -0.6695% (within noise threshold, P = 0.01), −1.3159% (P = 0.00)
    • decode/kodim23.png: −2.4215% runtime (P = 0.00)
      • 3 reruns: -1.9932% (P = 0.00), -1.7240% (P = 0.00), -1.9172% (P = 0.00)
  • Regressions:
    • row-by-row/128x128-4k-idat: +21.337% runtime (Found 8 outliers among 100 measurements (8.00%))
      • 3 reruns: +6.7452% (P = 0.00), +5.9710% (P = 0.00), +7.3072% (P = 0.00)
    • generated-noncompressed-4k-idat/8x8.png: +2.7097% runtime (Found 7 outliers among 100 measurements (7.00%))
      • 3 reruns: -2.7809% (improvement! P = 0.00), -2.8512% (improvement! P = 0.00), -1.0155% (within noise threshold, P = 0.00)
    • generated-noncompressed-4k-idat/2048x2048.png: +6.8459% runtime
      • 3 reruns: 6.5962% (improvement! P = 0.00), +0.6646% (within noise threshold, P = 0.75), -2.8955% (P = 0.07 > 0.05)
    • generated-noncompressed-4k-idat/12288x12288.png: +3.1952% runtime
      • 3 reruns: +1.8474% (within noise threshold), +1.9962% (P = 0.00), +2.4707% (P = 0.00)
    • generated-noncompressed-64k-idat/2048x2048.png: +19.350% runtime
      • 3 reruns: +6.4751% (P = 0.00), +5.9616% (P = 0.00), +4.5634% (P = 0.00)
    • decode/Lohengrin_-_Illustrated_Sporting_and_Dramatic_News.png: +3.1989% runtime (P = 0.00)
      • 3 reruns: +2.623%, +2.9944%, +2.9199%
    • decode/tango-icon-address-book-new-16.png: 1.3985% runtime (P = 0.00)
      • During initial run: Found 1 outliers among 10 measurements (10.00%)
      • 3 reruns did not repro the regression consistently: -0.2823% (improvement! P = 0.03), -0.3818% (improvement! P = 0.00), +0.5302% (P = 0.23)
  • No impact (change within the default noise threshold, or P > 0.05, or both):
    • generated-noncompressed-4k-idat/128x128.png
    • generated-noncompressed-64k-idat/128x128.png
    • generated-noncompressed-64k-idat/12288x12288.png
    • generated-noncompressed-2g-idat/2048x2048.png
    • generated-noncompressed-2g-idat/12288x12288.png
    • decode/Transparency.png
    • decode/kodim07.png
    • decode/Fantasy_Digital_Painting.png
    • decode/lorem_ipsum_oxipng.png
    • decode/kodim17.png
    • decode/kodim02.png
    • decode/Exoplanet_Phase_Curve_(Diagram)_indexed_gimp.png
    • decode/Exoplanet_Phase_Curve_(Diagram).png
    • decode/paletted-zune.png
    • decode/tango-icon-address-book-new-32.png
    • decode/tango-icon-address-book-new-128-rsvg-convert.png

I’ve looked at perf diff for decode/Lohengrin_-_Illustrated_Sporting_and_Dramatic_News.png. I picked this testcase because 1) it is a real-world test and 2) its regression seems the least noisy / most reproducible in re-runs. perf diff showed only a minimal delta - there wasn’t any particular function that was significantly hotter or colder after this PR. Details below.

Details
# Event 'cpu-clock:u'
#
# Baseline  Delta Abs  Shared Object                  Symbol                                                                       
# ........  .........  .............................  .............................................................................
#
    72.82%     -0.17%  decoder-6d8996fe7125840f       [.] <fdeflate::decompress::Decompressor>::read
     3.00%     +0.11%  decoder-6d8996fe7125840f       [.] crc32fast::specialized::pclmulqdq::calculate
     7.50%     +0.10%  decoder-6d8996fe7125840f       [.] fdeflate::huffman::build_table
               +0.04%  decoder-6d8996fe7125840f       [.] <png::decoder::unfiltering_buffer::UnfilteringBuffer>::unfilter_curr_row_
     0.24%     +0.03%  decoder-6d8996fe7125840f       [.] <png::decoder::Reader<std::io::cursor::Cursor<&[u8]>>>::next_interlaced_r
     1.71%     -0.03%  libc.so.6                      [.] __memset_avx2_unaligned_erms
     9.11%     +0.03%  decoder-6d8996fe7125840f       [.] <alloc::vec::Vec<u8> as alloc::vec::spec_from_iter_nested::SpecFromIterNe
     0.20%     -0.01%  decoder-6d8996fe7125840f       [.] crc32fast::baseline::update_fast_16
     0.02%     +0.01%  ld-linux-x86-64.so.2           [.] do_lookup_x
     0.03%     -0.01%  decoder-6d8996fe7125840f       [.] <crc32fast::Hasher>::update
     0.35%     -0.01%  decoder-6d8996fe7125840f       [.] <alloc::vec::Vec<u8> as alloc::vec::spec_from_iter_nested::SpecFromIterNe
     0.14%     -0.01%  decoder-6d8996fe7125840f       [.] <png::decoder::unfiltering_buffer::UnfilteringBuffer>::with_unfilled_buff
     0.02%     -0.01%  ld-linux-x86-64.so.2           [.] _dl_lookup_symbol_x
     0.03%     -0.01%  decoder-6d8996fe7125840f       [.] <png::decoder::interlace_info::InterlaceInfoIter as core::iter::traits::i
     0.02%     -0.01%  libc.so.6                      [.] _int_malloc
     0.01%     -0.01%  decoder-6d8996fe7125840f       [.] <png::decoder::Reader<std::io::cursor::Cursor<&[u8]>>>::next_frame
…

I also hesitantly looked at the biggest regression in one of the artificial benchmarks. “Hesitantly”, because these benchmarks are quite noisy (as seen in the reruns) and artificial - they account for at most ~16% of the runtime of the decode/… benchmarks (for details see below). The perf diff results were also not particularly actionable:

  • crc32fast::specialized::pclmulqdq::calculate taking +1.39% more than the baseline of 37.31% seems to be mostly noise (this PR should not affect CRC calculations - the only changes relate to processing of the IDAT payload)
  • __memmove_avx_unaligned_erms taking 1.25% less than the baseline of 46.95% is also hard to explain by the changes in this PR (unless we start considering second-order effects like code layout - e.g. see Emery Berger’s talk).
Details

Let’s consider the performance profile of the real-world decode/… benchmarks gathered with:

$ rm target/release/deps/decoder*
$ rustup run nightly cargo build --bench=decoder --features=unstable --release
$ time taskset --cpu-list 4-7 nice -n -19 perf record -e cpu-clock target/release/deps/decoder-6d8996fe7125840f --bench decode --profile-time=5

Output of perf report:

 50.30%  decoder-6d8996f  decoder-6d8996fe7125840f  [.] <fdeflate::decompress::Decompressor>::read
  10.24%  decoder-6d8996f  decoder-6d8996fe7125840f  [.] png::filter::simd::paeth_unfilter_3bpp
   8.44%  decoder-6d8996f  decoder-6d8996fe7125840f  [.] png::decoder::transform::palette::create_expansion_into_rgb8::{closure#0}
   6.06%  decoder-6d8996f  decoder-6d8996fe7125840f  [.] fdeflate::huffman::build_table
   5.00%  decoder-6d8996f  decoder-6d8996fe7125840f  [.] png::filter::simd::paeth_unfilter_4bpp
   3.95%  decoder-6d8996f  decoder-6d8996fe7125840f  [.] <alloc::vec::Vec<u8> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<u8, core::iter::adapters::flatten::Flatten<core::iter::adapters::take::Take<core::iter::sources::repeat
   3.90%  decoder-6d8996f  decoder-6d8996fe7125840f  [.] png::filter::unfilter
   3.63%  decoder-6d8996f  libc.so.6                 [.] __memmove_avx_unaligned_erms
   1.70%  decoder-6d8996f  libc.so.6                 [.] __memset_avx2_unaligned_erms
   1.56%  decoder-6d8996f  decoder-6d8996fe7125840f  [.] crc32fast::specialized::pclmulqdq::calculate
   0.95%  decoder-6d8996f  decoder-6d8996fe7125840f  [.] <png::decoder::Reader<std::io::cursor::Cursor<&[u8]>>>::next_interlaced_row_impl
   0.84%  decoder-6d8996f  decoder-6d8996fe7125840f  [.] crc32fast::baseline::update_fast_16
   0.71%  decoder-6d8996f  decoder-6d8996fe7125840f  [.] png::filter::paeth::unfilter
   0.28%  decoder-6d8996f  decoder-6d8996fe7125840f  [.] <png::decoder::unfiltering_buffer::UnfilteringBuffer>::with_unfilled_buffer::<<png::decoder::Reader<std::io::cursor::Cursor<&[u8]>>>::next_raw_interlaced_row::{closure#0}, core::res
   0.24%  decoder-6d8996f  decoder-6d8996fe7125840f  [.] <png::decoder::unfiltering_buffer::UnfilteringBuffer>::unfilter_curr_row_in_place
   0.23%  decoder-6d8996f  decoder-6d8996fe7125840f  [.] <png::decoder::stream::StreamingDecoder>::update
   0.22%  decoder-6d8996f  decoder-6d8996fe7125840f  [.] simd_adler32::imp::avx2::imp::update_imp
   0.20%  decoder-6d8996f  decoder-6d8996fe7125840f  [.] <png::decoder::stream::StreamingDecoder>::parse_u32
…

Based on the profile above, the code covered by the generated-noncompressed-... benchmarks accounts for at most ~16% of the runtime of the decode/… benchmarks. This is because the generated-noncompressed-... benchmarks use uncompressed, unfiltered, RGBA8 PNGs and therefore do not exercise fdeflate, png::filter, or png::decoder::transform::palette::create_expansion_into_rgb8. So they do not exercise 56.36% + 19.85% + 8.44% = 84.65% of the code.

  • fdeflate and below: 56.36% = 50.30% + 6.06%
  • png::filter and below: 19.85% = 10.24% + 5.00% + 3.90% + 0.71% (disclaimer: processing non-filtered rows still goes through png::filter but quickly devolves into “memcpy”)
  • png::decoder::transform::palette and below: 8.44%

Disclaimers:

  • processing an uncompressed zlib stream still goes through fdeflate but quickly devolves into “memcpy”
  • processing non-filtered rows still goes through png::filter but quickly devolves into “memcpy”

It seems okay to ignore these disclaimers, because

  • __memmove… appears as a separate line in the profile (so it didn’t get folded into fdeflate or unfilter entries)
  • The estimate above doesn’t account for all compressed-stream-related entries (e.g. for 0.22% of simd_adler32 impact)
  • The estimate above has been rounded down (from 84.65% to 84%)

If you think some extra due diligence is needed here, then please suggest next steps. Otherwise, I think the performance results and the investigation above support landing this PR.

@anforowicz anforowicz requested a review from 197g January 7, 2026 01:02
@anforowicz
Contributor Author

Heads-up / disclaimer: cargo fuzz run has found an issue, but IIUC the issue also repros before this PR - I've opened #666

@fintelia
Contributor

fintelia commented Jan 17, 2026

On the QOI bench corpus measured using corpus-bench, I see an approximately 1% slowdown. Even though Chromium is blocked on this PR, I'm not sure that's a tolerable cost.

(I haven't had a chance to fully review the code, so this is just based on the end-to-end performance)

The refactoring in this commit is desirable to support shifting by a
different `discard_size` (i.e. making that independent from
`self.prev_start`).

This commit also opportunistically adds an `assert!` that ensures that
shifting left won't accidentally clobber the immutable "lookback"
window.
The refactoring in this commit is desirable because:

* Main reason: To support calling the extracted function from another
  place in a follow-up commit.
* Secondary reason: To improve readability a little bit, by making
  `fn unfilter_curr_row` slightly less noisy / more focused on its core
  functionality, which is:
    - extracting `prev_row`, and `row`
    - calling `unfilter`
    - updating `current_start` and `prev_start`
This commit falls back to unfiltering the current row out-of-place if an
`UnexpectedEof` error is encountered when in-place bytes cannot yet be
mutated.  This fixes image-rs#639.
@anforowicz anforowicz force-pushed the partial-decode-december-2025 branch from 0353ae7 to 4fcdc21 on January 26, 2026 at 20:51
@anforowicz anforowicz requested a review from telecos January 26, 2026 20:58
@anforowicz
Contributor Author

On the QOI bench corpus measured using corpus-bench, I see an approximately 1% slowdown. Even though Chromium is blocked on this PR, I'm not sure that's a tolerable cost.

The functional regression was introduced in #590, which (based on the Ryzen results mentioned in #590 (comment)) resulted in an average runtime change of −2.94% ((1.1781 − 14.535 − 1.1326 − 1.3190 − 0.7238 − 2.0269 − 2.0258) / 7). IMO we should accept losing some of that performance improvement, because it stems from a PR that introduced a functional regression. (In theory, we could get a big performance improvement if we always decoded to all-black pixels regardless of input, but that would be an obvious functional regression. The functional regression fixed by this PR is more subtle and, before this PR, lacked automated regression tests, but I think the same high-level reasoning should apply.)

Member

@197g 197g left a comment


Alright. We can see to clawing it back with the functionality in place. I have a couple of performance ideas floating around anyway (e.g. unfiltering multiple rows at the same time), and those should build on a well-tested foundation rather than further complicate the correctness we want.

@197g 197g merged commit c648c31 into image-rs:master Feb 14, 2026
45 of 46 checks passed
@Shnatsel
Member

We should cut a new release to make this change easier for Chromium to pick up. I'll run a regression test on my corpora and then open a release PR.

@fintelia
Contributor

I should probably add that I'm pretty angry about this fiasco and don't plan on collaborating/helping the Chromium folks going forward.

@197g
Member

197g commented Feb 15, 2026

Sigh. Not a week in FOSS without license drama. I'm also disappointed to find out the license wasn't checked during the trial period. Quitting collaboration with the entire project, though, seems a disproportionate response to what was a regex written with the right intent. I would appreciate an explanation of how they intend to verify licenses better to avoid a repeat of similar issues going forward.

@fintelia
Contributor

It isn't just the regex. It is also the lack of acknowledgement/apology. It is choosing not to fast-track the correction into last week's release. It is reaching out privately asking when PRs might land. It is seeing Google's treatment of ffmpeg and libxml2. And it is taking a step back and wondering why I'm providing free labor for the benefit of multiple multi-trillion dollar corporations.

All that said, I don't have any desire to impose my feelings on the other community members here. If others want to continue collaborating, they should feel free to do so. It really is awesome to see that Chromium has a memory-safe PNG decoder.

@anforowicz
Contributor Author

I would appreciate an explanation on how they intend to verify licenses better to avoid repeat occurrence of similar issues going forward.

The test added in https://crrev.com/c/7514149/4/components/resources/about_ui_credits_unittests.cc verifies the contents of the UI text that is displayed in chrome://credits.

It is also the lack of acknowledgement/apology.

I apologize for the lack of mention of png and other Rust crates in Chromium's chrome://credits page. I don't have a good explanation for why this happened - I think we just didn't think about testing this particular aspect of adding the Rust toolchain and crates to Chromium. We assumed that the info in README.chromium files flows into all the right places and didn't think about which specific places we should manually verify.

It is choosing not to fast-track the correction into last week's release.

Chromium ships a new milestone to the Stable channel roughly every 4 weeks. Stable respins are avoided and typically reserved for security issues - this minimizes the impact on end users (download bandwidth, having to restart the browser) and ensures that updates shipped to the Stable channel get downloaded and applied quickly.

That said, this is not a valid excuse for not considering a merge to the M145 Beta channel. It was fairly late in the M145 release cycle, but a merge was probably still possible at the end of January. I'll try to do better going forward.

@anforowicz anforowicz deleted the partial-decode-december-2025 branch February 20, 2026 21:05
@anforowicz
Contributor Author

We should cut a new release to make this change easier for Chromium to pick up. I'll run a regression test on my corpora and then open a release PR.

Thank you for publishing the new version - it has now been imported into Chromium (starting with version 147.0.7701.0).

Successfully merging this pull request may close these issues.

Unexpected change of behavior when reading partial data
