Allow full unfiltering for partial data (enum-based approach) #664

197g merged 3 commits into image-rs:master

Conversation
There are some CIFuzz failures, but they seem to be caused by some infrastructure problems.
Did you use the corpus from oss-fuzz? For what it's worth, the fuzz failure on the other PR is also somewhat spurious; it's a highly specific condition, and fuzzing locally never found the same fault for me either (even after 64 threads for a day, it's just peanuts).
I've just run … Would you have instructions for using oss-fuzz? (We may want to stash them into ….)

Without changing …
Sorry, I've always relied on the API job for this and hoped you'd be closer to the source 😄. For some concrete failures, there are the reports on the mailing list, but I would like to know how to retrieve the full corpus myself.
Update: the fuzzer results are now running and look at first sight like a genuine detection. The first assert I see for that method is `assert!(output_position <= output.len());`.
197g left a comment:
Maybe the fuzzer's finding of a panic in zlib is fixed by a rebase on #662, although I don't understand how it is getting triggered in the first place. It'd be ironic if the enum variants introduced for following the state actually led to more possible state corruption.
```rust
fn as_slice<'a>(&'a self, buf: &'a [u8]) -> &'a [u8] {
    match self {
        PrevRow::None => &[],
        PrevRow::InPlace(prev_start) => &buf[*prev_start..],
        PrevRow::Scratch(buf) => buf.as_slice(),
    }
}
```
micro optimization is the root of all evil
"Yet we should not pass up our opportunities in that critical 3%", to complete the quote. There's a bit of cost I'm can accept to accommodate best-efforts on broken images and incremental encoding but given that this isn't the use case for a lot of our other users, not too much. This shouldn't even really appear in the executed code path for any of our benchmarks that supply the full images in memory.
I'm also sorry to say it is far from micro on non-compressed streams, as far as I can tell. Granted, the majority is rather neutral, but this is all end-to-end, and the zlib decompression masks a lot of other costs when it is involved.
Details
Benchmarking decode/tango-icon-address-book-new-16.png
Benchmarking decode/tango-icon-address-book-new-16.png: Warming up for 3.0000 s
Benchmarking decode/tango-icon-address-book-new-16.png: Collecting 10 samples in estimated 5.0001 s (1.1M iterations)
Benchmarking decode/tango-icon-address-book-new-16.png: Analyzing
decode/tango-icon-address-book-new-16.png
time: [4.5412 µs 4.5419 µs 4.5426 µs]
thrpt: [214.98 MiB/s 215.01 MiB/s 215.04 MiB/s]
change:
time: [−1.6796% −1.6476% −1.6182%] (p = 0.00 < 0.05)
thrpt: [+1.6449% +1.6752% +1.7083%]
Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) low mild
Benchmarking decode/tango-icon-address-book-new-32.png
Benchmarking decode/tango-icon-address-book-new-32.png: Warming up for 3.0000 s
Benchmarking decode/tango-icon-address-book-new-32.png: Collecting 10 samples in estimated 5.0002 s (652k iterations)
Benchmarking decode/tango-icon-address-book-new-32.png: Analyzing
decode/tango-icon-address-book-new-32.png
time: [7.6328 µs 7.6330 µs 7.6333 µs]
thrpt: [511.74 MiB/s 511.76 MiB/s 511.77 MiB/s]
change:
time: [−3.8710% −3.8472% −3.8219%] (p = 0.00 < 0.05)
thrpt: [+3.9738% +4.0011% +4.0269%]
Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
Benchmarking generated-noncompressed-4k-idat/8x8.png
Benchmarking generated-noncompressed-4k-idat/8x8.png: Warming up for 3.0000 s
Benchmarking generated-noncompressed-4k-idat/8x8.png: Collecting 100 samples in estimated 5.0024 s (7.3M iterations)
Benchmarking generated-noncompressed-4k-idat/8x8.png: Analyzing
generated-noncompressed-4k-idat/8x8.png
time: [687.55 ns 687.70 ns 687.86 ns]
thrpt: [354.93 MiB/s 355.01 MiB/s 355.09 MiB/s]
change:
time: [−6.9506% −6.8972% −6.8453%] (p = 0.00 < 0.05)
thrpt: [+7.3483% +7.4082% +7.4698%]
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
Benchmarking generated-noncompressed-64k-idat/128x128.png
Benchmarking generated-noncompressed-64k-idat/128x128.png: Warming up for 3.0000 s
Benchmarking generated-noncompressed-64k-idat/128x128.png: Collecting 100 samples in estimated 5.0117 s (672k iterations)
Benchmarking generated-noncompressed-64k-idat/128x128.png: Analyzing
generated-noncompressed-64k-idat/128x128.png
time: [7.4723 µs 7.4730 µs 7.4737 µs]
thrpt: [8.1667 GiB/s 8.1674 GiB/s 8.1682 GiB/s]
change:
time: [−3.8230% −3.7973% −3.7721%] (p = 0.00 < 0.05)
thrpt: [+3.9199% +3.9472% +3.9750%]
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low severe
1 (1.00%) low mild
1 (1.00%) high mild
3 (3.00%) high severe
Benchmarking generated-noncompressed-64k-idat/2048x2048.png
Benchmarking generated-noncompressed-64k-idat/2048x2048.png: Warming up for 3.0000 s
Benchmarking generated-noncompressed-64k-idat/2048x2048.png: Collecting 10 samples in estimated 5.0242 s (3190 iterations)
Benchmarking generated-noncompressed-64k-idat/2048x2048.png: Analyzing
generated-noncompressed-64k-idat/2048x2048.png
time: [1.5710 ms 1.5718 ms 1.5727 ms]
thrpt: [9.9354 GiB/s 9.9410 GiB/s 9.9459 GiB/s]
change:
time: [−12.291% −11.690% −11.025%] (p = 0.00 < 0.05)
thrpt: [+12.391% +13.237% +14.013%]
Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high severe
Benchmarking generated-noncompressed-2g-idat/2048x2048.png
Benchmarking generated-noncompressed-2g-idat/2048x2048.png: Warming up for 3.0000 s
Benchmarking generated-noncompressed-2g-idat/2048x2048.png: Collecting 10 samples in estimated 5.0518 s (3080 iterations)
Benchmarking generated-noncompressed-2g-idat/2048x2048.png: Analyzing
generated-noncompressed-2g-idat/2048x2048.png
time: [1.6381 ms 1.6393 ms 1.6410 ms]
thrpt: [9.5217 GiB/s 9.5314 GiB/s 9.5382 GiB/s]
change:
time: [−2.3397% −1.9448% −1.5494%] (p = 0.00 < 0.05)
thrpt: [+1.5738% +1.9833% +2.3957%]
Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
1 (10.00%) high mild
1 (10.00%) high severe
Benchmarking row-by-row/128x128-4k-idat
Benchmarking row-by-row/128x128-4k-idat: Warming up for 3.0000 s
Benchmarking row-by-row/128x128-4k-idat: Collecting 100 samples in estimated 5.0212 s (606k iterations)
Benchmarking row-by-row/128x128-4k-idat: Analyzing
row-by-row/128x128-4k-idat
time: [8.2700 µs 8.2706 µs 8.2712 µs]
thrpt: [7.3792 GiB/s 7.3798 GiB/s 7.3803 GiB/s]
change:
time: [−11.145% −11.125% −11.105%] (p = 0.00 < 0.05)
thrpt: [+12.493% +12.518% +12.543%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) high mild
6 (6.00%) high severe
(Force-pushed from 7742a9a to 0353ae7.)
@197g, I think this PR is ready for another round of reviews - can you PTAL?

**Functional impact**

This PR fixes a functional/behavior regression introduced during the 0.18.0-rc series and reported in #639. Chromium can’t update until this is fixed.

**Fuzzing**

IIUC the original fuzzing problems (the ones mentioned in #640 (comment)) no longer repro thanks to #662 (h/t @fintelia). Neither GitHub CI nor my local fuzzing runs have found any trouble yet. If you do have a fuzzing-produced testcase that results in test failures, can you please share it? I see that you reported a result in #664 (comment) - I wonder if this is still reproducible (and if so, I would like to try reproing and debugging it further).

**Performance impact**

I still consider the tricks discussed in #640 (comment) to be micro-optimizations, and I strongly prefer the more readable `enum`-based approach. I also want to point out that using the current performance as the baseline is unfair. The performance improvements in 303b3e4 were realized partially thanks to the functional regression that commit unknowingly introduced (avoiding copying may be good for performance, but in this case it broke decoding of partial inputs). Such performance improvements shouldn’t be used as part of the baseline (but I understand that untangling/separating those improvements may not be possible). FWIW, on my machine the impact on real-world benchmarks seems to be a wash:
Details
I’ve looked at …

Details

I also hesitantly looked at the biggest regression in one of the artificial benchmarks. “Hesitantly”, because these benchmarks are quite noisy (as seen in the reruns) and they are artificial: they account for at most ~16% of the runtime of …
Details

Let’s consider the performance profile of the real-world … Output of … Based on the profile above, code covered by …
Disclaimers:
It seems okay to ignore these disclaimers, because …
If you think there is some extra due diligence that is needed here, then please suggest the next steps. Otherwise, I think the performance results and investigation support landing this PR.
Heads-up / disclaimer: …
On the QOI bench corpus measured using …

(I haven't had a chance to fully review the code, so this is just based on the end-to-end performance.)
The refactoring in this commit is desirable to support shifting by a different `discard_size` (i.e. making that independent of `self.prev_start`). This commit also opportunistically adds an `assert!` that ensures that shifting left won't accidentally clobber the immutable "lookback" window.
The refactoring in this commit is desirable because:
* Main reason: To support calling the extracted function from another
place in a follow-up commit.
* Secondary reason: To improve readability a little bit, by making
`fn unfilter_curr_row` slightly less noisy / more focused on its core
functionality, which is:
- extracting `prev_row`, and `row`
- calling `unfilter`
- updating `current_start` and `prev_start`
This commit falls back to unfiltering the current row out-of-place if an `UnexpectedEof` error is encountered when in-place bytes cannot yet be mutated. This fixes image-rs#639.
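The fallback described in this commit message can be sketched roughly as follows. This is a hypothetical illustration, not the png crate's actual code: `try_unfilter_in_place` and `unfilter_row` are invented names standing in for the in-place fast path and the dispatching function, and the real error type in the crate may differ from `std::io::Error`.

```rust
use std::io::{Error, ErrorKind};

// Invented stand-in for the fast path that unfilters a row directly in the
// shared row buffer. It fails with UnexpectedEof when not all bytes of the
// row have been buffered yet.
fn try_unfilter_in_place(buf: &mut [u8], rowlen: usize) -> Result<(), Error> {
    if buf.len() < rowlen {
        return Err(Error::from(ErrorKind::UnexpectedEof));
    }
    // ... in-place unfiltering of buf[..rowlen] would happen here ...
    Ok(())
}

// On UnexpectedEof, fall back to copying the available bytes into `scratch`
// so that an out-of-place path can still make progress on partial inputs
// (the behavior image-rs#639 asks for). Returns whether the fast path ran.
fn unfilter_row(buf: &mut [u8], rowlen: usize, scratch: &mut Vec<u8>) -> bool {
    match try_unfilter_in_place(buf, rowlen) {
        Ok(()) => true,
        Err(e) if e.kind() == ErrorKind::UnexpectedEof => {
            scratch.clear();
            scratch.extend_from_slice(buf);
            false
        }
        Err(_) => false,
    }
}

fn main() {
    let mut scratch = Vec::new();
    let mut full = [0u8; 8];
    assert!(unfilter_row(&mut full, 8, &mut scratch));
    let mut partial = [0u8; 4];
    assert!(!unfilter_row(&mut partial, 8, &mut scratch));
    assert_eq!(scratch.len(), 4);
}
```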
(Force-pushed from 0353ae7 to 4fcdc21.)
The functional regression was introduced in #590, which (based on Ryzen results mentioned in #590 (comment)) resulted in an average performance improvement of -2.94% ((1.1781 - 14.535 - 1.1326 - 1.3190 - 0.7238 - 2.0269 - 2.0258) / 7). IMO we should accept losing some of that performance improvement, because it stems from a PR that introduced a functional regression. (In theory, we could get a big performance improvement if we always decoded to all-black pixels regardless of input, but this would be an obvious functional regression. The functional regression fixed by this PR is more subtle and before this PR lacked automated regression tests, but I think the same high-level reasoning should apply.)
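The averaging in the comment above can be double-checked with a few lines; the numbers are taken verbatim from the quoted formula:

```rust
fn main() {
    // Per-benchmark time changes (%) quoted from the Ryzen results.
    let changes = [1.1781, -14.535, -1.1326, -1.3190, -0.7238, -2.0269, -2.0258];
    let avg = changes.iter().sum::<f64>() / changes.len() as f64;
    // Matches the -2.94% average stated in the comment.
    assert!((avg + 2.94).abs() < 0.005);
    println!("average change: {avg:.2}%");
}
```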
197g left a comment:
Alright. We can see about clawing it back with the functionality in place. I have a couple of ideas floating around for unfiltering multiple rows at the same time anyway; that performance work should go on a well-tested foundation rather than further complicate the correctness we want.
We should cut a new release to make this change easier for Chromium to pick up. I'll run a regression test on my corpora and then open a release PR.
I should probably add that I'm pretty angry about this fiasco and don't plan on collaborating/helping the Chromium folks going forward.
Sigh. Not a week in FOSS without license drama. I'm also disappointed to find out the license wasn't checked during the trial period. Quitting collaboration with their entire staff, though, seems a disproportionate response to what was a regex with the right intent. I would appreciate an explanation of how they intend to verify licenses better to avoid repeats of similar issues going forward.
It isn't just the regex. It is also the lack of acknowledgement/apology. It is choosing not to fast-track the correction into last week's release. It is reaching out privately asking when PRs might land. It is seeing Google's treatment of ffmpeg and libxml2. And it is taking a step back and wondering why I'm providing free labor for the benefit of multiple multi-trillion-dollar corporations. All that said, I don't have any desire to impose my feelings on the other community members here. If others want to continue collaborating, they should feel free to do so. It really is awesome to see that Chromium has a memory-safe PNG decoder.
The test added in https://crrev.com/c/7514149/4/components/resources/about_ui_credits_unittests.cc verifies the contents of the UI text displayed in chrome://credits.
I apologize for the lack of mention of …
Chromium ships a new milestone to the Stable channel around every 4 weeks. Stable respins are avoided and typically reserved for security issues - this is done to minimize the impact on end users (download bandwidth, having to restart the browser) and to ensure that updates shipped to the Stable channel get downloaded and applied quickly. That said, this is not a valid excuse for not considering a merge to the M145 Beta channel. It was fairly late in the M145 release cycle, but a merge was probably still possible at the end of January. I'll try to do better going forward.
Thank you for publishing the new version - it has now been imported into Chromium (starting with version 147.0.7701.0).
Hello!
This PR fixes #639. This is an alternative approach to @197g's #640 - the two main differences are:
* `prev_start: usize` is replaced with `prev_row: PrevRow`, where `PrevRow` is an `enum` with separate `None`, `InPlace`, and `Scratch` variants. I think this results in easier-to-read code than imposing a special meaning on certain offset values (e.g. `self.prev_start == self.curr_start` => no previous row; or having `self.prev_start` sometimes trail `self.curr_start` by `rowlen - 1` and sometimes point to mutable scratch space in the same buffer as `curr_start`).

PTAL?
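For illustration, here is a minimal sketch of the `PrevRow` enum described above. The variant payload types are assumptions inferred from the `as_slice()` method quoted in the review discussion; the real definition in the png crate may differ.

```rust
// Hypothetical sketch of PrevRow; payload types are assumptions.
enum PrevRow {
    /// No previous row yet (e.g. the first row of an image or pass).
    None,
    /// The previous row lives in the shared buffer, starting at this offset.
    InPlace(usize),
    /// The previous row was copied into separate scratch storage.
    Scratch(Vec<u8>),
}

impl PrevRow {
    // Resolve the previous row to a byte slice, as in the quoted diff.
    fn as_slice<'a>(&'a self, buf: &'a [u8]) -> &'a [u8] {
        match self {
            PrevRow::None => &[],
            PrevRow::InPlace(prev_start) => &buf[*prev_start..],
            PrevRow::Scratch(scratch) => scratch.as_slice(),
        }
    }
}

fn main() {
    let buf = [10u8, 20, 30, 40];
    assert_eq!(PrevRow::None.as_slice(&buf), &[] as &[u8]);
    assert_eq!(PrevRow::InPlace(2).as_slice(&buf), &buf[2..]);
    assert_eq!(PrevRow::Scratch(vec![7, 8]).as_slice(&buf), [7, 8]);
}
```

The advantage over an offset with special values is that the "no previous row" and "scratch" cases become impossible to confuse with an ordinary in-place offset.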