[imageio] Radically speed up PFM loading #17689
...we can do this by correcting the addressing in the resulting buffer filling loops so that the image rows end up in the right place from the beginning
TurboGit
left a comment
Works for me, thanks!
@TurboGit
If it is better to add technical details, then:
A bit of detail is fine by me. Thanks! BTW, did you get the notification about #16334? Just wondering about the status of this.
Yep, got it, will get to commenting tonight. Your proposal is very interesting, but it makes the code even more complicated while it is not clear how long we will still need libraw. Actually, I wanted to understand the status of rawspeed and why it is in such a state that it cannot completely replace libraw. But this is a big discussion topic that I didn't have the energy to start at the time.
I don't see libraw being removed at any point in the future.
The original code used a clever trick to convert the pixel format (3 floats per pixel -> 4 floats per pixel) in place: it walked sequentially(!) from the end of the data buffer to the beginning, which guaranteed that no unread data would be overwritten. Unfortunately, this made the code incompatible with OpenMP. Using two buffers (one for the input data, one for the output passed to darktable) makes it possible to parallelize the format conversion loops.
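To illustrate why the in-place trick forces a sequential loop, here is a minimal sketch of a back-to-front expansion. The function name `expand_rgb_to_rgba_inplace` is hypothetical, and filling the fourth channel with 0 is an assumption; this is not the actual darktable code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not the darktable function): expand `npixels` RGB
 * triples stored at the start of `buf` into 4-float pixels in the same
 * buffer. `buf` must have room for 4 * npixels floats. */
static void expand_rgb_to_rgba_inplace(float *buf, size_t npixels)
{
  assert(buf != NULL);
  /* Walk from the last pixel to the first. Pixel i is read from offset 3*i
   * and written to offset 4*i >= 3*i, so every write lands at or beyond the
   * data that still has to be read. Iterations depend on this order, which
   * is why the loop cannot simply be split across threads. */
  for(size_t i = npixels; i-- > 0;)
  {
    const float r = buf[3 * i + 0];
    const float g = buf[3 * i + 1];
    const float b = buf[3 * i + 2];
    buf[4 * i + 0] = r;
    buf[4 * i + 1] = g;
    buf[4 * i + 2] = b;
    buf[4 * i + 3] = 0.0f; /* assumption: unused fourth channel set to 0 */
  }
}
```

If the same loop ran front to back, the write for pixel 1 (offsets 4..7) would overwrite the not-yet-read data of pixel 2 (offsets 6..8), which is the overwrite the backward order avoids.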
The code also contained a separate loop that reordered the image rows (bottom-to-top into top-to-bottom) using a temporary row buffer. In effect, this meant copying the entire array of image data two more times, which took a significant amount of time for large images.
We got rid of this row reordering by writing each row to its correct address directly in the format conversion loop.
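The two-buffer approach with the row flip folded into the conversion loop might look roughly like the sketch below. The name `convert_and_flip`, the zero-filled fourth channel, and the exact OpenMP pragma are assumptions made for illustration, not the code from this PR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert a bottom-to-top image with 3 floats per
 * pixel (`in`) into a top-to-bottom image with 4 floats per pixel (`out`).
 * `in` holds 3 * width * height floats, `out` 4 * width * height floats,
 * and the two buffers must not overlap. */
static void convert_and_flip(const float *in, float *out,
                             size_t width, size_t height)
{
  assert(in != NULL && out != NULL);
  /* Each input row maps to exactly one output row, so the iterations are
   * independent and can be distributed across threads. */
#ifdef _OPENMP
#pragma omp parallel for schedule(static)
#endif
  for(size_t row = 0; row < height; row++)
  {
    const float *src = in + 3 * width * row;
    /* Flip here: input row 0 (bottom) lands in the last output row. */
    float *dst = out + 4 * width * (height - 1 - row);
    for(size_t x = 0; x < width; x++)
    {
      dst[4 * x + 0] = src[3 * x + 0];
      dst[4 * x + 1] = src[3 * x + 1];
      dst[4 * x + 2] = src[3 * x + 2];
      dst[4 * x + 3] = 0.0f; /* assumption: unused fourth channel set to 0 */
    }
  }
}
```

Because the destination address already accounts for the flip, every pixel is read once and written once, and no temporary row buffer or second pass is needed.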
As a result, a PFM file converted from a 45-megapixel Nikon Z7 image, loaded on a computer with an AMD Ryzen 7 5800HS CPU, took 0.45 seconds with the original code and 0.09 seconds with the changes in this PR (a fivefold speedup!).