Issue
First of all, thank you so much for your work on bioio!
We are seeing significant differences in image reading performance between bioio and the underlying readers it wraps (imageio for PNG, tifffile for TIFF), and we would like to know whether there is room to improve this.
This issue is related to #130.
Background
We want to use bioio for pixel-patrol, a tool for assessing image quality and consistency within and between different image collections. Bioio seems to be the perfect fit for unifying metadata readouts across formats. Therefore, we load each image of a folder using bioio and write statistics and metadata for each file into one big table.
How to reproduce
The code for benchmarking and profiling is at https://gist.github.com/frauzufall/a4c5b82cafc1c9707c2c8ffd07dd1107, or run it directly via uv:
uv run https://gist.githubusercontent.com/frauzufall/a4c5b82cafc1c9707c2c8ffd07dd1107/raw/a45513e210111624174b251477c69c0ae8830ea8/benchmark_bioio_vs_native.py
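For readers who do not want to open the gist, the measurement idea can be sketched roughly as follows. This is a minimal illustration, not the gist's actual code: the helper name `average_seconds` is hypothetical, and it assumes each reader is timed by fully loading the same file a fixed number of times and averaging the wall-clock time.

```python
import time
from statistics import mean
from typing import Callable


def average_seconds(load: Callable[[], object], runs: int = 50) -> float:
    """Return the mean wall-clock time of calling `load` `runs` times."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        load()  # e.g. read one image fully into memory
        timings.append(time.perf_counter() - start)
    return mean(timings)


# Hypothetical usage; swap in the two readers being compared:
# native = average_seconds(lambda: tifffile.imread(path))
# wrapped = average_seconds(lambda: BioImage(path).data)
```

Averaging over many runs mostly measures warm-cache behaviour, so one-off costs such as the first plugin discovery are spread across the runs and look smaller than they are in a single pass over many files.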
Here are some statistics:
PNG:
======================================================================
PNG Loading Speed Comparison Report
======================================================================
Number of runs per file: 50
File Name | Size (MB) | imageio (s) | bioio (s) | % Higher
--------------------------------------------------------------------------------
test_image_1000x1000.png | 0.01 | 0.005756 | 0.031467 | 446.67 %
test_image_100x100.png | 0.00 | 0.000239 | 0.006888 | 2778.68 %
test_image_2000x2000.png | 0.02 | 0.037071 | 0.128597 | 246.90 %
test_image_4000x4000.png | 0.07 | 0.152327 | 0.553192 | 263.16 %
test_image_500x500.png | 0.00 | 0.001510 | 0.012852 | 751.21 %
test_image_8000x8000.png | 0.25 | 0.644278 | 2.054795 | 218.93 %
======================================================================
Overall Summary:
----------------------------------------------------------------------
Total average loading time across all PNG images (Imageio): 0.841181 s
Total average loading time across all PNG images (BioIO): 2.787791 s
Conclusion: BioIO (PNG) is slower than Imageio (PNG) by approximately 231.41% (Total difference: 1.946610 s).
TIFF:
======================================================================
TIFF Loading Speed Comparison Report
======================================================================
Number of runs per file: 50
File Name | Size (MB) | tifffile (s) | bioio (s) | % Higher
--------------------------------------------------------------------------------
test_image_1000x1000.tiff | 0.95 | 0.000207 | 0.003340 | 1515.81 %
test_image_100x100.tiff | 0.01 | 0.000150 | 0.002234 | 1386.47 %
test_image_2000x2000.tiff | 3.81 | 0.000362 | 0.007007 | 1833.80 %
test_image_4000x4000.tiff | 15.26 | 0.001776 | 0.028777 | 1520.20 %
test_image_500x500.tiff | 0.24 | 0.000153 | 0.003179 | 1980.20 %
test_image_8000x8000.tiff | 61.04 | 0.014028 | 0.104423 | 644.39 %
======================================================================
Overall Summary:
----------------------------------------------------------------------
Total average loading time across all TIFF images (Tifffile): 0.016676 s
Total average loading time across all TIFF images (BioIO): 0.148960 s
Conclusion: BioIO (TIFF) is slower than Tifffile (TIFF) by approximately 793.24% (Total difference: 0.132283 s).
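To make the "% Higher" column easier to interpret, here is a small sketch of how such a figure can be derived, assuming it is the relative overhead `(bioio - native) / native * 100`. That assumption matches the totals reported above up to rounding; the function name `percent_higher` is only illustrative.

```python
def percent_higher(native_s: float, bioio_s: float) -> float:
    """Relative overhead of bioio over the native reader, in percent."""
    return (bioio_s - native_s) / native_s * 100.0


# Totals copied from the two summaries above (rounded to 6 decimals).
png = percent_higher(0.841181, 2.787791)
tiff = percent_higher(0.016676, 0.148960)

print(f"PNG:  {png:.2f} % higher, {2.787791 / 0.841181:.1f}x the native time")
print(f"TIFF: {tiff:.2f} % higher, {0.148960 / 0.016676:.1f}x the native time")
```

Read as ratios, bioio takes roughly 3.3 times as long as imageio for PNG and roughly 8.9 times as long as tifffile for TIFF. Recomputing from the rounded totals can differ from the reported percentages in the last digits.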
And some screenshots from profiling (first bioio, then the native reader).
Bonus screenshot from TIFF using bioio, but with many small files (the plugin discovery comes up here more significantly):

Wild guesses
Without any knowledge of the bioio implementation, it looks like the same read method is called twice. For TIFF, tokenizing also seems expensive and is called repeatedly. Is it required?
These seem to be related to delayed versus direct array loading. ChatGPT is of the opinion that this line is problematic, but I don't feel competent enough to judge its significance or implications.
Also, the plugin discovery mechanism is quite costly when bioio is used in a loop over many images. Can this be cached somehow?
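To illustrate the kind of caching meant here, a minimal caller-side sketch follows. It is not bioio's API: `ReaderCache` and the `discover` callable are hypothetical, and the sketch assumes that the reader chosen for one file can be reused for every other file with the same extension, which will not hold for all formats (for example `.ome.tiff` versus plain `.tiff`).

```python
from pathlib import Path
from typing import Any, Callable, Dict, Union


class ReaderCache:
    """Remember the reader chosen per file extension so discovery runs once per format."""

    def __init__(self, discover: Callable[[Path], Any]):
        # `discover` is a hypothetical stand-in for the expensive lookup.
        self._discover = discover
        self._by_suffix: Dict[str, Any] = {}

    def reader_for(self, path: Union[str, Path]) -> Any:
        path = Path(path)
        suffix = path.suffix.lower()  # ".TIFF" and ".tiff" share one entry
        if suffix not in self._by_suffix:
            # Only the first file of each extension pays the discovery cost.
            self._by_suffix[suffix] = self._discover(path)
        return self._by_suffix[suffix]


# Hypothetical usage in a loop over many files:
# cache = ReaderCache(discover=some_expensive_lookup)
# for path in paths:
#     reader = cache.reader_for(path)
```

If bioio allows passing an explicit reader when opening an image (we believe there is a `reader` argument, but this should be checked against the bioio documentation), the cached value could be handed over there so that discovery is skipped for subsequent files.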
We wonder if there are ways to improve the performance of bioio and, if needed, are also happy to contribute to efforts in that direction.
Best,
Deborah



