Commit a96834b

committed
More words
1 parent 99ec98b commit a96834b

1 file changed

+46
-8
lines changed

finance/proposal-calls/cycle3/aperio_fits_dask.md

Lines changed: 46 additions & 8 deletions
Original file line number | Diff line number | Diff line change
@@ -11,21 +11,51 @@ Drew Leonard
1111
This project aims to increase performance of `io.fits` especially around large
1212
compressed image data.
1313

14+
#### Compressed Image Performance
15+
1416
Currently, when using tiled FITS compression, the Astropy implementation loads
1517
all the tiles into RAM and decompresses them all.
1618
This is highly CPU and memory inefficient, and is one of the reasons Astropy is
1719
significantly slower at loading these types of files than the `fitsio` package
1820
(which currently uses the same C library, `cfitsio`, as Astropy for this).
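To illustrate why decompressing every tile is wasteful, here is a minimal standard-library sketch. It is not the FITS tiled-compression format and not Astropy's implementation: the tile sizes, the `zlib` codec, and the helper names `make_tiles`, `read_all`, and `read_rows` are all assumptions made up for this example. It only shows that a row slice needs the tiles that overlap it, not the whole image.

```python
import zlib

# Toy layout (hypothetical values, not taken from any real FITS file).
TILE_ROWS = 4      # image rows stored in each tile
ROW_BYTES = 1024   # bytes per image row
N_TILES = 64       # number of tiles in the toy image


def make_tiles():
    """Build a toy 'compressed image' where each tile is compressed on its own."""
    tiles = []
    for t in range(N_TILES):
        raw = bytes([t % 256]) * (TILE_ROWS * ROW_BYTES)
        tiles.append(zlib.compress(raw))
    return tiles


def read_all(tiles):
    """Eager strategy: decompress every tile and return the full image bytes."""
    return b"".join(zlib.decompress(t) for t in tiles)


def read_rows(tiles, start, stop):
    """Lazy strategy: decompress only the tiles overlapping rows [start, stop).

    Returns the requested bytes and the number of tiles that were decompressed.
    Assumes 0 <= start < stop <= N_TILES * TILE_ROWS.
    """
    first = start // TILE_ROWS
    last = (stop - 1) // TILE_ROWS
    data = b"".join(zlib.decompress(tiles[i]) for i in range(first, last + 1))
    offset = (start - first * TILE_ROWS) * ROW_BYTES
    length = (stop - start) * ROW_BYTES
    return data[offset:offset + length], last - first + 1
```

Calling `read_rows(tiles, 5, 7)` touches one tile out of 64, while `read_all(tiles)` followed by slicing touches all of them to produce the same bytes.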
1921

20-
While this is one of the largest performance issues in `io.fits`, this project
21-
is also proposing to increase the integration of Dask with `io.fits` which will
22-
enable significant performance improvements when using FITS files with
23-
distributed compute.
22+
For example, we can compare the loading times of `io.fits` and `fitsio`:
23+
24+
Loading a whole array with astropy:
25+
26+
```
27+
In [44]: %timeit fits.getdata("VBI_L1_00656282_2018_05_11T14_25_05_466665_I.fits", hdu=1)
28+
183 ms ± 248 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
29+
```
30+
31+
with `fitsio`:
32+
```
33+
In [45]: %timeit hdu[1][:, :]
34+
131 ms ± 182 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
35+
```
36+
37+
loading one or more individual tiles with `fitsio`:
38+
```
39+
In [48]: %timeit hdu[1][0, 0]
40+
21.6 µs ± 191 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
41+
42+
In [49]: %timeit hdu[1][0, :]
43+
22.3 µs ± 78.2 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
2444
25-
It's likely that any significant refactor of `CompImageHDU` will lead to an
26-
implementation not using the `fitsio` C library see
27-
[#3895](https://github.com/astropy/astropy/issues/3895) which also has the side
28-
effect of significantly reducing the compile time complexity of Astropy, as
45+
In [50]: %timeit hdu[1][:10, :]
46+
235 µs ± 718 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
47+
```
48+
49+
Note that doing the same operations with astropy currently loads the whole file,
50+
so each takes the same amount of time as the first `getdata` call. This means that
51+
loading a single tile with astropy is approximately 8300x slower than with
52+
`fitsio`.
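The quoted slowdown can be checked directly from the `%timeit` figures above. The snippet below is only arithmetic on those numbers; the variable names and the `slowdown` helper are made up for this example. It assumes, as stated above, that every partial read with astropy costs the same as the full `getdata` call.

```python
# Mean timings copied from the %timeit output above, converted to seconds.
astropy_full = 183e-3      # fits.getdata(..., hdu=1)
other_full = 131e-3        # hdu[1][:, :]
other_pixel = 21.6e-6      # hdu[1][0, 0]
other_row = 22.3e-6        # hdu[1][0, :]
other_ten_rows = 235e-6    # hdu[1][:10, :]


def slowdown(partial_time):
    """Ratio of a full astropy load to a partial read with the other package."""
    return astropy_full / partial_time


for label, t in [
    ("whole array", other_full),
    ("single pixel", other_pixel),
    ("single row", other_row),
    ("ten rows", other_ten_rows),
]:
    print(f"{label}: {slowdown(t):.1f}x")
```

The single-pixel and single-row ratios come out between roughly 8200x and 8500x, which is consistent with the approximate figure quoted, while the whole-array ratio is only about 1.4x.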
53+
54+
Issue [#3895](https://github.com/astropy/astropy/issues/3895) has been open
55+
since 2015 and describes a set of deep improvements for the compressed image
56+
support in `io.fits`. This includes implementing the compression natively in
57+
Python rather than using the `cfitsio` C library to do it. This will have the
58+
side effect of significantly reducing the compile time complexity of Astropy, as
2959
this is the only part of the code which uses `cfitsio`.
3060

3161
Some relevant issues:
@@ -34,6 +64,14 @@ Some relevant issues:
3464
* https://github.com/astropy/astropy/issues/11633
3565
* https://github.com/astropy/astropy/issues/9238
3666

67+
68+
#### Dask Integration with reading FITS files
69+
70+
While compressed image handling is one of the largest performance issues in
71+
`io.fits`, this project also proposes to increase the integration of Dask with
72+
`io.fits`, which will enable significant performance improvements when using
73+
FITS files with distributed compute.
74+
3775
### Approximate Budget
3876

3977
Money in return for the sacrifice of sanity required to rework large chunks of
