Commit f42a01f

Phase 2
1 parent a96834b commit f42a01f

1 file changed, +29 -5 lines changed

finance/proposal-calls/cycle3/aperio_fits_dask.md

Lines changed: 29 additions & 5 deletions
@@ -69,10 +69,34 @@ Some relevant issues:
 
 While this is one of the largest performance issues in `io.fits`, this project
 is also proposing to increase the integration of Dask with `io.fits` which will
-enable significant performance improvements when using FITS files with
-distributed compute.
+enable significant performance improvements when using FITS files with Dask.
+
+This proposal focuses on images in FITS files, both uncompressed images
+and compressed images (which are stored in binary tables underneath), and
+proposes to add an option to `io.fits` to read both these types of FITS arrays
+directly into Dask arrays.
+
+While the proposal team has significant experience with reading various data
+formats into Dask arrays, for example FITS images and CASA images and tables,
+a proportion of the development time for this section of the proposal will be
+devoted to researching the most effective method of loading FITS files into Dask
+arrays.
+
+Currently it is
+[possible](https://github.com/sunpy/sunpy/issues/2715#issuecomment-413286821) to
+load an image into a Dask array, via the "delayed" functionality in Dask. In
+this case, the file is opened when reading a chunk of data from the array, and
+then closed again afterwards.
+This approach works well for a lot of use cases, but is complex; it would be a
+lot better if this were integrated into `io.fits` directly.
+
+For compressed images, Dask integration would allow you to process the
+compressed chunks of the image in parallel (either on a single machine or
+distributed): if each compressed tile of the image were a Dask chunk, the
+tiles could be parallelised over using the various Dask schedulers.
+
+This proposal is to get an initial implementation of both of these read use
+cases integrated into `io.fits`, and in the process to document any future
+improvements that could be made to enhance performance.
 
 ### Approximate Budget
-
-Money in return for the sacrifice of sanity required to rework large chunks of
-`io.fits`. /s
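
For reference, the "delayed" approach that the added text describes (open the file when a chunk is read, then close it again) might look roughly like the sketch below. It is not the integration this proposal would produce and not necessarily what the linked comment does. The names `read_rows`, `fits_to_dask` and `rows_per_chunk` are hypothetical, and the sketch assumes a 2-D uncompressed image HDU.

```python
import dask
import dask.array as da
import numpy as np
from astropy.io import fits


def read_rows(path, hdu, start, stop):
    """Open the file, read rows [start, stop) of an image HDU, then close it."""
    with fits.open(path) as hdul:
        # .section reads only the requested slice; np.array copies it so the
        # result stays valid after the file is closed.
        return np.array(hdul[hdu].section[start:stop, :])


def fits_to_dask(path, hdu=0, rows_per_chunk=64):
    """Build a lazy 2-D Dask array with one delayed read per block of rows.

    Hypothetical helper: assumes the HDU holds a 2-D image.
    """
    header = fits.getheader(path, hdu)
    ny, nx = header["NAXIS2"], header["NAXIS1"]
    # Take the dtype from a one-row read instead of decoding BITPIX by hand.
    dtype = read_rows(path, hdu, 0, 1).dtype
    blocks = []
    for start in range(0, ny, rows_per_chunk):
        stop = min(start + rows_per_chunk, ny)
        lazy_block = dask.delayed(read_rows)(path, hdu, start, stop)
        blocks.append(da.from_delayed(lazy_block, shape=(stop - start, nx), dtype=dtype))
    return da.concatenate(blocks, axis=0)
```

Under these assumptions, `fits_to_dask("image.fits")` returns immediately without reading pixel data. Each chunk opens, reads and closes the file only when something like `.compute()` is called on the array. Having to write this bookkeeping by hand is the complexity the proposal wants to move into `io.fits`.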

0 commit comments
