Merge pull request #380 from mhvk/cycle4-funding
Cycle 4 funding: uncertainties, masks, quantities, other types of arrays
### Title

Support for Uncertainties, Masks, Improved Quantities, and Other Types of Arrays

### Project Team

Marten van Kerkwijk

### Project Description

I request a partial buy-out from my professorship at UofT so that I can work
one day a week on projects that are too large for the time I can currently
commit to astropy. Specifically, I propose to:

- Ensure that Masked and Distribution can be used in all the main astropy
  classes (Time, Representation, Frame, and SkyCoord).
- Use the new numpy dtype machinery to deal with units, thus speeding up unit
  conversions.
- Make it possible for Quantity to become a container class that can handle
  not just ndarray but any type of array, e.g., also dask, jax, etc.
- Extend the same machinery to Masked and Distribution, so that all main
  astropy classes can use arbitrary array classes.
- Finish my implementation of a Variable class that tracks uncertainties and
  their correlations analytically (based on the uncertainties package).

### Project / Work

I currently spend about a day per week on astropy core, in reviews, bug
fixes, and development. While I have managed to find extra time for fairly
large developments (Quantity historically and Masked and Uncertainty more
recently, as well as fairly major contributions to Time, Table, Representation,
and numpy), it has been difficult to find enough time to actually wrap up
larger projects (at least outside sabbaticals).

In particular, while I found time to enable the use of Masked with Quantity
and Time, the logical extension to coordinates is still missing. Similarly,
Distribution now works well with Quantity, but not with Time and the various
classes underlying coordinates. Solving this will allow masks and (implicit)
error propagation on all astropy core classes. Furthermore, a
[PR](https://github.com/astropy/astropy/pull/3715) to introduce a Variable
class that tracks uncertainties and covariances (based on the [uncertainties
package](https://pythonhosted.org/uncertainties/), but extending it to deal
natively with arrays) has been stalled for almost a decade. All of these
require focused time to finish the implementation, write tests, document
proper usage, etc.

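The following is a minimal Python sketch of first-order (linear) error propagation. It illustrates what tracking uncertainties and their correlations analytically means. The class `LinVar` and its methods are hypothetical names used only for this sketch. It is neither the interface of the stalled PR nor that of the uncertainties package, and it handles scalars only. Each result keeps its sensitivity to every independent error source, so contributions from a shared source can cancel before they are combined in quadrature.

```python
import math
from itertools import count

# Hypothetical sketch, not the API of astropy's Variable PR or of the
# uncertainties package: scalars only, first-order (linear) propagation.
_source_ids = count()  # unique id for every independent error source


class LinVar:
    def __init__(self, value, std=0.0, _derivs=None):
        self.value = float(value)
        if _derivs is None:
            # A fresh, independent source; its sensitivity to itself is its std.
            _derivs = {next(_source_ids): float(std)} if std else {}
        # Maps source id -> (d self / d source) * std(source).
        self._derivs = _derivs

    @staticmethod
    def _wrap(other):
        # Plain numbers are exact constants without error sources.
        return other if isinstance(other, LinVar) else LinVar(other)

    def _combine(self, other, value, d_self, d_other):
        # Chain rule, keyed by source id, so that contributions from a shared
        # source add up (or cancel) before anything is squared.
        derivs = {}
        for k, v in self._derivs.items():
            derivs[k] = derivs.get(k, 0.0) + d_self * v
        for k, v in other._derivs.items():
            derivs[k] = derivs.get(k, 0.0) + d_other * v
        return LinVar(value, _derivs=derivs)

    def __add__(self, other):
        other = self._wrap(other)
        return self._combine(other, self.value + other.value, 1.0, 1.0)

    def __sub__(self, other):
        other = self._wrap(other)
        return self._combine(other, self.value - other.value, 1.0, -1.0)

    def __mul__(self, other):
        other = self._wrap(other)
        return self._combine(other, self.value * other.value,
                             other.value, self.value)

    def __truediv__(self, other):
        other = self._wrap(other)
        return self._combine(other, self.value / other.value,
                             1.0 / other.value, -self.value / other.value**2)

    __radd__ = __add__
    __rmul__ = __mul__

    @property
    def std(self):
        # Different sources are independent, so they add in quadrature.
        return math.sqrt(sum(v * v for v in self._derivs.values()))

    def covariance(self, other):
        # Only error sources shared by both variables contribute.
        return sum(v * other._derivs.get(k, 0.0)
                   for k, v in self._derivs.items())


x = LinVar(10.0, std=0.3)
y = LinVar(5.0, std=0.4)
print((x - x).std)  # fully correlated: the contributions cancel exactly
print((x + y).std)  # independent: sqrt(0.3**2 + 0.4**2)
```

In this sketch, `x - x` comes out with zero uncertainty and `x + x` with twice the uncertainty of `x`. Naively adding the two operands' uncertainties in quadrature would instead give about 0.42 in both cases. The uncertainties package gives the same first-order results for such correlated expressions. The proposed Variable class would, in addition, deal natively with arrays, which this sketch does not attempt.
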
An exciting development in numpy has been the new dtype machinery, which
makes it much easier to design user-defined data types. So far, the main
application has been a new StringDType, which allows arrays of variable-length
unicode strings (I have been a major reviewer of this, partially to get
familiar with the new machinery; it may be useful for astropy too).

One of the explicit use cases for the dtype redesign was to support units
(which can be seen as descriptions of how to interpret the data, with
converting to another unit similar to casting to another data type). This
seems one of the easiest ways to reduce the intricate dependencies on ndarray
and to remove many of the overrides of its methods. Our unit system itself is
nicely separated out, which should facilitate using it for a new unit-carrying
dtype. Benefits include a speed-up (as much more will be done in C) and the
removal of quite tricky code that deals with, e.g., structured data types. I
also have some hope of separating out the units/quantity code from astropy,
so that it can be used more generally.

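A minimal pure-Python sketch can make the analogy between unit conversion and casting concrete. `UnitDescr` and `cast` are hypothetical names. They do not reflect numpy's actual DType API or astropy.units, and the single-string "dimension" is a deliberate simplification of real dimensional analysis. The sketch shows only that the unit can live in a descriptor of the data, while a conversion reduces to checking that the cast is allowed and applying one factor derived from the two descriptors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitDescr:
    """Hypothetical stand-in for a unit-carrying dtype descriptor."""
    name: str
    dimension: str  # simplification: real units need full dimensional analysis
    scale: float    # stored value * scale = value in the SI base unit


def cast(values, from_descr, to_descr):
    """Convert stored values between units the way a dtype cast would:
    check that the cast is allowed, derive one factor from the two
    descriptors, then apply it element-wise."""
    if from_descr.dimension != to_descr.dimension:
        raise TypeError(f"cannot cast {from_descr.name} to {to_descr.name}")
    factor = from_descr.scale / to_descr.scale
    return [v * factor for v in values]


m = UnitDescr("m", "length", 1.0)
km = UnitDescr("km", "length", 1000.0)
s = UnitDescr("s", "time", 1.0)

print(cast([1.5, 2.0], km, m))  # [1500.0, 2000.0]
```

In numpy's real machinery, such casts are implemented at the C level and are considerably more involved. The sketch only mirrors the idea of keeping the unit in the descriptor rather than in a Python subclass that overrides ndarray methods.
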
Using unit-carrying data types should also make it easier (though it is not
required) for Quantity to support other array classes (dask, jax, etc.), as
proposed in [APE 25](https://github.com/astropy/astropy-APEs/pull/91). This
will help deal with larger data sets (dask) or use GPU acceleration (jax).
The [APE 25 report](https://github.com/nstarman/astropy-APEs/blob/units-quantity-2.0/APE25/report.pdf)
lays out in detail how this could work. My proposal here is to implement it,
write proper tests, ensure there are no performance regressions, and of course
document it all. A nice benefit of the approach laid out in APE 25 is that it
will be very easy to extend it to Masked and Distribution (and possibly
Variable), as those basically are already the type of container classes that
APE 25 envisions.

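The general container idea can be illustrated with a minimal pure-Python sketch. Everything here is hypothetical and far simpler than what the APE 25 report describes: `Unit`, `ContainerQuantity`, and `TinyArray`, which stands in for a third-party array such as a dask or jax array. The only point is that the container delegates arithmetic to whatever array it wraps, instead of requiring that array to be, or subclass, ndarray.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Unit:
    # Hypothetical minimal unit: a name plus a scale factor to SI.
    name: str
    scale: float


@dataclass(frozen=True)
class ContainerQuantity:
    """Hypothetical container: `value` can be any array-like object that
    supports scalar multiplication and addition; it is never converted to,
    or required to subclass, ndarray."""
    value: Any
    unit: Unit

    def to(self, unit):
        # Conversion multiplies the wrapped object by a scalar, so the result
        # keeps whatever array type was wrapped. (Dimension checks omitted.)
        return ContainerQuantity(self.value * (self.unit.scale / unit.scale), unit)

    def __add__(self, other):
        # Express `other` in our unit, then let the wrapped type do the addition.
        return ContainerQuantity(self.value + other.to(self.unit).value, self.unit)


class TinyArray:
    """Hypothetical stand-in for a third-party array type (dask, jax, ...)."""

    def __init__(self, data):
        self.data = list(data)

    def __mul__(self, scalar):
        return TinyArray(x * scalar for x in self.data)

    def __add__(self, other):
        return TinyArray(a + b for a, b in zip(self.data, other.data))


m = Unit("m", 1.0)
km = Unit("km", 1000.0)
total = (ContainerQuantity(TinyArray([1.0, 2.0]), km)
         + ContainerQuantity(TinyArray([500.0, 250.0]), m))
print(type(total.value).__name__, total.value.data, total.unit.name)
```

Because `ContainerQuantity` never inspects the wrapped type, the same class works for a plain float as for `TinyArray`. A real implementation would also need to handle ufuncs, indexing, unit arithmetic, and dimension checks, all of which this sketch omits.
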
A nice consequence of Quantity being able to use array types other than
ndarray is that this will nearly automatically extend to coordinates (since
those use quantities almost exclusively; I foresee little more work than
adjusting tests!). Time will be slightly more work, as it works directly with
ndarray, but here too the path is straightforward: I can just follow my
earlier work on ensuring Time can work with Masked.

I should perhaps add that the different projects can be separated relatively
easily, and do not have a very obvious order. Hence, I can give priority to
whatever is deemed most important.

### Approximate Budget

I request funding to replace salary equivalent to one day a week, reducing my
regular employment at the University of Toronto correspondingly. At a
standard rate of USD 150/hour (which happens to be roughly my current salary)
for 8 hours per week and 45 weeks, this corresponds to USD 54,000 per year.

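Spelled out, the annual amount is the product of the three numbers above:

```math
\text{USD } 150/\text{hour} \times 8\ \text{hours/week} \times 45\ \text{weeks/year} = \text{USD } 54{,}000/\text{year}
```
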
Note: I have confirmed with my Chair that a reduction, including of teaching,
is possible in principle, but I am still in the process of finding out how
this would work in practice.

### Period of Performance

Ideally, this would be for three years (if funding allows and my university
agrees), but the projects are sufficiently separable that a shorter term is
useful too (but not fewer hours per week, since the goal is to have full days
for astropy development only).
