Commit 38f8e2e

Cycle 4 funding: uncertainties, masks, quantities, other types of arrays
1 parent 5cb6eed commit 38f8e2e

1 file changed, 91 additions and 0 deletions
@@ -0,0 +1,91 @@
### Title

Support for Uncertainties, Masks, Improved Quantities, Other Types of Arrays

### Project Team

Marten van Kerkwijk

### Project Description

I request partial buy-out from my professorship at UofT to be able to work one
day a week on projects that are too large for the time I can currently commit
to astropy. Specifically, I propose:

- To enhance Masked and Distribution such that they can be used in all the
  main astropy classes (Time, Representation, Frame, and SkyCoord).
- To use the new numpy dtype machinery to deal with units, thus speeding up unit
  conversions and facilitating Quantity becoming a container class that can
  handle not just ndarray but any type of array, e.g., dask, jax, etc.
- To extend the same machinery to Masked and Distribution so that all main astropy
  classes can use arbitrary array classes.
- To introduce a new Variable class that tracks uncertainties and their correlations
  analytically (based on ideas from the uncertainties package).

### Project / Work

Currently, I spend about a day per week on astropy core, in reviews, bug
fixes, and development. While I have managed to use extra time for fairly
large developments (Quantity historically and Masked and Uncertainty more
recently, along with fairly major contributions to Time, Table, Representation
and numpy), it has been difficult to find enough time to actually wrap up
larger projects (at least outside sabbaticals).

In particular, while I found time to enable the use of Masked with Quantity
and Time, the logical extension to coordinates is still missing. Similarly,
Distribution now works well with Quantity, but not with Time and the various
classes underlying coordinates. Solving this will allow masks and (implicit)
error propagation on all astropy core classes. Furthermore, an attempt to
introduce a Variable class that tracks uncertainties and covariances has been
stalled for almost a decade. All these require focussed time.
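
To illustrate what tracking uncertainties and their correlations analytically
can mean, here is a minimal pure-Python sketch in the spirit of the
uncertainties package. Only the name `Variable` comes from the proposal; the
constructor, the `derivs` and `sigmas` dictionaries, and the `sigma` property
are assumptions made for this illustration and are not the astropy design.

```python
import itertools
import math


class Variable:
    """Toy value with linear error propagation (hypothetical, illustration only)."""

    _ids = itertools.count()  # labels for independent inputs

    def __init__(self, value, sigma=0.0, derivs=None, sigmas=None):
        self.value = value
        if derivs is None:
            # An independent input depends only on itself.
            key = next(Variable._ids)
            derivs = {key: 1.0}
            sigmas = {key: sigma}
        self.derivs = derivs  # d(self)/d(input) for each independent input
        self.sigmas = sigmas  # standard deviation of each independent input

    @property
    def sigma(self):
        # First-order propagation over independent inputs.
        return math.sqrt(
            sum((d * self.sigmas[k]) ** 2 for k, d in self.derivs.items())
        )

    def _combine(self, other, value, d_self, d_other):
        # Chain rule: accumulate derivatives per shared input.
        derivs = {k: d_self * d for k, d in self.derivs.items()}
        for k, d in other.derivs.items():
            derivs[k] = derivs.get(k, 0.0) + d_other * d
        return Variable(value, derivs=derivs, sigmas={**self.sigmas, **other.sigmas})

    def __add__(self, other):
        return self._combine(other, self.value + other.value, 1.0, 1.0)

    def __sub__(self, other):
        return self._combine(other, self.value - other.value, 1.0, -1.0)

    def __mul__(self, other):
        return self._combine(other, self.value * other.value, other.value, self.value)


x = Variable(10.0, 0.3)
y = Variable(10.0, 0.4)
print((x - x).sigma)  # 0.0, since x is fully correlated with itself
print((x - y).sigma)  # approximately 0.5, since x and y are independent
```

Because each result remembers which independent inputs it depends on,
`x - x` has zero uncertainty, whereas treating the two operands as unrelated
would give a non-zero error.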

An exciting development at numpy has been the new dtype machinery, which
allows much easier design of user data types. So far, the main application
has been a new StringDType, which allows having an array of variable-length
Unicode strings (I have been a major reviewer of this, partially to get
familiar with the new machinery; it may be useful for astropy too).
47+
48+
One of the explicit use cases for the dtype redesign was to support units
49+
(which can be seen as descriptions of how to interpret the data, with
50+
converting to another unit similar to casting to another data type). This
51+
seems one of the easiest ways to reduce the intricate dependencies on ndarray
52+
and remove many of the overrides of its methods. Our unit system itself is
53+
nicely separated out, which should facilitate using it for a new unit-carrying
54+
dtype. Benefits of using this include speed-up (as much more will be done in
55+
C), and removal of quite tricky code to deal with, e.g., structured data
56+
types. I also have some hope of separating out the units/quantity code from
57+
astropy, so that it can be used more generally.
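
The analogy between unit conversion and dtype casting can be sketched without
any of the numpy machinery. In the toy below, `UnitDType` and `cast` are
hypothetical names invented for this illustration; they are neither numpy nor
astropy API, and a real unit-carrying dtype would do this work in C.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitDType:
    """Hypothetical descriptor of how to interpret stored numbers."""

    name: str
    physical_type: str
    scale: float  # size of this unit expressed in the base unit


def cast(values, from_dtype, to_dtype):
    """Re-express values in another unit, much like casting between data types."""
    if from_dtype.physical_type != to_dtype.physical_type:
        # Like an unsupported cast: there is no meaningful conversion.
        raise TypeError(f"cannot cast {from_dtype.name} to {to_dtype.name}")
    factor = from_dtype.scale / to_dtype.scale
    return [v * factor for v in values]


m = UnitDType("m", "length", 1.0)
km = UnitDType("km", "length", 1000.0)
s = UnitDType("s", "time", 1.0)

print(cast([1.5, 2.0], km, m))  # [1500.0, 2000.0]
```

The stored numbers never carry the unit themselves; the descriptor does, which
is why the unit can in principle live in the dtype rather than in an ndarray
subclass.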

Using unit-carrying data types should also make it easier (though it is not
required) for Quantity to support other array classes (dask, jax, etc.), as
suggested in APE 25. This will help deal with larger data sets (dask) and gain
us GPU acceleration (jax). The nice thing is that if Quantity is able to use
array types other than ndarray, then this will nearly automatically extend to
coordinates (since those use quantities almost exclusively). A bit more of an
obstacle will be Time, though there a user dtype to hold the two parts of the
JD (or indeed a proper quad-precision float) may help similarly, making the
implementation a lot cleaner and allowing array types other than ndarray. Also,
once Quantity is done, it will be easy to extend it to Masked and Distribution
(and possibly Variable), as those are basically container classes already.
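
As a small illustration of why the JD is held in two parts, the sketch below
compares a single float64 with a pair of floats. The tuple layout and the
helper `diff_seconds` are assumptions made for this example, not the astropy
implementation; `math.ulp` requires Python 3.9 or later.

```python
import math

SECONDS_PER_DAY = 86400.0

jd_day = 2460000.0                # a Julian Date, in days
offset = 1e-9 / SECONDS_PER_DAY   # one nanosecond, expressed in days

# Single float64: the offset is far below the spacing of doubles near 2.46e6.
single = jd_day + offset
print(single == jd_day)  # True: the nanosecond is lost

# Spacing between adjacent float64 values at this JD, in seconds
# (tens of microseconds).
resolution_s = math.ulp(jd_day) * SECONDS_PER_DAY
print(resolution_s)


def diff_seconds(a, b):
    """Difference b - a of two (jd1, jd2) pairs, in seconds."""
    # Subtract the large and the small parts separately, then combine.
    return ((b[0] - a[0]) + (b[1] - a[1])) * SECONDS_PER_DAY


# Two-part representation: the small part keeps its full precision.
t0 = (jd_day, 0.0)
t1 = (jd_day, offset)
print(diff_seconds(t0, t1))  # close to 1e-9
```

A user dtype holding both parts would let this bookkeeping live in one place
instead of being repeated in every operation.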

I should perhaps add that the different projects can be separated relatively
easily, and do not have a very obvious order. Hence, I can give priority to
whatever is deemed most important.

### Approximate Budget

I request funding to replace salary equivalent to one day a week, reducing my
regular employment at the University of Toronto correspondingly. At a
standard rate of USD 150/hour (which happens to be roughly my current salary)
for 8 hours per week and 45 weeks, this corresponds to USD 54,000 per year.

Note: so far I have done nothing beyond asking my Chair whether a reduction,
including of teaching, might be possible in principle. I will ask for details
if this proposal is deemed interesting.

### Period of Performance

Ideally, this would be for three years (if funding allows and my university
agrees), but the projects are sufficiently separable that a shorter term or
one split in, say, half-year parts would be useful too (but not fewer hours per
week, since the goal is to have full days for astropy development only).
