|
| 1 | +--- |
| 2 | +title: "`uarray`: A Generic Override Framework for Methods" |
| 3 | +author: hameer-abbasi |
| 4 | +published: April 30, 2019 |
| 5 | +description: 'The problem is, stated simply: How do we use all of the PyData libraries in tandem, moving seamlessly from one to the other, without actually changing the API, or even the imports?' |
| 6 | +category: [PyData ecosystem] |
| 7 | +featuredImage: |
| 8 | + src: /posts/hello-world-post/blog_hero_var2.svg |
| 9 | + alt: 'An illustration of a brown and a white hand coming towards each other to pass a business card with the logo of Quansight Labs.' |
| 10 | +hero: |
| 11 | + imageSrc: /posts/hello-world-post/blog_feature_org/svg |
| 12 | + imageAlt: 'An illustration of a brown hand holding up a microphone, with some graphical elements highlighting the top of the microphone.' |
| 13 | +--- |
| 14 | + |
| 15 | +`uarray` is an override framework for methods in Python. In the |
| 16 | +scientific Python ecosystem, and in other similar places, one problem |
| 17 | +keeps recurring: similar tools for the same job exist, but they don't |
| 18 | +conform to a single, well-defined API. `uarray` tries to solve this |
| 19 | +problem in general, but also for the scientific Python ecosystem in |
| 20 | +particular, by defining APIs independent of their implementations. |
| 21 | + |
| 22 | +## Array Libraries in the Scientific Python Ecosystem |
| 23 | + |
| 24 | +When SciPy was created, and Numeric and Numarray unified into NumPy, it |
| 25 | +jump-started Python's data science community. The ecosystem grew |
| 26 | +quickly: Academics started moving to SciPy, and the Scikits that popped |
| 27 | +up made the transition all the more smooth. |
| 28 | + |
| 29 | +However, the scientific Python community also shifted during that time: |
| 30 | +GPUs and distributed computing emerged. Also, there were old ideas that |
| 31 | +couldn't really be used with NumPy's API, such as sparse arrays. To |
| 32 | +solve these problems, various libraries emerged: |
| 33 | + |
| 34 | +- Dask, for distributed NumPy |
| 35 | +- CuPy, for NumPy on Nvidia-branded GPUs |
| 36 | +- PyData/Sparse, a project started to make sparse arrays conform to |
| 37 | + the NumPy API |
| 38 | +- Xnd, which extends the type system and the universal function |
| 39 | + concept found in NumPy |
| 40 | + |
| 41 | +There were yet other libraries that emerged: PyTorch, which mimics NumPy |
| 42 | +to a certain degree; TensorFlow, which defines its own API; and MXNet, |
| 43 | +which is another deep learning framework that mimics NumPy. |
| 44 | + |
| 45 | +## The Problem |
| 46 | + |
| 47 | +The problem is, stated simply: How do we use all of these libraries in |
| 48 | +tandem, moving seamlessly from one to the other, without actually |
| 49 | +changing the API, or even the imports? How do we take functions written |
| 50 | +for one library and allow them to be used by another, without, as Travis |
| 51 | +Oliphant so eloquently puts it, "re-writing the world"? |
| 52 | + |
| 53 | +In my mind, the goals are (stated abstractly): |
| 54 | + |
| 55 | +1. Methods that are not tied to a specific implementation. |
| 56 | + |
| 57 | +   - For example, `np.arange`. |
| 58 | + |
| 59 | +2. Backends that implement these methods. |
| 60 | + |
| 61 | +   - NumPy, Dask, and PyTorch are all examples of this. |
| 62 | + |
| 63 | +3. Coercion of objects to other forms to move between backends. |
| 64 | + |
| 65 | +   - This means converting a NumPy array to a Dask array, and vice versa. |
| 66 | + |
| 67 | +In addition, we want to be able to do this for arbitrary objects, so |
| 68 | +`dtype`s, `ufunc`s, etc. should also be dispatchable and coercible. |
| 69 | + |
| 70 | +## The Solution? |
| 71 | + |
| 72 | +With that said, let's dive into `uarray`. If you're not interested in |
| 73 | +the gory details, you can jump down to |
| 74 | +[this section](#how-to-use-it-today). |
| 75 | + |
| 76 | +``` python |
| 77 | +import uarray as ua |
| 78 | + |
| 79 | +# Let's ignore this for now |
| 80 | +def myfunc_rd(a, kw, d): |
| 81 | +    return a, kw |
| 82 | + |
| 83 | +# We define a multimethod |
| 84 | +@ua.create_multimethod(myfunc_rd) |
| 85 | +def myfunc(): |
| 86 | +    return () # Let's also ignore this for now |
| 87 | + |
| 88 | +# Now let's define two backends! |
| 89 | +be1 = ua.Backend() |
| 90 | +be2 = ua.Backend() |
| 91 | + |
| 92 | +# And register their implementations for the method! |
| 93 | +@ua.register_implementation(myfunc, backend=be1) |
| 94 | +def myfunc_be1(): # Note that it has exactly the same signature |
| 95 | +    return "Potato" |
| 96 | + |
| 97 | +@ua.register_implementation(myfunc, backend=be2) |
| 98 | +def myfunc_be2(): # Note that it has exactly the same signature |
| 99 | +    return "Strawberry" |
| 100 | +``` |
| 101 | + |
| 102 | +``` python |
| 103 | +with ua.set_backend(be1): |
| 104 | +    print(myfunc()) |
| 105 | +``` |
| 106 | + |
| 107 | +    Potato |
| 108 | + |
| 109 | +``` python |
| 110 | +with ua.set_backend(be2): |
| 111 | +    print(myfunc()) |
| 112 | +``` |
| 113 | + |
| 114 | +    Strawberry |
| 115 | + |
| 116 | +As we can clearly see, we have already provided a way to do (1) and (2) |
| 117 | +above. But then we run into a problem: How do we decide between |
| 118 | +these backends? How do we move between them? Let's go ahead and |
| 119 | +register both of these backends for permanent use, and see what happens |
| 120 | +when both of them implement the method! |
| 121 | + |
| 122 | +``` python |
| 123 | +ua.register_backend(be1) |
| 124 | +ua.register_backend(be2) |
| 125 | +``` |
| 126 | + |
| 127 | +``` python |
| 128 | +print(myfunc()) |
| 129 | +``` |
| 130 | + |
| 131 | +    Potato |
| 132 | + |
| 133 | +As we see, we get only the first backend's answer. In general, it's |
| 134 | +indeterminate which backend will be selected. But this is a special |
| 135 | +case: We're not passing arguments in! What if we change one of these to |
| 136 | +return `NotImplemented`? |
| 137 | + |
| 138 | +``` python |
| 139 | +# We redefine the multimethod so it's new again |
| 140 | +@ua.create_multimethod(myfunc_rd) |
| 141 | +def myfunc(): |
| 142 | +    return () |
| 143 | + |
| 144 | +# Now let's redefine the two backends! |
| 145 | +be1 = ua.Backend() |
| 146 | +be2 = ua.Backend() |
| 147 | + |
| 148 | +# And register their implementations for the method! |
| 149 | +@ua.register_implementation(myfunc, backend=be1) |
| 150 | +def myfunc_be1(): # Note that it has exactly the same signature |
| 151 | +    return NotImplemented |
| 152 | + |
| 153 | +@ua.register_implementation(myfunc, backend=be2) |
| 154 | +def myfunc_be2(): # Note that it has exactly the same signature |
| 155 | +    return "Strawberry" |
| 156 | + |
| 157 | +ua.register_backend(be1) |
| 158 | +ua.register_backend(be2) |
| 159 | +``` |
| 160 | + |
| 161 | +``` python |
| 162 | +with ua.set_backend(be1): |
| 163 | +    print(myfunc()) |
| 164 | +``` |
| 165 | + |
| 166 | +    Strawberry |
| 167 | + |
| 168 | +Wait... What? Didn't we just set the first `Backend`? Ahh, but, you |
| 169 | +see... It's signalling that it has *no* implementation for `myfunc`. |
| 170 | +The same would happen if you simply didn't register one. To force a |
| 171 | +`Backend`, we must use `only=True` or `coerce=True`; the difference will |
| 172 | +be explained in just a moment. |
| 173 | + |
| 174 | +``` python |
| 175 | +with ua.set_backend(be1, only=True): |
| 176 | +    print(myfunc()) |
| 177 | +``` |
| 178 | + |
| 179 | +    --------------------------------------------------------------------------- |
| 180 | +    BackendNotImplementedError                Traceback (most recent call last) |
| 181 | +    <ipython-input-8-ec856cf7c88b> in <module> |
| 182 | +          1 with ua.set_backend(be1, only=True): |
| 183 | +    ----> 2     print(myfunc()) |
| 184 | + |
| 185 | +    ~/Quansight/uarray/uarray/backend.py in __call__(self, *args, **kwargs) |
| 186 | +        108 |
| 187 | +        109         if result is NotImplemented: |
| 188 | +    --> 110             raise BackendNotImplementedError('No selected backends had an implementation for this method.') |
| 189 | +        111 |
| 190 | +        112         return result |
| 191 | + |
| 192 | +    BackendNotImplementedError: No selected backends had an implementation for this method. |
| 193 | + |
| 194 | +Now we are told that no backend had an implementation for this function |
| 195 | +(which is nice; good error messages are nice!). |
| 196 | + |
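The fall-through rule described in this section can be modelled in a few lines of plain Python. This is only a sketch under stated assumptions, not how `uarray` is implemented: `ToyBackend`, `toy_call`, and `ToyNotImplementedError` are hypothetical names, and the model assumes that the backend chosen by the context manager is tried first, followed by the other registered backends in registration order.

```python
class ToyNotImplementedError(NotImplementedError):
    """Hypothetical stand-in for the error raised when no backend answers."""


class ToyBackend:
    """Hypothetical backend: a mapping from method name to implementation."""

    def __init__(self, name):
        self.name = name
        self.impls = {}

    def call(self, method, *args, **kwargs):
        impl = self.impls.get(method)
        if impl is None:
            # Not registering anything behaves like returning NotImplemented.
            return NotImplemented
        return impl(*args, **kwargs)


def toy_call(method, preferred, registered, only=False):
    """Try the preferred backend, then the registered ones unless only=True."""
    candidates = [preferred]
    if not only:
        candidates += [b for b in registered if b is not preferred]
    for backend in candidates:
        result = backend.call(method)
        if result is not NotImplemented:
            return result
    raise ToyNotImplementedError(
        "No selected backends had an implementation for this method."
    )


be1, be2 = ToyBackend("be1"), ToyBackend("be2")
be1.impls["myfunc"] = lambda: NotImplemented  # be1 declines the call
be2.impls["myfunc"] = lambda: "Strawberry"

# be1 is preferred but declines, so the call falls through to be2.
print(toy_call("myfunc", preferred=be1, registered=[be1, be2]))
```

Passing `only=True` to `toy_call` in this model raises `ToyNotImplementedError`, which mirrors the strict behaviour shown in the traceback above.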
| 197 | +## Coercion and passing between backends |
| 198 | + |
| 199 | +Let's say we had two `Backend`s. Let's choose the completely useless |
| 200 | +example of one storing a number as an `int` and one as a `str`. |
| 201 | + |
| 202 | +``` python |
| 203 | +class Number(ua.DispatchableInstance): |
| 204 | +    pass |
| 205 | + |
| 206 | +def myfunc_rd(args, kwargs, dispatchable_args): |
| 207 | +    # Here, we're "replacing" the dispatchable args with the ones supplied. |
| 208 | +    # In general, this may be more complex, like inserting them in between |
| 209 | +    # other args and kwargs. |
| 210 | +    return dispatchable_args, kwargs |
| 211 | + |
| 212 | +@ua.create_multimethod(myfunc_rd) |
| 213 | +def myfunc(a): |
| 214 | +    # Here, we're marking a as a Number, and saying that "we want to dispatch/convert over this" |
| 215 | +    # We return as a tuple as there may be more dispatchable arguments |
| 216 | +    return (Number(a),) |
| 217 | + |
| 218 | +Number.register_convertor(be1, lambda x: int(x)) |
| 219 | +Number.register_convertor(be2, lambda x: str(x)) |
| 220 | +``` |
| 221 | + |
| 222 | +Let's also define a "catch-all" method. This handles every method |
| 223 | +that has no implementation registered explicitly. |
| 224 | + |
| 225 | +``` python |
| 226 | +# This can be arbitrarily complex |
| 227 | +def gen_impl1(method, args, kwargs, dispatchable_args): |
| 228 | +    if not all(isinstance(a, Number) and isinstance(a.value, int) for a in dispatchable_args): |
| 229 | +        return NotImplemented |
| 230 | + |
| 231 | +    return args[0] |
| 232 | + |
| 233 | +# This can be arbitrarily complex |
| 234 | +def gen_impl2(method, args, kwargs, dispatchable_args): |
| 235 | +    if not all(isinstance(a, Number) and isinstance(a.value, str) for a in dispatchable_args): |
| 236 | +        return NotImplemented |
| 237 | + |
| 238 | +    return args[0] |
| 239 | + |
| 240 | +be1.register_implementation(None, gen_impl1) |
| 241 | +be2.register_implementation(None, gen_impl2) |
| 242 | +``` |
| 243 | + |
| 244 | +``` python |
| 245 | +myfunc('1') # This calls the second implementation |
| 246 | +``` |
| 247 | + |
| 248 | +    '1' |
| 249 | + |
| 250 | +``` python |
| 251 | +myfunc(1) # This calls the first implementation |
| 252 | +``` |
| 253 | + |
| 254 | +    1 |
| 255 | + |
| 256 | +``` python |
| 257 | +myfunc(1.0) # This fails |
| 258 | +``` |
| 259 | + |
| 260 | +    --------------------------------------------------------------------------- |
| 261 | +    BackendNotImplementedError                Traceback (most recent call last) |
| 262 | +    <ipython-input-13-8431c1275db5> in <module> |
| 263 | +    ----> 1 myfunc(1.0) # This fails |
| 264 | + |
| 265 | +    ~/Quansight/uarray/uarray/backend.py in __call__(self, *args, **kwargs) |
| 266 | +        108 |
| 267 | +        109         if result is NotImplemented: |
| 268 | +    --> 110             raise BackendNotImplementedError('No selected backends had an implementation for this method.') |
| 269 | +        111 |
| 270 | +        112         return result |
| 271 | + |
| 272 | +    BackendNotImplementedError: No selected backends had an implementation for this method. |
| 273 | + |
| 274 | +``` python |
| 275 | +# But works if we do this: |
| 276 | + |
| 277 | +with ua.set_backend(be1, coerce=True): |
| 278 | +    print(type(myfunc(1.0))) |
| 279 | + |
| 280 | +with ua.set_backend(be2, coerce=True): |
| 281 | +    print(type(myfunc(1.0))) |
| 282 | +``` |
| 283 | + |
| 284 | +    <class 'int'> |
| 285 | +    <class 'str'> |
| 286 | + |
| 287 | +This may seem like too much work, but remember that it's broken down |
| 288 | +into a lot of small steps: |
| 289 | + |
| 290 | +1. Extract the dispatchable arguments. |
| 291 | +2. Realise the types of the dispatchable arguments. |
| 292 | +3. Convert them. |
| 293 | +4. Place them back into args/kwargs. |
| 294 | +5. Call the right function. |
| 295 | + |
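The five steps above can be sketched in plain Python. This is a minimal illustration, not `uarray`'s internals: `extract`, `replace`, `identity_impl`, and `toy_coerced_call` are hypothetical helpers, and the sketch assumes a backend is fully described by the single type (`int` or `str`) it wants its dispatchable arguments to have.

```python
def extract(a):
    """Step 1: report which arguments are dispatchable (here, just `a`)."""
    return (a,)


def replace(args, kwargs, dispatchables):
    """Step 4: put converted values back; here they replace the positional args."""
    return dispatchables, kwargs


def identity_impl(a):
    """A trivial backend implementation that returns its argument."""
    return a


def toy_coerced_call(impl, target, *args, **kwargs):
    # 1. Extract the dispatchable arguments.
    dispatchables = extract(*args, **kwargs)
    # 2. Realise their types: which ones are not yet in the backend's form?
    needs_conversion = [not isinstance(d, target) for d in dispatchables]
    # 3. Convert only the arguments that need it.
    converted = tuple(
        target(d) if needed else d
        for d, needed in zip(dispatchables, needs_conversion)
    )
    # 4. Place them back into args/kwargs.
    new_args, new_kwargs = replace(args, kwargs, converted)
    # 5. Call the right function.
    return impl(*new_args, **new_kwargs)


print(type(toy_coerced_call(identity_impl, int, 1.0)))
print(type(toy_coerced_call(identity_impl, str, 1.0)))
```

As in the `coerce=True` example, the same call yields an `int` for the first toy backend and a `str` for the second.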
| 296 | +Note that `only=True` does not coerce, just enforces the backend |
| 297 | +strictly. |
| 298 | + |
| 299 | +With this, we have solved problem (3). What remains is the grunt work of |
| 300 | +actually retrofitting the NumPy API into `unumpy` and extracting the |
| 301 | +right values from it. |
| 302 | + |
| 303 | +## How To Use It Today |
| 304 | + |
| 305 | +`unumpy` is a set of NumPy-related multimethods built on top of |
| 306 | +`uarray`. You can use them as follows: |
| 307 | + |
| 308 | +``` python |
| 309 | +import unumpy as np # Note the changed import statement |
| 310 | +from unumpy.xnd_backend import XndBackend |
| 311 | + |
| 312 | +with ua.set_backend(XndBackend): |
| 313 | +    print(type(np.arange(0, 100, 1))) |
| 314 | +``` |
| 315 | + |
| 316 | +    <class 'xnd.array'> |
| 317 | + |
| 318 | +And, as you can see, we get back an Xnd array when using a NumPy-like |
| 319 | +API. Currently, there are three backends: NumPy, Xnd and PyTorch. The |
| 320 | +NumPy and Xnd backends have feature parity, while the PyTorch backend is |
| 321 | +still being worked on. |
| 322 | + |
| 323 | +We are also working on supporting more of the NumPy API, and dispatching |
| 324 | +over dtypes. |
| 325 | + |
| 326 | +Feel free to browse the source and open issues at: |
| 327 | +<https://github.com/Quansight-Labs/uarray> or shoot me an email at |
| 328 | +`<a href="mailto:[email protected]">`{=html} [email protected]`</a>`{=html} |
| 329 | +if you want to contact me directly. You can also find the full |
| 330 | +documentation at <https://uarray.readthedocs.io/en/latest/>. |
| 331 | + |