|
13 | 13 | just release |
14 | 14 | ``` |
15 | 15 |
|
16 | | -## Execution Models |
17 | | -### Embedded VM |
18 | | -#### Pyodide |
19 | | -Website: <https://pyodide.org/>. |
| 16 | +## Python Version |
| 17 | +We currently bundle [Python 3.14.0rc2]. |
| 18 | + |
| 19 | +## Python Standard Library |
| 20 | +In contrast to a normal Python installation there are a few notable public[^public] modules **missing** from the [Python Standard Library]: |
| 21 | + |
| 22 | +- [`curses`](https://docs.python.org/3/library/curses.html) |
| 23 | +- [`ensurepip`](https://docs.python.org/3/library/ensurepip.html) |
| 24 | +- [`fcntl`](https://docs.python.org/3/library/fcntl.html) |
| 25 | +- [`grp`](https://docs.python.org/3/library/grp.html) |
| 26 | +- [`idlelib`](https://docs.python.org/3/library/idle.html) |
| 27 | +- [`mmap`](https://docs.python.org/3/library/mmap.html) |
| 28 | +- [`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html) |
| 29 | +- [`pip`](https://pip.pypa.io/) |
| 30 | +- [`pwd`](https://docs.python.org/3/library/pwd.html) |
| 31 | +- [`readline`](https://docs.python.org/3/library/readline.html) |
| 32 | +- [`resource`](https://docs.python.org/3/library/resource.html) |
| 33 | +- [`syslog`](https://docs.python.org/3/library/syslog.html) |
| 34 | +- [`termios`](https://docs.python.org/3/library/termios.html) |
| 35 | +- [`tkinter`](https://docs.python.org/3/library/tkinter.html) |
| 36 | +- [`turtledemo`](https://docs.python.org/3/library/turtle.html#module-turtledemo) |
| 37 | +- [`venv`](https://docs.python.org/3/library/venv.html) |
| 38 | +- [`zlib`](https://docs.python.org/3/library/zlib.html) |
| 39 | + |
| 40 | +Some low-level modules like [`os`](https://docs.python.org/3/library/os.html) may not offer all methods, types, and constants. |
| 41 | + |
| 42 | +## Dependencies |
| 43 | +We do not bundle any additional libraries at the moment. It is currently NOT possible to install your own dependencies. |
| 44 | + |
| 45 | +## Methods |
| 46 | +Currently we only support [Scalar UDF]s. You can write one as a plain Python function: |
| 47 | + |
| 48 | +```python |
| 49 | +def add_one(x: int) -> int: |
| 50 | +    return x + 1 |
| 51 | +``` |
| 52 | + |
| 53 | +You may register multiple methods in one Python source text. Imported methods and private methods starting with `_` are ignored. |
| 54 | + |
| 55 | +## Types |
| 56 | +Types are mapped to/from [Apache Arrow] as follows: |
| 57 | + |
| 58 | +| Python | Arrow | |
| 59 | +| ------------ | ----------- | |
| 60 | +| [`bool`] | [`Boolean`] | |
| 61 | +| [`datetime`] | [`Timestamp`] w/ [`Microsecond`] and NO timezone | |
| 62 | +| [`float`] | [`Float64`] | |
| 63 | +| [`int`] | [`Int64`] | |
| 64 | +| [`str`] | [`Utf8`] | |
| 65 | + |
| 66 | +Additional types may be supported in the future. |
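A short sketch of methods that use the less obvious rows of the table. The method names are hypothetical, and the example assumes, per the table, that `datetime` values carry no timezone.

```python
from datetime import datetime, timedelta


def next_day(ts: datetime) -> datetime:
    # ts is a naive datetime (no timezone), and so is the result
    return ts + timedelta(days=1)


def is_weekend(ts: datetime) -> bool:
    # Monday is 0, so Saturday and Sunday are 5 and 6
    return ts.weekday() >= 5
```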
| 67 | + |
| 68 | +## NULLs |
| 69 | +NULLs are rather common in database contexts and a first-class citizen in [Apache Arrow] and [Apache DataFusion]. If you do not want to deal with them, just define your method with simple scalar types and we will skip NULL rows for you: |
| 70 | + |
| 71 | +```python |
| 72 | +def add_simple(x: int, y: int) -> int: |
| 73 | +    return x + y |
| 74 | +``` |
20 | 75 |
|
21 | | -Pros: |
22 | | -- Supports loads of dependencies |
23 | | -- Runs in the browser |
| 76 | +However, you can opt into full NULL handling. In Python, NULLs are expressed as optionals: |
24 | 77 |
|
25 | | -Cons: |
26 | | -- Doesn't seem to be working with freestanding WASM runtimes / servers, esp. not without Node.js |
| 78 | +```python |
| 79 | +def add_nulls(x: int | None, y: int | None) -> int | None: |
| 80 | +    if x is None or y is None: |
| 81 | +        return None |
| 82 | +    return x + y |
| 83 | +``` |
27 | 84 |
|
28 | | -#### Official CPython WASM Builds |
29 | | -Links: |
30 | | -- <https://github.com/python/cpython/tree/main/Tools/wasm> |
31 | | -- <https://devguide.python.org/getting-started/setup-building/#wasi> |
32 | | -- <https://github.com/psf/webassembly> |
33 | | -- <https://github.com/brettcannon/cpython-wasi-build/releases> |
| 85 | +or via the older syntax: |
34 | 86 |
|
35 | | -Pros: |
36 | | -- Official project, so it has a somewhat stable future and it is easier to get buy-in from the community |
| 87 | +```python |
| 88 | +from typing import Optional |
37 | 89 |
|
38 | | -Cons: |
39 | | -- Can only run as a WASI CLI-like app (so we would need to interact with it via stdio or a fake network) |
40 | | -- Currently only offered as wasip1 |
| 90 | +def add_old(x: Optional[int], y: Optional[int]) -> Optional[int]: |
| 91 | +    if x is None or y is None: |
| 92 | +        return None |
| 93 | +    return x + y |
| 94 | +``` |
41 | 95 |
|
42 | | -#### pyo3 + Official CPython WASM Builds |
43 | | -Instead of using stdio to drive a Python interpreter, we use [pyo3]. |
| 96 | +You may also partially opt into NULL handling for one parameter: |
44 | 97 |
|
45 | | -Pros: |
46 | | -- We can interact w/ Python more efficiently. |
| 98 | +```python |
| 99 | +def add_left(x: int | None, y: int) -> int | None: |
| 100 | +    if x is None: |
| 101 | +        return None |
| 102 | +    return x + y |
47 | 103 |
|
48 | | -Cons: |
49 | | -- Needs pre-released Python 3.14, because 3.13 seems to rely on "thread parking", which is implemented as WASM exceptions, which are not supported by wasmtime yet. Relevant code is <https://github.com/PyO3/pyo3/blob/52554ce0a33321893af17577a3ea0d179ad1b563/pyo3-ffi/src/pystate.rs#L87-L94>. |
| 104 | +def add_right(x: int, y: int | None) -> int | None: |
| 105 | +    if y is None: |
| 106 | +        return None |
| 107 | +    return x + y |
| 108 | +``` |
50 | 109 |
|
51 | | -#### webassembly-language-runtimes |
52 | | -Website: <https://github.com/webassemblylabs/webassembly-language-runtimes> |
| 110 | +Note that if you define the return type as non-optional, you MUST NOT return `None`. Otherwise, the execution will fail. |
53 | 111 |
|
54 | | -This was formally a VMWare project. |
| 112 | +To give you a better idea of when a Python method is called, consult this table: |
55 | 113 |
|
56 | | -Cons: |
57 | | -- Seems dead? |
| 114 | +| `x` | `y` | `add_simple` | `add_nulls` | `add_left` | `add_right` | |
| 115 | +| ------ | ------ | ------------ | ----------- | ---------- | ----------- | |
| 116 | +| `None` | `None` | 𐄂 | ✓ | 𐄂 | 𐄂 | |
| 117 | +| `None` | some | 𐄂 | ✓ | ✓ | 𐄂 | |
| 118 | +| some | `None` | 𐄂 | ✓ | 𐄂 | ✓ | |
| 119 | +| some | some | ✓ | ✓ | ✓ | ✓ | |
58 | 120 |
|
59 | | -### Ahead-of-Time Compilation |
60 | | -This is only going to work if |
| 121 | +You may find this feature helpful when you want to control default values for NULLs: |
61 | 122 |
|
62 | | -- the ahead-of-time compiler itself is lightweight enough to be embedded within a database (esp. it should not call to some random C host toolchain) |
63 | | -- the Python compiler/transpiler is solid and supports enough features |
| 123 | +```python |
| 124 | +def half(x: float | None) -> float: |
| 125 | +    # zero might be a sensible default |
| 126 | +    if x is None: |
| 127 | +        return 0.0 |
64 | 128 |
|
65 | | -#### componentize-py |
66 | | -Website: <https://github.com/bytecodealliance/componentize-py> |
| 129 | +    return x / 2.0 |
| 130 | +``` |
| 131 | + |
| 132 | +or if you want to turn a value into NULL: |
67 | 133 |
|
68 | | -#### py2wasm |
69 | | -Website: <https://github.com/wasmerio/py2wasm> |
| 134 | +```python |
| 135 | +def add_one_limited(x: int) -> int | None: |
| 136 | +    # do not go beyond 100 |
| 137 | +    if x >= 100: |
| 138 | +        return None |
70 | 139 |
|
71 | | -### Other Notes |
72 | | -- <https://wasmlabs.dev/articles/python-wasm-rust/> |
| 140 | +    return x + 1 |
| 141 | +``` |
73 | 142 |
|
| 143 | +## Default Parameters and Kwargs |
| 144 | +Default parameters, `*args`, keyword-only parameters, and `**kwargs` are currently NOT supported, so these methods will be rejected: |
| 145 | + |
| 146 | +```python |
| 147 | +def m1(x: int = 1) -> int: |
| 148 | +    return x + 1 |
| 149 | + |
| 150 | +def m2(*x: int) -> int: |
| 151 | +    return x + 1 |
| 152 | + |
| 153 | +def m3(*, x: int) -> int: |
| 154 | +    return x + 1 |
| 155 | + |
| 156 | +def m4(**x: int) -> int: |
| 157 | +    return x + 1 |
| 158 | +``` |
| 159 | + |
| 160 | +## State |
| 161 | +We give no guarantees on the lifetime of the Python VM, but you may use state in your Python methods for performance reasons (e.g. to cache results): |
| 162 | + |
| 163 | +```python |
| 164 | +_cache = {} |
| 165 | + |
| 166 | +def compute(x: int) -> int: |
| 167 | +    try: |
| 168 | +        return _cache[x] |
| 169 | +    except KeyError: |
| 170 | +        y = x * 100 |
| 171 | +        _cache[x] = y |
| 172 | +        return y |
| 173 | +``` |
| 174 | + |
| 175 | +You may also use a builtin solution like [`functools.cache`]: |
| 176 | + |
| 177 | +```python |
| 178 | +from functools import cache |
| 179 | + |
| 180 | +@cache |
| 181 | +def compute(x: int) -> int: |
| 182 | +    return x * 100 |
| 183 | +``` |
74 | 184 |
|
75 | | -[pyo3]: https://pyo3.rs/ |
| 185 | +## I/O |
| 186 | +There is NO I/O available that escapes the sandbox. The [Python Standard Library] is mounted as a read-only filesystem. |
| 187 | + |
| 188 | + |
| 189 | +[^public]: Modules not starting with a `_`. |
| 190 | + |
| 191 | +[Apache Arrow]: https://arrow.apache.org/ |
| 192 | +[Apache DataFusion]: https://datafusion.apache.org/ |
| 193 | +[`bool`]: https://docs.python.org/3/library/stdtypes.html#boolean-type-bool |
| 194 | +[`Boolean`]: https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.Boolean |
| 195 | +[`datetime`]: https://docs.python.org/3/library/datetime.html#datetime.datetime |
| 196 | +[`float`]: https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex |
| 197 | +[`Float64`]: https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.Float64 |
| 198 | +[`functools.cache`]: https://docs.python.org/3/library/functools.html#functools.cache |
| 199 | +[`int`]: https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex |
| 200 | +[`Int64`]: https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.Int64 |
| 201 | +[`Microsecond`]: https://docs.rs/arrow/latest/arrow/datatypes/enum.TimeUnit.html#variant.Microsecond |
| 202 | +[Python 3.14.0rc2]: https://www.python.org/downloads/release/python-3140rc2/ |
| 203 | +[Python Standard Library]: https://docs.python.org/3/library/index.html |
| 204 | +[Scalar UDF]: https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.ScalarUDF.html |
| 205 | +[`str`]: https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str |
| 206 | +[`Timestamp`]: https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.Timestamp |
| 207 | +[`Utf8`]: https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.Utf8 |