Commit 1a0057e

Merge pull request #78 from influxdata/crepererum/document_python_features
docs: explain Python guest features
2 parents c74e676 + 315dc86 commit 1a0057e

3 files changed: +269 -42 lines changed

guests/python/DEVELOPMENT.md

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
1+
# Development Notes
2+
## Execution Models
3+
### Embedded VM
4+
#### Pyodide
5+
Website: <https://pyodide.org/>.
6+
7+
Pros:
8+
- Supports loads of dependencies
9+
- Runs in the browser
10+
11+
Cons:
12+
- Doesn't seem to be working with freestanding WASM runtimes / servers, esp. not without Node.js
13+
14+
#### Official CPython WASM Builds
15+
Links:
16+
- <https://github.com/python/cpython/tree/main/Tools/wasm>
17+
- <https://devguide.python.org/getting-started/setup-building/#wasi>
18+
- <https://github.com/psf/webassembly>
19+
- <https://github.com/brettcannon/cpython-wasi-build/releases>
20+
21+
Pros:
22+
- Official project, so it has a somewhat stable future and it is easier to get buy-in from the community
23+
24+
Cons:
25+
- Can only run as a WASI CLI-like app (so we would need to interact with it via stdio or a fake network)
26+
- Currently only offered as wasip1
27+
28+
#### pyo3 + Official CPython WASM Builds
29+
Instead of using stdio to drive a Python interpreter, we use [pyo3].
30+
31+
Pros:
32+
- We can interact w/ Python more efficiently.
33+
34+
Cons:
35+
- Needs pre-released Python 3.14, because 3.13 seems to rely on "thread parking", which is implemented as WASM exceptions, which are not supported by wasmtime yet. Relevant code is <https://github.com/PyO3/pyo3/blob/52554ce0a33321893af17577a3ea0d179ad1b563/pyo3-ffi/src/pystate.rs#L87-L94>.
36+
37+
#### webassembly-language-runtimes
38+
Website: <https://github.com/webassemblylabs/webassembly-language-runtimes>
39+
40+
This was formerly a VMware project.
41+
42+
Cons:
43+
- Seems dead?
44+
45+
### Ahead-of-Time Compilation
46+
This is only going to work if
47+
48+
- the ahead-of-time compiler itself is lightweight enough to be embedded within a database (esp. it should not call to some random C host toolchain)
49+
- the Python compiler/transpiler is solid and supports enough features
50+
51+
#### componentize-py
52+
Website: <https://github.com/bytecodealliance/componentize-py>
53+
54+
#### py2wasm
55+
Website: <https://github.com/wasmerio/py2wasm>
56+
57+
### Other Notes
58+
- <https://wasmlabs.dev/articles/python-wasm-rust/>
59+
60+
61+
[pyo3]: https://pyo3.rs/

guests/python/README.md

Lines changed: 174 additions & 42 deletions
@@ -13,63 +13,195 @@ or
1313
just release
1414
```
1515

16-
## Execution Models
17-
### Embedded VM
18-
#### Pyodide
19-
Website: <https://pyodide.org/>.
16+
## Python Version
17+
We currently bundle [Python 3.14.0rc2].
18+
19+
## Python Standard Library
20+
In contrast to a normal Python installation, a few notable public[^public] modules are **missing** from the [Python Standard Library]:
21+
22+
- [`curses`](https://docs.python.org/3/library/curses.html)
23+
- [`ensurepip`](https://docs.python.org/3/library/ensurepip.html)
24+
- [`fcntl`](https://docs.python.org/3/library/fcntl.html)
25+
- [`grp`](https://docs.python.org/3/library/grp.html)
26+
- [`idlelib`](https://docs.python.org/3/library/idle.html)
27+
- [`mmap`](https://docs.python.org/3/library/mmap.html)
28+
- [`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html)
29+
- [`pip`](https://pip.pypa.io/)
30+
- [`pwd`](https://docs.python.org/3/library/pwd.html)
31+
- [`readline`](https://docs.python.org/3/library/readline.html)
32+
- [`resource`](https://docs.python.org/3/library/resource.html)
33+
- [`syslog`](https://docs.python.org/3/library/syslog.html)
34+
- [`termios`](https://docs.python.org/3/library/termios.html)
35+
- [`tkinter`](https://docs.python.org/3/library/tkinter.html)
36+
- [`turtledemo`](https://docs.python.org/3/library/turtle.html#module-turtledemo)
37+
- [`venv`](https://docs.python.org/3/library/venv.html)
38+
- [`zlib`](https://docs.python.org/3/library/zlib.html)
39+
40+
Some low-level modules like [`os`](https://docs.python.org/3/library/os.html) may not offer all methods, types, and constants.
41+
42+
## Dependencies
43+
We do not bundle any additional libraries at the moment. It is currently NOT possible to install your own dependencies.
44+
45+
## Methods
46+
Currently we only support [Scalar UDF]s. You can write one as a plain Python function:
47+
48+
```python
49+
def add_one(x: int) -> int:
50+
    return x + 1
51+
```
52+
53+
You may register multiple methods in one Python source text. Imported methods and private methods starting with `_` are ignored.
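
To illustrate the rule, here is a rough sketch of how such a filter could look in plain Python. It is an assumption for illustration only and not the guest's actual discovery code; `public_functions` and the module name `udf_source` are hypothetical.

```python
import types

SOURCE = '''
from textwrap import dedent  # imported function: ignored

def _helper(x: int) -> int:  # private function: ignored
    return x * 2

def double(x: int) -> int:
    return _helper(x)

def triple(x: int) -> int:
    return x * 3
'''


def public_functions(source: str) -> list[str]:
    """Names of functions defined in `source` that are neither imported nor private."""
    namespace = {"__name__": "udf_source"}
    exec(source, namespace)
    return sorted(
        name
        for name, value in namespace.items()
        if isinstance(value, types.FunctionType)
        and value.__module__ == "udf_source"  # defined here, not imported
        and not name.startswith("_")
    )


print(public_functions(SOURCE))  # ['double', 'triple']
```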
54+
55+
## Types
56+
Types are mapped to/from [Apache Arrow] as follows:
57+
58+
| Python | Arrow |
59+
| ------------ | ----------- |
60+
| [`bool`] | [`Boolean`] |
61+
| [`datetime`] | [`Timestamp`] w/ [`Microsecond`] and NO timezone |
62+
| [`float`] | [`Float64`] |
63+
| [`int`] | [`Int64`] |
64+
| [`str`] | [`Utf8`] |
65+
66+
Additional types may be supported in the future.
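
As a quick illustration of the mapping, the following sketch defines one method per supported Python type. The function names are made up for this example.

```python
from datetime import datetime, timedelta


def is_even(x: int) -> bool:
    return x % 2 == 0


def next_day(ts: datetime) -> datetime:
    # Timestamps carry no timezone, so naive datetime arithmetic applies.
    return ts + timedelta(days=1)


def to_celsius(fahrenheit: float) -> float:
    return (fahrenheit - 32.0) * 5.0 / 9.0


def shout(s: str) -> str:
    return s.upper() + "!"
```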
67+
68+
## NULLs
69+
NULLs are rather common in database contexts and are first-class citizens in [Apache Arrow] and [Apache DataFusion]. If you do not want to deal with them, just define your method with simple scalar types and we will skip NULL rows for you:
70+
71+
```python
72+
def add_simple(x: int, y: int) -> int:
73+
    return x + y
74+
```
2075

21-
Pros:
22-
- Supports loads of dependencies
23-
- Runs in the browser
76+
However, you can opt into full NULL handling. In Python, NULLs are expressed as optionals:
2477

25-
Cons:
26-
- Doesn't seem to be working with freestanding WASM runtimes / servers, esp. not without Node.js
78+
```python
79+
def add_nulls(x: int | None, y: int | None) -> int | None:
80+
    if x is None or y is None:
81+
        return None
82+
    return x + y
83+
```
2784

28-
#### Official CPython WASM Builds
29-
Links:
30-
- <https://github.com/python/cpython/tree/main/Tools/wasm>
31-
- <https://devguide.python.org/getting-started/setup-building/#wasi>
32-
- <https://github.com/psf/webassembly>
33-
- <https://github.com/brettcannon/cpython-wasi-build/releases>
85+
or via the older syntax:
3486

35-
Pros:
36-
- Official project, so it has a somewhat stable future and it is easier to get buy-in from the community
87+
```python
88+
from typing import Optional
3789

38-
Cons:
39-
- Can only run as a WASI CLI-like app (so we would need to interact with it via stdio or a fake network)
40-
- Currently only offered as wasip1
90+
def add_old(x: Optional[int], y: Optional[int]) -> Optional[int]:
91+
    if x is None or y is None:
92+
        return None
93+
    return x + y
94+
```
4195

42-
#### pyo3 + Official CPython WASM Builds
43-
Instead of using stdio to drive a Python interpreter, we use [pyo3].
96+
You may also partially opt into NULL handling for one parameter:
4497

45-
Pros:
46-
- We can interact w/ Python more efficiently.
98+
```python
99+
def add_left(x: int | None, y: int) -> int | None:
100+
    if x is None:
101+
        return None
102+
    return x + y
47103

48-
Cons:
49-
- Needs pre-released Python 3.14, because 3.13 seems to rely on "thread parking", which is implemented as WASM exceptions, which are not supported by wasmtime yet. Relevant code is <https://github.com/PyO3/pyo3/blob/52554ce0a33321893af17577a3ea0d179ad1b563/pyo3-ffi/src/pystate.rs#L87-L94>.
104+
def add_right(x: int, y: int | None) -> int | None:
105+
    if y is None:
106+
        return None
107+
    return x + y
108+
```
50109

51-
#### webassembly-language-runtimes
52-
Website: <https://github.com/webassemblylabs/webassembly-language-runtimes>
110+
Note that if you define the return type as non-optional, you MUST NOT return `None`. Otherwise, the execution will fail.
53111

54-
This was formerly a VMware project.
112+
To give you a better idea of when a Python method is called, consult this table:
55113

56-
Cons:
57-
- Seems dead?
114+
| `x` | `y` | `add_simple` | `add_nulls` | `add_left` | `add_right` |
115+
| ------ | ------ | ------------ | ----------- | ---------- | ----------- |
116+
| `None` | `None` | 𐄂 | ✓ | 𐄂 | 𐄂 |
117+
| `None` | some | 𐄂 | ✓ | ✓ | 𐄂 |
118+
| some | `None` | 𐄂 | ✓ | 𐄂 | ✓ |
119+
| some | some | ✓ | ✓ | ✓ | ✓ |
58120

59-
### Ahead-of-Time Compilation
60-
This is only going to work if
121+
You may find this feature helpful when you want to control default values for NULLs:
61122

62-
- the ahead-of-time compiler itself is lightweight enough to be embedded within a database (esp. it should not call to some random C host toolchain)
63-
- the Python compiler/transpiler is solid and supports enough features
123+
```python
124+
def half(x: float | None) -> float:
125+
    # zero might be a sensible default
126+
    if x is None:
127+
        return 0.0
64128

65-
#### componentize-py
66-
Website: <https://github.com/bytecodealliance/componentize-py>
129+
    return x / 2.0
130+
```
131+
132+
or if you want to turn certain values into NULLs:
67133

68-
#### py2wasm
69-
Website: <https://github.com/wasmerio/py2wasm>
134+
```python
135+
def add_one_limited(x: int) -> int | None:
136+
    # do not go beyond 100
137+
    if x >= 100:
138+
        return None
70139

71-
### Other Notes
72-
- <https://wasmlabs.dev/articles/python-wasm-rust/>
140+
    return x + 1
141+
```
73142

143+
## Default Parameters and Kwargs
144+
Default parameters, keyword-only parameters, `*args`, and `**kwargs` are currently NOT supported, so these methods will be rejected:
145+
146+
```python
147+
def m1(x: int = 1) -> int:
148+
    return x + 1
149+
150+
def m2(*x: int) -> int:
151+
    return x + 1
152+
153+
def m3(*, x: int) -> int:
154+
    return x + 1
155+
156+
def m4(**x: int) -> int:
157+
    return x + 1
158+
```
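
If you want to check a function locally before registering it, the standard `inspect` module can spot these parameter forms. The sketch below is an assumption about how such a check might be written, not the validation the guest performs; `unsupported_parameters` is a hypothetical helper.

```python
import inspect

# Parameter kinds that the examples above show as rejected.
_REJECTED_KINDS = {
    inspect.Parameter.VAR_POSITIONAL,  # *args
    inspect.Parameter.KEYWORD_ONLY,    # parameters after a bare *
    inspect.Parameter.VAR_KEYWORD,     # **kwargs
}


def unsupported_parameters(func) -> list[str]:
    """Describe every parameter of `func` that uses an unsupported form."""
    problems = []
    for param in inspect.signature(func).parameters.values():
        if param.kind in _REJECTED_KINDS:
            problems.append(f"{param.name}: {param.kind.description}")
        elif param.default is not inspect.Parameter.empty:
            problems.append(f"{param.name}: has a default value")
    return problems


def ok(x: int, y: int) -> int:
    return x + y


def with_default(x: int = 1) -> int:
    return x + 1


print(unsupported_parameters(ok))            # []
print(unsupported_parameters(with_default))  # ['x: has a default value']
```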
159+
160+
## State
161+
We give no guarantees on the lifetime of the Python VM, but you may use state in your Python methods for performance reasons (e.g. to cache results):
162+
163+
```python
164+
_cache = {}
165+
166+
def compute(x: int) -> int:
167+
    try:
168+
        return _cache[x]
169+
    except KeyError:
170+
        y = x * 100
171+
        _cache[x] = y
172+
        return y
173+
```
174+
175+
You may also use a builtin solution like [`functools.cache`]:
176+
177+
```python
178+
from functools import cache
179+
180+
@cache
181+
def compute(x: int) -> int:
182+
    return x * 100
183+
```
74184

75-
[pyo3]: https://pyo3.rs/
185+
## I/O
186+
There is NO I/O available that escapes the sandbox. The [Python Standard Library] is mounted as a read-only filesystem.
187+
188+
189+
[^public]: Modules not starting with a `_`.
190+
191+
[Apache Arrow]: https://arrow.apache.org/
192+
[Apache DataFusion]: https://datafusion.apache.org/
193+
[`bool`]: https://docs.python.org/3/library/stdtypes.html#boolean-type-bool
194+
[`Boolean`]: https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.Boolean
195+
[`datetime`]: https://docs.python.org/3/library/datetime.html#datetime.datetime
196+
[`float`]: https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex
197+
[`Float64`]: https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.Float64
198+
[`functools.cache`]: https://docs.python.org/3/library/functools.html#functools.cache
199+
[`int`]: https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex
200+
[`Int64`]: https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.Int64
201+
[`Microsecond`]: https://docs.rs/arrow/latest/arrow/datatypes/enum.TimeUnit.html#variant.Microsecond
202+
[Python 3.14.0rc2]: https://www.python.org/downloads/release/python-3140rc2/
203+
[Python Standard Library]: https://docs.python.org/3/library/index.html
204+
[Scalar UDF]: https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.ScalarUDF.html
205+
[`str`]: https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str
206+
[`Timestamp`]: https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.Timestamp
207+
[`Utf8`]: https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.Utf8

host/tests/integration_tests/python/runtime/dependencies.rs

Lines changed: 34 additions & 0 deletions
@@ -42,3 +42,37 @@ def foo(x: int) -> int:
4242
&Int64Array::from_iter([Some(12), Some(23), Some(34)]) as &dyn Array,
4343
);
4444
}
45+
46+
#[tokio::test(flavor = "multi_thread")]
47+
async fn functools_cache() {
48+
    const CODE: &str = "
49+
from functools import cache
50+
51+
_counter = 0
52+
53+
@cache
54+
def foo(x: int) -> int:
55+
    global _counter
56+
    _counter += 1
57+
    return x + _counter
58+
";
59+
60+
    let udf = python_scalar_udf(CODE).await.unwrap();
61+
    let array = udf
62+
        .invoke_with_args(ScalarFunctionArgs {
63+
            args: vec![ColumnarValue::Array(Arc::new(Int64Array::from_iter([
64+
                Some(10),
65+
                Some(20),
66+
                Some(10),
67+
            ])))],
68+
            arg_fields: vec![Arc::new(Field::new("a1", DataType::Int64, true))],
69+
            number_rows: 3,
70+
            return_field: Arc::new(Field::new("r", DataType::Int64, true)),
71+
        })
72+
        .unwrap()
73+
        .unwrap_array();
74+
    assert_eq!(
75+
        array.as_ref(),
76+
        &Int64Array::from_iter([Some(11), Some(22), Some(11)]) as &dyn Array,
77+
    );
78+
}
