#### Reference Issues/PRs
<!--Example: Fixes#1234. See also #3456.-->
#### What does this implement or fix?
Adds a frontend API for reading data as Arrow. The API looks like this:
```
import arcticdb as adb
from arcticdb import OutputFormat

# Sets a runtime output format option so all read operations return Arrow tables.
ac = adb.Arctic(uri, output_format=OutputFormat.EXPERIMENTAL_ARROW)
lib = ac["lib_name"]
lib.read(sym).data  # This will return a pyarrow.Table
# We can also override the output format for any specific read operation.
lib.read(sym, output_format=OutputFormat.PANDAS)
```
All read operations (`read`, `read_batch`, `read_batch_and_join`) and
their `lazy` equivalents respect the `output_format` argument.
The changes in this PR are:
- Renames the `_output_format` argument to `output_format`
- Separates the internal C++ `OutputFormat` from the Python one. For
Python we use a `StrEnum`, which allows users to pass either the enum
member or its string value
- Adds a `RuntimeOptions` class stored inside the `Arctic` and
`NativeVersionStore` instances. `RuntimeOptions` currently contains only
`output_format` but will later include things like
`arrow_string_column_encoding` and other layout configurations
- Allows passing `output_format` in all V2 APIs and makes it work for
`lazy=True` cases
- Runs all query builder tests with `output_format=ARROW` as well
- Cleans up `test_arrow` and `test_arrow_normalization` to adhere to the
new API
- Adds tests for all user-facing Arrow APIs in `test_arrow_api.py`
- Fixes an issue where `read_batch_and_join` didn't respect the input
`ReadOptions`
- Introduces `arcticdb.dependencies` for handling optional
dependencies
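The enum and runtime-option items above can be pictured with a small standalone sketch. It is not ArcticDB's implementation: `RuntimeOptions` here is a simplified stand-in, `resolve_output_format` is a hypothetical helper, and the enum uses a `str` mixin to approximate `StrEnum` on Python versions that lack it. The string values are the ones used in the manual test in this description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class OutputFormat(str, Enum):
    """Stand-in for the Python-side enum; the str mixin approximates StrEnum."""

    PANDAS = "pandas"
    EXPERIMENTAL_ARROW = "experimental_arrow"


@dataclass
class RuntimeOptions:
    """Simplified stand-in that holds only the default output format."""

    output_format: OutputFormat = OutputFormat.PANDAS


def resolve_output_format(
    runtime_options: RuntimeOptions,
    override: Optional[Union[OutputFormat, str]] = None,
) -> OutputFormat:
    """Hypothetical helper: a per-call value wins over the runtime default."""
    if override is None:
        return runtime_options.output_format
    # Lookup by value accepts both an enum member and its string value,
    # and raises ValueError for anything else.
    return OutputFormat(override)
```

With this shape, a library-wide default set once on the client can still be overridden by a single read call, and callers are free to pass `"pandas"` instead of `OutputFormat.PANDAS`.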
#### Any other comments?
Writing automated tests for the optional dependencies is difficult, so I
ran a manual test.
In a venv with pyarrow:
```
>>> import arcticdb as adb
>>> ac = adb.Arctic("lmdb:///tmp/test-arrow")
>>> lib = ac["test"]
>>> lib.read("test", output_format="pandas").data
   x
0  5
1  6
2  7
>>> lib.read("test", output_format="experimental_arrow").data
pyarrow.Table
x: int64
----
x: [[5,6,7]]
```
And in a venv without pyarrow:
```
>>> import arcticdb as adb
>>> ac = adb.Arctic("lmdb:///tmp/test-arrow")
>>> lib = ac["test"]
>>> lib.read("test", output_format="pandas").data
   x
0  5
1  6
2  7
>>> lib.read("test", output_format="experimental_arrow").data
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ivo/source/read_as_arrow/python/arcticdb/version_store/library.py", line 1887, in read
    return self._nvs.read(
           ^^^^^^^^^^^^^^^
  File "/home/ivo/source/read_as_arrow/python/arcticdb/version_store/_store.py", line 2063, in read
    version_query, read_options, read_query = self._get_queries(
                                              ^^^^^^^^^^^^^^^^^^
  File "/home/ivo/source/read_as_arrow/python/arcticdb/version_store/_store.py", line 1970, in _get_queries
    read_options = self._get_read_options(**kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ivo/source/read_as_arrow/python/arcticdb/version_store/_store.py", line 1958, in _get_read_options
    output_format_to_internal(
  File "/home/ivo/source/read_as_arrow/python/arcticdb/options.py", line 162, in output_format_to_internal
    raise ModuleNotFoundError("ArcticDB's pyarrow optional dependency missing but is required to use arrow output format.")
ModuleNotFoundError: ArcticDB's pyarrow optional dependency missing but is required to use arrow output format.
```
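The behaviour in the second session (pandas reads keep working, and the error appears only when Arrow output is requested) is what a deferred optional import produces. Below is a minimal sketch of that pattern, not the actual contents of `arcticdb.dependencies` or `output_format_to_internal`: `try_import` and `require_for_arrow` are hypothetical names, and only the error message is copied from the traceback above.

```python
import importlib
from types import ModuleType
from typing import Optional, Tuple


def try_import(module_name: str) -> Tuple[Optional[ModuleType], bool]:
    """Hypothetical helper: return (module, True) if importable, else (None, False)."""
    try:
        return importlib.import_module(module_name), True
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return None, False


def require_for_arrow(module_name: str = "pyarrow") -> ModuleType:
    """Hypothetical check that runs only when Arrow output is requested."""
    module, available = try_import(module_name)
    if not available:
        raise ModuleNotFoundError(
            "ArcticDB's pyarrow optional dependency missing but is required "
            "to use arrow output format."
        )
    return module
```

Because the check is made at call time rather than at package import time, `import arcticdb` and pandas-format reads do not need pyarrow to be installed.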
#### Checklist
<details>
<summary>
Checklist for code changes...
</summary>
- [ ] Have you updated the relevant docstrings, documentation and
copyright notice?
- [ ] Is this contribution tested against [all ArcticDB's
features](../docs/mkdocs/docs/technical/contributing.md)?
- [ ] Do all exceptions introduced raise appropriate [error
messages](https://docs.arcticdb.io/error_messages/)?
- [ ] Are API changes highlighted in the PR description?
- [ ] Is the PR labelled as enhancement or bug so it appears in
autogenerated release notes?
</details>
<!--
Thanks for contributing a Pull Request to ArcticDB! Please ensure you
have taken a look at:
- ArcticDB's Code of Conduct:
https://github.com/man-group/ArcticDB/blob/master/CODE_OF_CONDUCT.md
- ArcticDB's Contribution Licensing:
https://github.com/man-group/ArcticDB/blob/master/docs/mkdocs/docs/technical/contributing.md#contribution-licensing
-->
`python/arcticdb/options.py`: 32 additions & 1 deletion
@@ -6,11 +6,13 @@
As of the Change Date specified in that file, in accordance with the Business Source License, use of this software will be governed by the Apache License, version 2.0.