Commit 25d33c2

Workaround Arrow Table Incorrect Boolean Results (#1167)
* `TILEDB_BOOL` is represented as a `uint8_t` whereas Arrow's Boolean type is 1 bit
* Use Arrow's uint8 type in `tiledb_buffer_arrow_fmt` for `TILEDB_BOOL`
* Cast Boolean types in the resulting Arrow table in `_run_query` from uint8 to bool
1 parent ca14d35 commit 25d33c2

3 files changed: +16 -1 lines changed

HISTORY.md

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@
 
 ## Bug Fixes
 * Fix error where passing a `Context` to `Group` would segfault intermittenly [#1165](https://github.com/TileDB-Inc/TileDB-Py/pull/1165)
+* Correct Boolean values when `use_arrow=True` [#1167](https://github.com/TileDB-Inc/TileDB-Py/pull/1167)
 
 # TileDB-Py 0.15.3 Release Notes
 

tiledb/multirange_indexing.py

Lines changed: 13 additions & 0 deletions
@@ -345,8 +345,21 @@ def _run_query(self) -> Union[DataFrame, Table]:
         elif self.use_arrow:
             with timing("buffer_conversion_time"):
                 table = self.pyquery._buffers_to_pa_table()
+
+            # this is a workaround to cast TILEDB_BOOL types from uint8
+            # representation in Arrow to Boolean
+            schema = table.schema
+            for n in range(self.array.nattr):
+                attr = self.array.attr(n)
+                if attr.dtype == bool:
+                    field_idx = schema.get_field_index(attr.name)
+                    field = pyarrow.field(attr.name, pyarrow.bool_())
+                    schema = schema.set(field_idx, field)
+            table = table.cast(schema)
+
             if self.query.return_arrow:
                 return table
+
             df = table.to_pandas()
         else:
             df = DataFrame(_get_pyquery_results(self.pyquery, self.array.schema))

tiledb/py_arrow_io_impl.h

Lines changed: 2 additions & 1 deletion
@@ -229,8 +229,9 @@ ArrowInfo tiledb_buffer_arrow_fmt(BufferInfo bufferinfo, bool use_list = true) {
     return ArrowInfo("tsn:");
 
 #if TILEDB_VERSION_MAJOR >= 2 && TILEDB_VERSION_MINOR >= 10
+  // TILEDB_BOOL is stored as a uint8_t but arrow::Type::BOOL is 1 bit
   case TILEDB_BOOL:
-    return ArrowInfo("b");
+    return ArrowInfo("C");
 #endif
 
   // TODO: these could potentially be rep'd w/ additional
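
In Arrow's C data interface, the format string `"b"` denotes a bit-packed Boolean and `"C"` denotes uint8. To illustrate why labelling a byte-per-value buffer as `"b"` gives incorrect results, here is a small pure-Python sketch. The helper names `read_as_bit_packed` and `read_as_bytes` are hypothetical and only mimic the two interpretations; they are not part of TileDB-Py or Arrow.

```python
from typing import List


def read_as_bit_packed(buffer: bytes, length: int) -> List[bool]:
    """Read `length` values as an Arrow-style bitmap: one bit each, least-significant bit first."""
    return [bool((buffer[i // 8] >> (i % 8)) & 1) for i in range(length)]


def read_as_bytes(buffer: bytes, length: int) -> List[bool]:
    """Read `length` values as one uint8 per value, which is how TILEDB_BOOL is stored."""
    return [bool(b) for b in buffer[:length]]


# Four Boolean cells stored one byte each: True, False, True, True.
cells = bytes([1, 0, 1, 1])

wrong = read_as_bit_packed(cells, 4)  # consumes only the bits of the first byte
right = read_as_bytes(cells, 4)

print(wrong)  # [True, False, False, False]
print(right)  # [True, False, True, True]
```

The bit-packed reader never looks past the first byte for these four values, which is why the commit exports the buffer as uint8 and defers the conversion to Boolean to the Python side.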
