@ldlin1 commented Dec 9, 2024

Using logical operators (e.g., `|`, `&`) on non-boolean data that should be cast to bool works for most types (e.g., float and object-dtype strings). However, these operations fail with pyarrow-backed and numpy-backed string arrays.

This PR fixes the pyarrow-backed case by casting string arrays to boolean arrays when they are used with logical operators. The new helper functions `convert_string_to_boolean_array` and `cast_for_logical` perform the cast, and the `ARROW_LOGICAL_FUNCS` dictionary now uses these helpers when performing logical operations (see `pandas/core/arrays/arrow/array.py`).
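To make the pyarrow-side casting concrete, here is a small pure-Python model of it. It does not call pyarrow: `None` stands in for a missing value, and the `and`/`or` rules follow Kleene logic, which is what pyarrow's `and_kleene` and `or_kleene` compute. The names `string_to_bool`, `LOGICAL_FUNCS` and `arrow_style_logical_op` are hypothetical and are not the PR's `convert_string_to_boolean_array` or `cast_for_logical`. The rule that a non-empty string is True, an empty string is False, and a missing value stays missing is an assumption borrowed from Python truthiness.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence

# None stands in for a missing value (NA). This models the semantics only;
# it does not call pyarrow.
MaybeBool = Optional[bool]


def string_to_bool(values: Sequence[Optional[str]]) -> list[MaybeBool]:
    """Hypothetical cast (assumption): non-empty -> True, "" -> False, missing stays missing."""
    return [None if v is None else v != "" for v in values]


def kleene_and(a: MaybeBool, b: MaybeBool) -> MaybeBool:
    # False dominates, so False & NA is False.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def kleene_or(a: MaybeBool, b: MaybeBool) -> MaybeBool:
    # True dominates, so True | NA is True.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def kleene_xor(a: MaybeBool, b: MaybeBool) -> MaybeBool:
    # No value dominates xor, so any NA operand gives NA.
    if a is None or b is None:
        return None
    return a != b


# Hypothetical table keyed by operator name, loosely analogous to ARROW_LOGICAL_FUNCS.
LOGICAL_FUNCS: dict[str, Callable[[MaybeBool, MaybeBool], MaybeBool]] = {
    "and_": kleene_and,
    "or_": kleene_or,
    "xor": kleene_xor,
}


def arrow_style_logical_op(
    left: Sequence[MaybeBool], right: Sequence[Optional[str]], name: str
) -> list[MaybeBool]:
    """Cast the string operand to booleans, then apply the named op elementwise."""
    if len(left) != len(right):
        raise ValueError("operands must have the same length")
    func = LOGICAL_FUNCS[name]
    return [func(a, b) for a, b in zip(left, string_to_bool(right))]
```

For example, `arrow_style_logical_op([True, False, False, None], ["a", "", None, "b"], "or_")` returns `[True, False, None, True]`: `False | NA` stays NA, while `NA | True` is True. Casting before dispatching keeps the operator table itself purely boolean.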

This PR fixes the numpy-backed case by casting string arrays to boolean arrays whenever they are combined with boolean arrays in a logical operation. This is done in the `logical_op` function (see `pandas/core/ops/array_ops.py`).
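A similar pure-Python model of the numpy-backed path is sketched below. The names `strings_to_bool_fill_na` and `numpy_style_logical_op` are hypothetical and do not reproduce the code in `logical_op`. The sketch assumes that missing strings are treated as False before the cast, so the result is a plain list of booleans with no NA; it only handles the case where the left operand is boolean, matching the description above.

```python
import operator
from typing import Callable, Optional, Sequence


def strings_to_bool_fill_na(values: Sequence[Optional[str]]) -> list[bool]:
    """Hypothetical cast (assumption): missing -> False, then ordinary Python truthiness."""
    return [False if v is None else bool(v) for v in values]


def numpy_style_logical_op(
    left: Sequence[bool],
    right: Sequence[Optional[str]],
    op: Callable[[bool, bool], bool],
) -> list[bool]:
    """Cast a string operand to plain booleans when the other side is boolean."""
    if len(left) != len(right):
        raise ValueError("operands must have the same length")
    if not all(isinstance(x, bool) for x in left):
        raise TypeError("this sketch only handles a boolean left operand")
    right_bool = strings_to_bool_fill_na(right)
    return [op(a, b) for a, b in zip(left, right_bool)]


# Usage: the missing third string becomes False before `|` runs.
result = numpy_style_logical_op([True, False, True], ["a", "", None], operator.or_)
# result == [True, False, True]
```

In this model a missing string never propagates as NA, because it is filled with False before the operator is applied.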

@ldlin1 changed the title from "BUG: Fix pyarrow logical bug concerning bool and string" to "BUG: Fix pyarrow and numpy logical bug concerning bool and string" on Dec 9, 2024
@mroeschke (Member) commented

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

@mroeschke closed this Jan 2, 2025

Linked issue (would be closed by merging this PR): BUG (string dtype): logical operation with bool and string failing