
ajpotts commented Dec 12, 2025

PR Description: Improve Index.lookup and MultiIndex.lookup semantics

This pull request refines type-handling, error messaging, and row/column matching
behavior for both Index.lookup and MultiIndex.lookup.
It also adds a regression test ensuring that mixed-dtype tuple keys do not
trigger incorrect scalar casting.
The changes improve correctness, readability, and alignment with Pandas semantics.


Summary of Changes

1. Index.lookup

File: arkouda/pandas/index.py

  • Updated the return-value docstring to correctly indicate that the result is a
    boolean pdarray of length len(self).
  • Improved the TypeError description to reflect that the function expects a
    value convertible into an Arkouda array.
  • Removed an unused import (akint64).

Motivation:
Clarifies the documented semantics and removes a stale, unused import.
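The documented contract (one boolean per index entry, so the result always has length len(self)) can be illustrated with a minimal NumPy analogue. This assumes `np.isin` as a stand-in for Arkouda's vectorized membership test; `index_lookup` is a hypothetical helper, not Arkouda's implementation.

```python
import numpy as np


def index_lookup(values: np.ndarray, key) -> np.ndarray:
    """Hypothetical NumPy analogue of Index.lookup (not Arkouda code)."""
    # Accept a scalar or any array-like key, mirroring "a value convertible
    # into an array"; np.atleast_1d turns a scalar into a length-1 array.
    key_arr = np.atleast_1d(np.asarray(key))
    # One boolean per index entry: True where the entry appears in key_arr.
    return np.isin(values, key_arr)


values = np.array([10, 20, 30, 20])
print(index_lookup(values, 20).tolist())        # [False, True, False, True]
print(index_lookup(values, [30, 99]).tolist())  # [False, False, True, False]
```

However many keys are passed, the mask length follows the index, not the key.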


2. MultiIndex.lookup

File: arkouda/pandas/index.py

Major improvements to validation, dtype behavior, and membership logic:

Validation

  • Rejects keys that are not a list or tuple.
  • Raises a clear ValueError when the key length does not match nlevels.
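The two validation rules can be sketched as a standalone check. `validate_multiindex_key` is a hypothetical helper, and raising TypeError for a non-list/tuple key is an assumption; the description only says such keys are rejected.

```python
def validate_multiindex_key(key, nlevels: int) -> None:
    """Hypothetical check mirroring the validation rules for MultiIndex.lookup."""
    # Rule 1: only list or tuple keys are accepted (TypeError is an assumption).
    if not isinstance(key, (list, tuple)):
        raise TypeError(f"key must be a list or tuple, got {type(key).__name__}")
    # Rule 2: exactly one component per index level.
    if len(key) != nlevels:
        raise ValueError(
            f"key has {len(key)} components but the MultiIndex has {nlevels} levels"
        )


validate_multiindex_key((1, "red"), nlevels=2)  # passes silently
for bad_key in ("red", (1,)):
    try:
        validate_multiindex_key(bad_key, nlevels=2)
    except (TypeError, ValueError) as err:
        print(type(err).__name__, "-", err)
```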

Two explicit code paths

  1. Per-level Arkouda arrays (e.g., a list of pdarray / Strings)

    • Delegated directly to in1d(self.index, key) for vectorized matching.

  2. Scalar tuple keys (e.g., (1, "red"))

    • Scalars are wrapped into length-1 Arkouda arrays without casting dtypes.
    • Prevents accidental coercion of string scalars into numeric types.

This behavior aligns more closely with Pandas and eliminates the illegal dtype cast reported in #5155.
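The two code paths can be made concrete with a NumPy sketch. It assumes NumPy arrays stand in for pdarray/Strings and that in1d(self.index, key) matches whole rows across the level arrays; a set of row tuples plays that role here. `multiindex_lookup` is hypothetical, and mixed array/scalar keys are out of scope. The last lines show the kind of coercion the scalar path now avoids: a string such as "red" has no numeric value, so casting it fails.

```python
import numpy as np


def multiindex_lookup(levels, key) -> np.ndarray:
    """Hypothetical NumPy sketch of MultiIndex.lookup's two code paths."""
    if all(isinstance(k, np.ndarray) for k in key):
        # Path 1: per-level arrays are used as-is; key row j is
        # (key[0][j], key[1][j], ...).
        key_levels = list(key)
    else:
        # Path 2: scalar tuple such as (1, "red"). Wrap each scalar in a
        # length-1 array and let NumPy infer its dtype; nothing is cast to
        # a level's dtype, so "red" stays a string.
        key_levels = [np.asarray([k]) for k in key]
    # Row-wise membership across levels: index row i matches if its tuple
    # of level values equals some key row.
    wanted = set(zip(*(k.tolist() for k in key_levels)))
    rows = zip(*(lvl.tolist() for lvl in levels))
    return np.array([row in wanted for row in rows], dtype=bool)


levels = [np.array([1, 1, 2]), np.array(["red", "blue", "red"])]
print(multiindex_lookup(levels, (1, "red")).tolist())  # [True, False, False]
print(multiindex_lookup(levels, [np.array([1, 2]), np.array(["blue", "red"])]).tolist())
# [False, True, True]

# The coercion the scalar path avoids: "red" cannot become an integer.
try:
    np.asarray(["red"]).astype(np.int64)
except ValueError as err:
    print("illegal cast:", err)
```

Keeping each scalar's own dtype means every level is compared against a value of a compatible type, which is what lets the mixed-dtype tuple (1, "red") match at all.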


3. New Test: Mixed-dtype tuple lookup

File: tests/pandas/index_test.py

Added test test_multiindex_lookup_tuple_mixed_dtypes:

  • Ensures that a scalar mixed-type key like (1, "red"):
    • Does not cast "red" into numeric types.
    • Produces correct row-level matching.
  • Verifies the mask is [True, False, False, False] for the provided example.
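The test's assertions can be mirrored with plain NumPy, plus an array-key case. The level values below are hypothetical (the fixture is not shown here), chosen only so that row 0 is (1, "red"); the real test builds an Arkouda MultiIndex and checks `mask.dtype == ak.bool_` and `mask.to_ndarray().tolist()`.

```python
import numpy as np

# Hypothetical fixture: not the real test data. Row 0 is (1, "red").
numbers = np.array([1, 1, 2, 2])
colors = np.array(["red", "blue", "red", "blue"])


def test_tuple_key_mixed_dtypes():
    # Scalar path: compare each level with its own scalar; "red" is
    # compared as a string and never cast to an integer.
    mask = (numbers == 1) & (colors == "red")
    assert mask.dtype == np.bool_
    assert mask.tolist() == [True, False, False, False]


def test_array_key():
    # Array path: a row matches if its (number, color) pair is one of
    # the key rows (1, "blue") or (2, "red").
    key = set(zip([1, 2], ["blue", "red"]))
    mask = np.array([pair in key for pair in zip(numbers.tolist(), colors.tolist())])
    assert mask.tolist() == [False, True, True, False]


test_tuple_key_mixed_dtypes()
test_array_key()
print("ok")
```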

Motivation:
Prevents regressions and captures a real-world bug scenario.


Why This Matters

  • Fixes subtle multi-dtype matching bugs in MultiIndex.lookup.
  • Moves Arkouda’s Pandas-like API closer to Pandas semantics.
  • Improves test coverage around a previously fragile API surface.
  • Supports downstream work on joins, grouping, and Index alignment.

Backward Compatibility

  • No breaking API changes.
  • Lookups with mixed-dtype keys now return correct results, in line with what users expect.

Closes #5155: Bug: MultiIndex .lookup() attempts illegal dtype cast for tuple keys

ajpotts force-pushed the 5155_Bug_MultiIndex.lookup branch 3 times, most recently from 2303c7b to 11f3fe1, December 17, 2025 13:32
ajpotts marked this pull request as ready for review December 17, 2025 14:17
ajpotts force-pushed the 5155_Bug_MultiIndex.lookup branch from 11f3fe1 to a05d13e, January 15, 2026 14:47
codecov bot commented Jan 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@538fb39).

Additional details and impacted files
@@           Coverage Diff            @@
##             main     #5156   +/-   ##
========================================
  Coverage        ?   100.00%           
========================================
  Files           ?         4           
  Lines           ?        63           
  Branches        ?         0           
========================================
  Hits            ?        63           
  Misses          ?         0           
  Partials        ?         0           


assert mask.dtype == ak.bool_

# Expect exactly the first row to match
assert mask.to_ndarray().tolist() == [True, False, False, False]
Contributor:

Now that I've studied this for a bit, I'd also suggest testing a lookup where the inputs are arrays. This is what I did with the above:

>>> midx.lookup(([ak.array([1,2]),ak.array(['blue','red'])]))
array([False True True False])

Contributor:

Didn't mean to submit this as a 'Request changes' review.

Contributor (PR author):

Good idea. Done.

ajpotts force-pushed the 5155_Bug_MultiIndex.lookup branch from c32ebc4 to dd535ad, January 16, 2026 14:39
Development

Successfully merging this pull request may close these issues.

Bug: MultiIndex .lookup() attempts illegal dtype cast for tuple keys

3 participants