[Data] Add map namespace support for expression operations by ryankert01 · Pull Request #59879 · ray-project/ray

ryankert01 · 2026-01-06T06:41:41Z

Description

`MapNamespace` impl.

Implemented _extract_map_component as a robust, vectorized fallback since native pc.map_keys kernels are not standard in PyArrow yet.
Support: Handles both Logical Maps (MapArray) and Physical Maps (List<Struct>).

Testing

test_map_keys / test_map_values: Standard extraction.
test_physical_map_extraction: Verifies support for List<Struct>.
test_map_sliced_offsets: Verifies the critical fix for sliced data.
test_map_nulls_and_empty: Verifies handling of None and empty maps {}.
test_map_chaining: Verifies composition with List namespace (e.g., .map.keys().list.len()).
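The row-level behavior these tests check can be summarized with a plain-Python reference model. This is only a sketch of the expected semantics, not the PyArrow-based implementation in the PR: the helper names `reference_keys` and `reference_values` are hypothetical, and modeling a map row as a list of (key, value) pairs is an assumption made for illustration.

```python
from typing import Any, List, Optional, Tuple

# A map row is modeled as a list of (key, value) pairs, mirroring the
# physical List<Struct<key, value>> layout. None models a null row.
MapRow = Optional[List[Tuple[Any, Any]]]


def reference_keys(rows: List[MapRow]) -> List[Optional[List[Any]]]:
    """Keys of each row; null rows stay null, empty maps give []."""
    return [None if row is None else [k for k, _ in row] for row in rows]


def reference_values(rows: List[MapRow]) -> List[Optional[List[Any]]]:
    """Values of each row; null rows stay null, empty maps give []."""
    return [None if row is None else [v for _, v in row] for row in rows]


rows = [[("a", 1)], [], None]
keys = reference_keys(rows)
values = reference_values(rows)

# Loosely analogous to chaining .map.keys().list.len(): a length per row,
# with null rows assumed to stay null.
lengths = [None if k is None else len(k) for k in keys]
```

The three rows correspond to the cases named above: a standard map, an empty map `{}`, and `None`.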

Related issues

Related to #58674
Continues #58743

Additional information

Tested with:

python -m pytest -v -s python/ray/data/tests/test_namespace_expressions.py::TestMapNamespace

Cursor Bugbot found 1 potential issue for commit 7a11478.

Signed-off-by: Hsien-Cheng Huang <ryankert01@gmail.com>

python/ray/data/namespace_expressions/map_namespace.py

gemini-code-assist

Code Review

This pull request introduces support for map/dict operations on expression columns by adding a map namespace. The implementation is well-structured, adding a _MapNamespace with keys() and values() methods that work on both logical MapArray and physical List<Struct> representations. The handling of sliced arrays with non-zero offsets is a great detail that ensures correctness. The accompanying tests are thorough, covering various representations, edge cases like nulls and empty maps, and integration with other namespaces.

I've added a couple of suggestions to map_namespace.py to further improve the robustness of the implementation by handling LargeListArray and providing clearer errors for unsupported types. Overall, this is a solid contribution that enhances Ray Data's expression capabilities.

python/ray/data/namespace_expressions/map_namespace.py

Signed-off-by: Hsien-Cheng Huang <ryankert01@gmail.com>

python/ray/data/namespace_expressions/map_namespace.py

Signed-off-by: Hsien-Cheng Huang <ryankert01@gmail.com>

ryankert01 · 2026-01-12T16:55:36Z

PTAL @goutamvenkat-anyscale @owenowenisme

python/ray/data/namespace_expressions/map_namespace.py

owenowenisme

Minor fixes, overall LGTM

python/ray/data/namespace_expressions/map_namespace.py

Co-authored-by: You-Cheng Lin <106612301+owenowenisme@users.noreply.github.com>
Signed-off-by: Ryan Huang <ryankert01@gmail.com>

Signed-off-by: Hsien-Cheng Huang <hcr@apache.org>

python/ray/data/namespace_expressions/map_namespace.py

goutamvenkat-anyscale · 2026-01-23T17:40:37Z

python/ray/data/namespace_expressions/map_namespace.py

+                f"(key and value), but got: {arr.type}."
+            )
+        return pyarrow.ListArray.from_arrays(
+            offsets=[0] * (len(arr) + 1),


let's use https://numpy.org/devdocs/reference/generated/numpy.repeat.html.
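A small sketch of what this suggestion could look like, assuming the goal is simply an all-zero offsets buffer of length `len(arr) + 1`. The variable names are illustrative, and the `int32` dtype (matching 32-bit list offsets) is an assumption, not something stated in the review.

```python
import numpy as np

num_rows = 3  # stands in for len(arr)

# Python-list version from the diff: builds num_rows + 1 Python ints.
offsets_list = [0] * (num_rows + 1)

# numpy.repeat version: one typed, contiguous buffer of the same length.
offsets_np = np.repeat(np.int32(0), num_rows + 1)
```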

goutamvenkat-anyscale · 2026-01-23T17:40:41Z

python/ray/data/namespace_expressions/map_namespace.py

+        return pyarrow.ListArray.from_arrays(
+            offsets=[0] * (len(arr) + 1),
+            values=pyarrow.array([], type=pyarrow.null()),
+            mask=pyarrow.array([True] * len(arr)),


goutamvenkat-anyscale · 2026-01-23T17:54:53Z

python/ray/data/namespace_expressions/map_namespace.py

+    VALUES = "values"
+
+
+def _extract_map_component(


Let's create 3 helper functions to make the intent clearer.

_get_child_array which gets keys and values

_make_empty_list_array

_rebuild_list_array with normalized offsets

For each one add an example of what it's doing in the comments.

    def _extract_map_component(
        arr: pyarrow.Array, component: MapComponent
    ) -> pyarrow.Array:
        """Extract keys or values from a MapArray or ListArray<Struct>."""
        if isinstance(arr, pyarrow.ChunkedArray):
            return pyarrow.chunked_array(
                [_extract_map_component(chunk, component) for chunk in arr.chunks]
            )

        child_array = _get_child_array(arr, component)
        if child_array is None:
            return _make_empty_list_array(arr, component)
        return _rebuild_list_array(arr, child_array)
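To make the requested split concrete, here is a toy model of the three helpers on plain Python lists. It is a sketch under stated assumptions, not the PR's code: `ToyMapArray` is a hypothetical stand-in for an Arrow array (offsets into flattened children plus a validity list), the helper bodies are guesses at the intent described above, and `extract` only mirrors the control flow of `_extract_map_component`.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class ToyMapArray:
    """Toy stand-in for an Arrow map/list array (not a PyArrow type)."""

    offsets: List[int]  # len(valid) + 1 positions into the flattened children
    keys: Optional[List[Any]]  # flattened keys of all rows, or None if absent
    values: Optional[List[Any]]  # flattened values of all rows, or None if absent
    valid: List[bool]  # False marks a null row


def _get_child_array(arr: ToyMapArray, component: str) -> Optional[List[Any]]:
    # Example: component="keys" with keys=["a", "b"] returns ["a", "b"].
    return arr.keys if component == "keys" else arr.values


def _make_empty_list_array(arr: ToyMapArray) -> List[Optional[List[Any]]]:
    # Example: an array with 2 rows returns [None, None] (every row null).
    return [None] * len(arr.valid)


def _rebuild_list_array(
    arr: ToyMapArray, child: List[Any]
) -> List[Optional[List[Any]]]:
    # Example: offsets [2, 3, 3, 5] are normalized to [0, 1, 1, 3] and applied
    # to child[2:5], so a sliced array never reads entries before its start.
    start, end = arr.offsets[0], arr.offsets[-1]
    window = child[start:end]
    normalized = [offset - start for offset in arr.offsets]
    return [
        window[normalized[i] : normalized[i + 1]] if arr.valid[i] else None
        for i in range(len(arr.valid))
    ]


def extract(arr: ToyMapArray, component: str) -> List[Optional[List[Any]]]:
    child = _get_child_array(arr, component)
    if child is None:
        return _make_empty_list_array(arr)
    return _rebuild_list_array(arr, child)


# A "sliced" array: the first two child entries belong to rows outside the slice.
sliced = ToyMapArray(
    offsets=[2, 3, 3, 5],
    keys=["x", "y", "a", "b", "c"],
    values=[0, 0, 1, 2, 3],
    valid=[True, False, True],
)
keys_out = extract(sliced, "keys")
values_out = extract(sliced, "values")
missing = extract(ToyMapArray([0, 0, 0], None, None, [True, True]), "keys")
```

Keeping offset normalization inside `_rebuild_list_array` means the sliced-data case is handled in exactly one place.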

Signed-off-by: Ryan Huang <ryankert01@gmail.com>

goutamvenkat-anyscale · 2026-02-03T01:47:28Z

python/ray/data/tests/expressions/test_namespace_map.py

+    assert list(rows[0]["keys"]) == ["a"] and list(rows[0]["values"]) == [1]
+    assert len(rows[1]["keys"]) == 0 and len(rows[1]["values"]) == 0
+    assert rows[2]["keys"] is None and rows[2]["values"] is None


Let's use rows_same

rows_same operates on pandas DataFrames, which can't handle the mixed None/list column during conversion. The to_pandas() path triggers TensorArray casting, which fails on the mixed types. Let's keep the current assertions!

There is a workaround, but it is too complex for the context of this test:

    ctx = ray.data.context.DataContext.get_current()
    ctx.enable_tensor_extension_casting = False
    try:
        result = (
            ds.with_column("keys", col("m").map.keys())
            .with_column("values", col("m").map.values())
            .to_pandas()
        )
        expected = pd.DataFrame(
            {
                "keys": [["a"], [], None],
                "values": [[1], [], None],
            }
        )
        _assert_result(result, expected, drop_cols=["m"])
    finally:
        ctx.enable_tensor_extension_casting = True
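If a workaround like this were ever adopted, one possible refinement is to restore whatever value the flag had before, rather than hard-coding `True` in the `finally` block. The sketch below shows that pattern with the standard library only. `override_attr` is a hypothetical helper, and the `SimpleNamespace` object merely stands in for the real context object.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def override_attr(obj, name, value):
    """Temporarily set obj.<name> to value, restoring the previous value on exit."""
    previous = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        # Runs even if the body raises, so the flag cannot leak into other tests.
        setattr(obj, name, previous)


# Stand-in for the context object; a real test would pass that object instead.
ctx = SimpleNamespace(enable_tensor_extension_casting=True)

with override_attr(ctx, "enable_tensor_extension_casting", False):
    inside = ctx.enable_tensor_extension_casting
after = ctx.enable_tensor_extension_casting
```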

python/ray/data/tests/expressions/test_namespace_map.py

goutamvenkat-anyscale · 2026-02-03T02:00:02Z

python/ray/data/namespace_expressions/map_namespace.py

+        if start_offset.as_py() != 0:
+            end_offset = offsets[-1].as_py()
+            child_array = child_array.slice(
+                offset=start_offset.as_py(), length=end_offset - start_offset.as_py()


Don't believe you need to call as_py here

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

python/ray/data/namespace_expressions/map_namespace.py

goutamvenkat-anyscale

Please look at open comments. Thanks

Signed-off-by: Ryan Huang <ryankert01@gmail.com>

ryankert01 requested a review from a team as a code owner January 6, 2026 06:41

[Data] Add map namespace support for expression operations

1bd4269

Signed-off-by: Hsien-Cheng Huang <ryankert01@gmail.com>

ryankert01 force-pushed the map-expression branch from 694b035 to 1bd4269 Compare January 6, 2026 06:44

Merge branch 'master' into map-expression

2e157bd

cursor bot reviewed Jan 6, 2026

python/ray/data/namespace_expressions/map_namespace.py (resolved)

gemini-code-assist bot reviewed Jan 6, 2026

python/ray/data/namespace_expressions/map_namespace.py (outdated, resolved)

python/ray/data/namespace_expressions/map_namespace.py (outdated, resolved)

address ai review

68bef64

Signed-off-by: Hsien-Cheng Huang <ryankert01@gmail.com>

ray-gardener bot added the data (Ray Data-related issues) and community-contribution (Contributed by the community) labels Jan 6, 2026

cursor bot reviewed Jan 6, 2026

python/ray/data/namespace_expressions/map_namespace.py (outdated, resolved)

python/ray/data/namespace_expressions/map_namespace.py (resolved)

ryankert01 added 4 commits January 6, 2026 13:50

fix cursor bot suggestions

fe2642b

Signed-off-by: Hsien-Cheng Huang <ryankert01@gmail.com>

Merge branch 'master' into map-expression

843cac1

Merge remote-tracking branch 'origin/master' into map-expression

f16bfd1

refactor tests

df1fe8c

Signed-off-by: Hsien-Cheng Huang <ryankert01@gmail.com>

cursor bot reviewed Jan 12, 2026

python/ray/data/namespace_expressions/map_namespace.py (resolved)

Merge branch 'master' into map-expression

6461062

ryankert01 assigned goutamvenkat-anyscale Jan 14, 2026

ryankert01 and others added 2 commits January 19, 2026 00:54

Merge branch 'master' into map-expression

fcd3652

Merge branch 'master' into map-expression

202a652

owenowenisme reviewed Jan 21, 2026

python/ray/data/namespace_expressions/map_namespace.py (resolved)

python/ray/data/namespace_expressions/map_namespace.py (outdated, resolved)

ryankert01 and others added 3 commits January 22, 2026 19:57

Update python/ray/data/namespace_expressions/map_namespace.py

50a2e64

Co-authored-by: You-Cheng Lin <106612301+owenowenisme@users.noreply.github.com>
Signed-off-by: Ryan Huang <ryankert01@gmail.com>

address commits

e613cfa

Signed-off-by: Hsien-Cheng Huang <hcr@apache.org>

Merge branch 'master' into map-expression

70a3760

cursor bot reviewed Jan 22, 2026

python/ray/data/namespace_expressions/map_namespace.py (outdated, resolved)

goutamvenkat-anyscale reviewed Jan 23, 2026


ryankert01 and others added 2 commits January 25, 2026 13:12

Merge branch 'master' into map-expression

49268ec

create 3 helper functions to make the intent clearer

c390a24

Signed-off-by: Ryan Huang <ryankert01@gmail.com>

ryankert01 added 3 commits January 25, 2026 23:28

use numpy.repeat()

5e024c8

Signed-off-by: Ryan Huang <ryankert01@gmail.com>

text extraction on empty ChunkedArray

10e4b7c

Signed-off-by: Ryan Huang <ryankert01@gmail.com>

Merge branch 'master' into map-expression

f9d53b8

ryankert01 requested review from goutamvenkat-anyscale and owenowenisme January 25, 2026 15:55

ryankert01 added 2 commits January 26, 2026 00:51

lint

978132e

Signed-off-by: Ryan Huang <ryankert01@gmail.com>

Merge remote-tracking branch 'origin/map-expression' into map-expression

2eff519

goutamvenkat-anyscale reviewed Feb 3, 2026

python/ray/data/tests/expressions/test_namespace_map.py (resolved)

goutamvenkat-anyscale reviewed Feb 3, 2026


goutamvenkat-anyscale approved these changes Feb 3, 2026


iamjustinhsu added the go label (add ONLY when ready to merge, run all tests) Feb 4, 2026

Merge branch 'master' into map-expression

7a11478

cursor bot reviewed Feb 4, 2026

python/ray/data/namespace_expressions/map_namespace.py (outdated, resolved)

goutamvenkat-anyscale requested changes Feb 6, 2026


ryankert01 added 2 commits February 8, 2026 13:22

Merge branch 'master' into map-expression

dae4645

address comments

59f8047

Signed-off-by: Ryan Huang <ryankert01@gmail.com>

ryankert01 requested a review from goutamvenkat-anyscale February 8, 2026 06:53


4 participants
