Support aggregating by ListView #19782

@brancz

Is your feature request related to a problem or challenge?

It's currently not possible to aggregate (`GROUP BY`) on `ListView` arrays.

SQL Logic Test that uses `ListView` in a `GROUP BY`:
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at

#   http://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

#############
## ListView Aggregation Tests
#############

### Setup: Create test tables with ListView arrays

statement ok
CREATE TABLE list_view_agg_test AS
SELECT
    id,
    group_col,
    arrow_cast(make_array(val1, val2, val3), 'ListView(Int64)') as list_view_col
FROM (VALUES
    (1, 'A', 10, 20, 30),
    (2, 'A', 40, 50, 60),
    (3, 'B', 70, 80, 90),
    (4, 'B', 100, 110, 120),
    (5, 'C', 1, 2, 3)
) AS t(id, group_col, val1, val2, val3);

### Test: GROUP BY on ListView column

query ?I rowsort
SELECT list_view_col, COUNT(*) FROM list_view_agg_test GROUP BY list_view_col;
----
[1, 2, 3] 1
[10, 20, 30] 1
[40, 50, 60] 1
[70, 80, 90] 1
[100, 110, 120] 1

### Cleanup

statement ok
DROP TABLE list_view_agg_test;

The current error it returns is:

1. query failed: DataFusion error: Arrow error: Not yet implemented: Row format support not yet implemented for: [SortField { options: SortOptions { descending: false, nulls_first: true }, data_type: ListView(Field { data_type: Int64, nullable: true }) }]
[SQL] SELECT list_view_col, COUNT(*) FROM list_view_agg_test GROUP BY list_view_col;
at /Users/brancz/src/github.com/apache/datafusion/datafusion/sqllogictest/test_files/list_view_aggregation.slt:40

This suggests that the first thing to tackle is adding row format support for `ListView` in arrow-row, but I expect that at the very least we'll also need support in DataFusion's hash utils.
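
To illustrate why both pieces need to understand the `ListView` layout, here is a minimal, self-contained sketch. It is not arrow-rs or DataFusion code: `ListViewSketch` and `count_by_row` are hypothetical names, and the struct is a simplified stand-in that assumes non-null `Int64` elements. The point is that the grouping key has to be derived from each row's logical elements (the slice selected by its offset and size), not from the physical offsets, since two rows can be equal while pointing at different, or even the same, regions of the values buffer.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a ListView(Int64) array:
/// row `i` is `values[offsets[i] .. offsets[i] + sizes[i]]`.
/// Unlike List, the ranges may appear in any order and may overlap.
struct ListViewSketch {
    values: Vec<i64>,
    offsets: Vec<usize>,
    sizes: Vec<usize>,
}

impl ListViewSketch {
    /// Number of rows (one offset/size pair per row).
    fn len(&self) -> usize {
        self.offsets.len()
    }

    /// Borrow the logical elements of row `i` without copying.
    fn row(&self, i: usize) -> &[i64] {
        let start = self.offsets[i];
        &self.values[start..start + self.sizes[i]]
    }
}

/// Rough equivalent of `SELECT col, COUNT(*) ... GROUP BY col`,
/// keyed on the logical contents of each row.
fn count_by_row(array: &ListViewSketch) -> HashMap<&[i64], usize> {
    let mut counts = HashMap::new();
    for i in 0..array.len() {
        *counts.entry(array.row(i)).or_insert(0) += 1;
    }
    counts
}
```

With `values = [1, 2, 3, 10, 20, 30]`, `offsets = [3, 0, 0]` and `sizes = [3, 3, 3]`, the second and third rows share the same buffer region and land in one group, while the first row forms its own group.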

Describe the solution you'd like

Support grouping and aggregating by `ListView` columns.

Describe alternatives you've considered

Casting from ListView to List, but that defeats the purpose of making as few copies as possible.
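
As a rough sketch of where the copies come from (again not the actual arrow-rs cast kernel; `list_view_to_list` is a hypothetical helper over plain slices): a List layout requires rows to be stored back to back and in order, so in the general case a conversion has to rewrite the child values.

```rust
/// Convert a ListView-style layout (per-row offset and size) into a
/// List-style layout where row `i` is `values[offsets[i]..offsets[i + 1]]`.
/// Because ListView rows may be out of order or overlapping, every row's
/// elements are copied into a new contiguous buffer.
fn list_view_to_list(
    values: &[i64],
    offsets: &[usize],
    sizes: &[usize],
) -> (Vec<i64>, Vec<usize>) {
    let mut out_values = Vec::with_capacity(sizes.iter().sum());
    let mut out_offsets = Vec::with_capacity(offsets.len() + 1);
    out_offsets.push(0);
    for (&start, &size) in offsets.iter().zip(sizes) {
        // This copy is the cost the cast-based workaround pays per row.
        out_values.extend_from_slice(&values[start..start + size]);
        out_offsets.push(out_values.len());
    }
    (out_values, out_offsets)
}
```

For example, a six-element values buffer referenced by three rows of size three (two of them sharing a region) turns into a nine-element buffer after conversion.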

Additional context

You currently need to run the SLT on top of #19355 since arrow-rs 57.2 implemented a number of features for ListView.

Labels: enhancement

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions