Improve schema projection performance for Iceberg tables #28787

@louischao9167

Description

Trino version

446.1.198

Please describe the bug

Suppose we have an Iceberg table like this:

CREATE TABLE
  db_1.tbl_1 (
    col_1 string,
    col_2 string,
    col_3 string
  )

Now run this query in Trino:
SELECT col_1 FROM db_1.tbl_1 WHERE col_2 = 'my data'

The current schema projection for this query includes all columns (col_1, col_2, col_3), regardless of which columns the query actually selects or filters on.

The expected schema projection for the query above is col_1 and col_2 only.
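
To make the expected result concrete, here is a minimal sketch of the rule I have in mind: the projected schema should be the union of the selected columns and the columns referenced by the filter. `ProjectionSketch` and `requiredColumns` are hypothetical names used only for illustration; they are not part of Trino or Iceberg, and this is not how the connector is implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Trino or Iceberg.
final class ProjectionSketch
{
    private ProjectionSketch() {}

    // Returns the table columns the scan needs: every column that is either
    // selected or referenced by the filter, in table declaration order.
    static List<String> requiredColumns(List<String> tableColumns, Set<String> selected, Set<String> filtered)
    {
        Set<String> referenced = new LinkedHashSet<>(selected);
        referenced.addAll(filtered);

        List<String> required = new ArrayList<>();
        for (String column : tableColumns) {
            if (referenced.contains(column)) {
                required.add(column);
            }
        }
        return required;
    }
}
```

For the table above, calling `requiredColumns` with table columns col_1, col_2, col_3, selected column col_1, and filter column col_2 yields col_1 and col_2, and col_3 is left out of the scan.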

How to reproduce:

Add a simple test to TestIcebergSplitSource.java:

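@Test
public void testMyQuery()
{
    assertUpdate("CREATE TABLE employees (id BIGINT, name VARCHAR, contacts ARRAY(ROW(email VARCHAR, phone VARCHAR, city VARCHAR)), dt VARCHAR) WITH (partitioning = ARRAY['dt'])");
    // Illustrative sample rows (not from the original report) so that the row-count assertion below can pass
    assertUpdate("INSERT INTO employees VALUES (1, 'a', ARRAY[ROW('a@example.com', '111', 'x')], '123'), (2, 'b', ARRAY[ROW('b@example.com', '222', 'y')], '123')", 2);
    assertThat(computeActual("SELECT contacts[1].email FROM employees WHERE dt = '123'").getRowCount()).isEqualTo(2);
}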

Then check how the table is scanned in getNextBatch in IcebergSplitSource.java.

Labels: iceberg (Iceberg connector)