Ibis has no way to convert UDFs to substrait plan #644

@Anindyadeep

Ibis is doing some incredible work by integrating Substrait: it can generate a Substrait plan from a user's query, which supports cross-database operations in Python.

Suppose we have a table like this:

┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┓
┃ cust_id ┃ income1 ┃ income2 ┃ income3  ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━┩
│ int64   │ float64 │ float64 │ float64  │
├─────────┼─────────┼─────────┼──────────┤
│       1 │ 20000.0 │ 3560.57 │      nan │
│       2 │ 34546.9 │ 6000.66 │   1000.0 │
│       3 │ 75430.2 │ 8111.01 │      nan │
│       4 │ 55430.2 │ 8111.01 │   1200.0 │
│       5 │     nan │ 8111.01 │      nan │
│       6 │     nan │     nan │ 100000.0 │
└─────────┴─────────┴─────────┴──────────┘

Right now, we define UDFs in ibis like this:

import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf

# Vectorized UDF that adds two float64 columns element-wise.
@udf.analytic(input_type=[dt.double, dt.double], output_type=dt.double)
def function(c1, c2):
    return c1 + c2

We can then apply this function to table columns like this:

function(table.income1, table.income2)

Applying the function returns:

┏━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ AnalyticVectorizedUDF() ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ float64                 │
├─────────────────────────┤
│                23560.57 │
│                40547.56 │
│                83541.21 │
│                63541.21 │
│                     nan │
│                     nan │
└─────────────────────────┘
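
For readers without an ibis setup, the arithmetic the UDF performs can be sketched in plain Python. This is only an illustration of the element-wise addition and of how NaN propagates; the helper name `add_columns` and the `rows` list are made up for this example and are not part of ibis.

```python
import math

# Sample rows copied from the table above: (income1, income2).
rows = [
    (20000.0, 3560.57),
    (34546.9, 6000.66),
    (75430.2, 8111.01),
    (55430.2, 8111.01),
    (math.nan, 8111.01),
    (math.nan, math.nan),
]


def add_columns(c1, c2):
    """Element-wise c1 + c2; a NaN in either input yields NaN."""
    return [a + b for a, b in zip(c1, c2)]


income1 = [r[0] for r in rows]
income2 = [r[1] for r in rows]
added = add_columns(income1, income2)

for value in added:
    # Print with two decimals, matching the precision of the table.
    print("nan" if math.isnan(value) else f"{value:.2f}")
```

Rows where either income is NaN produce NaN, which is why the last two entries of the result column are nan.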

We can even mutate the existing table to add a new column computed by this function:

mutate_expression = table.mutate(
    added = function(table.income1, table.income2)
)

Before coming to the main problem, consider a simple expression like this:

expression = table.income1 + table.income2

I can generate the Substrait plan for this expression with the following code:

from ibis_substrait.compiler.core import SubstraitCompiler

compiler = SubstraitCompiler()
expression = table.income1 + table.income2
substrait_plan = compiler.compile(table.mutate(expression))

This gives me the Substrait plan. But when I try to generate a Substrait plan for an expression that uses a user-defined function, I get an error:

udf_expression = table.mutate(
    added = function(table.income1, table.income2)
)

# udf_expression is already the mutated table, so compile it directly.
substrait_plan_udf = compiler.compile(udf_expression)

Doing this raises the error: KeyError: 'AnalyticVectorizedUDF'.
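
One plausible reading of this error, which I have not verified against the ibis-substrait source, is that the compiler looks up each operation by its class name in a mapping of known functions, and the UDF's operation name is simply not in that mapping. The sketch below is a toy model of that behaviour; `FUNCTION_REGISTRY` and `lookup_function` are hypothetical names and do not come from ibis or ibis-substrait.

```python
# Hypothetical registry: operation class name -> Substrait function name.
# The real compiler's internals may look quite different.
FUNCTION_REGISTRY = {
    "Add": "add",
    "Subtract": "subtract",
}


def lookup_function(op_name):
    # Plain dict indexing raises KeyError for any unregistered operation.
    return FUNCTION_REGISTRY[op_name]


# A built-in operation such as addition resolves without trouble.
print(lookup_function("Add"))

# A UDF operation has no entry, so the lookup fails with a KeyError.
try:
    lookup_function("AnalyticVectorizedUDF")
except KeyError as exc:
    error_message = f"KeyError: {exc}"
    print(error_message)
```

If the failure does work this way, supporting UDFs would need more than a new registry entry, since the plan would also have to describe the function to the consumer.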

At first I thought that Substrait itself might not support this yet. But it seems that Substrait does support:

  • UserDefined type
  • ParameterizedUserDefined type
  • UserDefined relation

But Not user defined relations.

This suggests that ibis does not currently support generating Substrait plans for user-defined functions. It would be great to have that support.
