Description
Ibis is doing some incredible work by integrating Substrait to generate a Substrait plan from the user's query, which supports cross-database operations in Python.
Suppose we have a table like this:
┏━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┓
┃ cust_id ┃ income1 ┃ income2 ┃ income3 ┃
┡━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━┩
│ int64 │ float64 │ float64 │ float64 │
├─────────┼─────────┼─────────┼──────────┤
│ 1 │ 20000.0 │ 3560.57 │ nan │
│ 2 │ 34546.9 │ 6000.66 │ 1000.0 │
│ 3 │ 75430.2 │ 8111.01 │ nan │
│ 4 │ 55430.2 │ 8111.01 │ 1200.0 │
│ 5 │ nan │ 8111.01 │ nan │
│ 6 │ nan │ nan │ 100000.0 │
└─────────┴─────────┴─────────┴──────────┘
Right now we define UDFs in Ibis like this:
import ibis.expr.datatypes as dt
from ibis.backends.pandas.udf import udf
@udf.analytic(input_type=[dt.double, dt.double], output_type=dt.double)
def function(c1, c2):
    return c1 + c2
We can then apply this function to the table's columns like this:
function(table.income1, table.income2)
Applying the function returns this:
┏━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ AnalyticVectorizedUDF() ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ float64 │
├─────────────────────────┤
│ 23560.57 │
│ 40547.56 │
│ 83541.21 │
│ 63541.21 │
│ nan │
│ nan │
└─────────────────────────┘
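To make the arithmetic behind that output explicit, here is a minimal plain-Python sketch that does not use Ibis at all. The lists simply copy the `income1` and `income2` values from the table above, and `add_columns` is a hypothetical helper name standing in for the UDF body. It shows why the last two rows come out as `nan`: any sum involving a missing value stays missing.

```python
import math

# Values copied from the income1 and income2 columns of the sample table;
# math.nan stands in for the missing entries.
income1 = [20000.0, 34546.9, 75430.2, 55430.2, math.nan, math.nan]
income2 = [3560.57, 6000.66, 8111.01, 8111.01, 8111.01, math.nan]


def add_columns(c1, c2):
    """Element-wise sum, the same arithmetic the UDF body performs."""
    return [a + b for a, b in zip(c1, c2)]


added = add_columns(income1, income2)
for value in added:
    # NaN propagates through addition, so rows 5 and 6 stay missing.
    print("nan" if math.isnan(value) else round(value, 2))
```

This only mirrors the row-wise result; the real UDF receives whole columns from the pandas backend rather than Python lists.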
We can even mutate our existing table to add a new column computed by this function:
mutate_expression = table.mutate(
    added=function(table.income1, table.income2)
)
Before coming to the main problem, consider a simple expression like this:
expression = table.income1 + table.income2
I can generate the Substrait plan for this expression using this code:
from ibis_substrait.compiler.core import SubstraitCompiler
compiler = SubstraitCompiler()
expression = table.income1 + table.income2
substrait_plan = compiler.compile(table.mutate(expression))
This gives me the Substrait plan as expected. But when I try to generate a Substrait plan for an expression that uses a user-defined function, it fails:
udf_expression = table.mutate(
    added=function(table.income1, table.income2)
)
substrait_plan_udf = compiler.compile(table.mutate(udf_expression))
Doing this gives me the error: KeyError: 'AnalyticVectorizedUDF'.
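One plausible reading of this error, offered as an assumption rather than a description of ibis-substrait's actual internals, is that the compiler looks up each operation by its class name in a table of known functions, and the UDF's operation type has no entry there. The toy sketch below reproduces that failure mode in plain Python. `OP_TO_SUBSTRAIT_FUNCTION`, `translate`, and the two placeholder classes are all hypothetical names invented for illustration.

```python
# Hypothetical, simplified stand-in for a compiler's operation lookup table.
# These names are illustrative and are NOT ibis-substrait's real internals.
OP_TO_SUBSTRAIT_FUNCTION = {
    "Add": "add",
    "Subtract": "subtract",
    "Multiply": "multiply",
}


class Add:
    """Placeholder for a built-in operation node."""


class AnalyticVectorizedUDF:
    """Placeholder for the operation node a UDF call produces."""


def translate(op):
    """Look up the Substrait function name by the operation's class name."""
    return OP_TO_SUBSTRAIT_FUNCTION[type(op).__name__]


print(translate(Add()))  # the built-in operation is found in the table

try:
    translate(AnalyticVectorizedUDF())
except KeyError as exc:
    # The UDF node has no entry, so the dictionary lookup itself fails.
    print(f"KeyError: {exc}")
```

If the real cause is similar, supporting UDFs would need more than a new table entry, since the plan would also have to describe the function to the consumer.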
I thought that Substrait itself might not support this yet. But it seems Substrait does support:
- UserDefined type
- ParameterizedUserDefined type
- UserDefined relation
But not user-defined relations.
From this I conclude that Ibis does not currently support generating Substrait plans for user-defined functions. It would be awesome to have that support.