-
Notifications
You must be signed in to change notification settings - Fork 134
1064/enhancement/add functions to Expr class #1074
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I wrote this script to help produce this PR. I wanted to start with just the functions which have a single Expr input that return an Expr. The script writes all those defs to a file and then I copied that over to the source and let the linter fix the bad formatting that my code created. from inspect import signature, Parameter
from datafusion import functions as f, Expr
from types import FunctionType
from pathlib import Path
funcs_not_exprs = set(dir(f)) - set(dir(Expr))
funcs = []
for fun in funcs_not_exprs:
if isinstance(getattr(f, fun), FunctionType):
funcs.append(fun)
expr_in_out = {
"one_in_out": [],
"multi_expr": [],
"other_ins": [],
"other_return": [],
"other": [],
}
for fun in funcs:
sig = signature(getattr(f, fun))
params = sig.parameters
return_annotation = sig.return_annotation
if return_annotation != "Expr":
expr_in_out["other_return"].append(fun)
continue
all_expr = True
no_star = True
for name, param in params.items():
if param.annotation != "Expr":
all_expr = False
break
if param.kind in (Parameter.VAR_POSITIONAL, Parameter.KEYWORD_ONLY):
no_star=False
if len(params) == 1 and all_expr and no_star:
expr_in_out["one_in_out"].append(fun)
elif len(params) > 1 and all_expr:
expr_in_out["multi_expr"].append(fun)
elif len(params) > 1 and not all_expr:
expr_in_out["other_ins"].append(fun)
else:
expr_in_out["other"].append(fun)
expr_defs = Path("./expr_defs.py")
with expr_defs.open("w") as ff:
for fun in expr_in_out["one_in_out"]:
ff.write(f" def {fun}(self) -> Expr:\n")
docstring = getattr(f, fun).__doc__
if docstring is not None:
ff.write(' """')
docstring = docstring.strip()
ff.write(docstring)
ff.write('\n """\n')
ff.write(f" return F.{fun}(self)\n")Before I do tests for all of them, I wanted to put this in the world for feedback. One additional idea that wasn't in the original PR would be to create namespaces to group the category of function so instead of That, of course, requires some more manual effort to categorize the functions. As another forward thought, when it comes to functions that take extra Expr inputs like def levenshtein(self, string2: Expr|str) -> Expr:
if isinstance(string2, str):
string2=col(string2)
return F.levenshtein(self, string2)That would be consistent with polars wrt to literals, so if someone wanted the levenshtein against a literal they'd have to use For functions that take a number as the second then it'd use that directly, such as: def pow(self, exponent: Expr | int | float) -> Expr:
if isinstance(exponent, (int, float)):
exponent=lit(exponent)
return f.pow(self, exponent) |
|
I do love this PR. I hadn't looked at it since it's in draft, but I fully endorse. |
|
Let me know when it's ready for review |
|
I'll mark it ready and just do a second PR for more methods. |
|
@timsaucer did you have any thoughts on the namespaces because going from this to that would wind up being a breaking change. |
|
At a high level I like some of the aspects of a namespace. It would especially be nice to clean up our documentation using them in |
|
I must have missed that CI didn't run on this. We have a circular import and broke CI on |
Which issue does this PR close?
Works towards closing #1064
Rationale for this change
To improve ergonomics of the API by adding functions to the Expr class so that they can be chained
What changes are included in this PR?
This includes function definitions in the Expr class for all the single argument functions that take Expr and return Expr
Are there any user-facing changes?
Yes, users can use functions chained from other Exprs for example
col("a").abs().tan()No breaking changes.