feat: allow DataFrame.filter to accept SQL strings #1276
Conversation
If @timsaucer agrees, can we expand the scope beyond filter to include other similar methods that are not too hard to implement? I think join_on has expression |
That being said, I am not at all opposed to evaluating other places in |
I missed that important case
@K-dash would you be interested in investigating? |
FWIW I did a quick test with this:
--- a/python/datafusion/dataframe.py
+++ b/python/datafusion/dataframe.py
@@ -424,7 +424,9 @@ class DataFrame:
             df = df.select("a", col("b"), col("a").alias("alternate_a"))
         """
-        exprs_internal = expr_list_to_raw_expr_list(exprs)
+        expr_list = [self.parse_sql_expr(e) if isinstance(e, str) else e for e in exprs]
+
+        exprs_internal = expr_list_to_raw_expr_list(expr_list)
         return DataFrame(self.df.select(*exprs_internal))
With that you can do |
Thanks for sharing the snippet—being able to call |
Should we roll back df.select_expr and do this instead, @timsaucer? It makes sense to me to do it. |
Yes, but no. The problem with that snippet is that I think it will fail for people (like me) who have column names that are not SQL-parseable. Those names should still work by being turned into column expressions. |
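To make the concern above concrete, here is a minimal sketch of one possible way to handle it: try the SQL parser first and fall back to a plain column reference when parsing fails. This is an illustration only, not the approach the maintainers settled on. The toy grammar, `toy_parse_sql_expr`, and `to_expr` are hypothetical stand-ins and do not use the real datafusion API.

```python
import re

# Toy grammar standing in for a real SQL expression parser (hypothetical):
# accepts a bare identifier, or "identifier <op> integer".
_TOY_SQL = re.compile(r"[A-Za-z_]\w*(\s*(>=|<=|=|>|<)\s*\d+)?")


def toy_parse_sql_expr(text: str) -> tuple[str, str]:
    """Return a tagged expression, or raise ValueError if text does not parse."""
    if _TOY_SQL.fullmatch(text.strip()) is None:
        raise ValueError(f"cannot parse as SQL: {text!r}")
    return ("sql", text.strip())


def to_expr(text: str) -> tuple[str, str]:
    """Try the SQL parser first; fall back to a plain column reference."""
    try:
        return toy_parse_sql_expr(text)
    except ValueError:
        # Names that are not SQL-parseable are kept as column references.
        return ("column", text)


print(to_expr("a > 1"))        # parsed as a SQL expression
print(to_expr("my col-name"))  # not parseable, treated as a column name
```

Note that a fallback like this does not remove the ambiguity entirely: a column whose name happens to parse as valid SQL would still be interpreted as an expression rather than as a column.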
I think this makes sense, but let's wait for @timsaucer
Thank you @K-dash and @milenkovicm !
Which issue does this PR close?
Closes #1273
Rationale for this change
Users have requested Spark-like support for DataFrame.filter("a > 1") so they can reuse existing SQL predicate strings without converting them to expression objects.
What changes are included in this PR?
Update DataFrame.filter to normalize SQL string predicates via parse_sql_expr before dispatching to the internal API.
Are there any user-facing changes?
DataFrame.filter now accepts SQL string predicates in addition to Expr objects, and the documentation reflects this capability. No breaking API changes.
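The normalization step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `Expr` here is a stand-in dataclass rather than the real datafusion type, and `normalize_predicates` and `fake_parse` are hypothetical names introduced only for the example.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Expr:
    """Stand-in for an expression object; holds only a textual form."""
    text: str


def normalize_predicates(
    predicates: tuple[Union[Expr, str], ...],
    parse_sql_expr: Callable[[str], Expr],
) -> list[Expr]:
    """Convert SQL string predicates to Expr; pass Expr objects through."""
    return [parse_sql_expr(p) if isinstance(p, str) else p for p in predicates]


def fake_parse(sql: str) -> Expr:
    # Hypothetical parser: a real one would build an expression tree.
    return Expr(f"parsed({sql})")


# Mixed input: one SQL string and one existing expression object.
normalized = normalize_predicates(("a > 1", Expr("b < 2")), fake_parse)
print(normalized)
```

Normalizing at the public method boundary keeps the internal API working with a single type, so existing callers that already pass expression objects are unaffected.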