Expand use of sql parsing string expressions in DataFrame #1278

@timsaucer

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

This is a follow-on to #1273.

There are a number of places where it would be convenient to pass SQL strings as expressions. For example, it would be nice to write:

df.select(
    "a",
    "a - b",
    col("c"),
)

This should be interpreted intuitively as selecting column a, followed by col("a") - col("b"), followed by column c.

Describe the solution you'd like

Using the SQL parsing on the DataFrame, make the functions listed under "Additional context" handle SQL strings. We must be very careful not to break existing behavior, for example in cases where a user has a column name that is not parseable as SQL.
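One way to guard against breaking unusual column names is to check a string against the schema before handing it to the SQL parser. The sketch below only illustrates that idea and is not the library's implementation: `ColumnRef`, `ParsedSql`, and `resolve_arg` are hypothetical names, and the real SQL parser is replaced by a stand-in that simply wraps the string.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ColumnRef:
    """Hypothetical stand-in for a plain column reference such as col("a")."""

    name: str


@dataclass(frozen=True)
class ParsedSql:
    """Hypothetical stand-in for an expression produced by a SQL parser."""

    text: str


def resolve_arg(
    arg: object,
    column_names: Sequence[str],
    parse_sql: Callable[[str], object] = ParsedSql,
) -> object:
    """Resolve one argument, preferring an exact column-name match (assumed policy)."""
    # Non-string arguments, such as existing expression objects, pass through unchanged.
    if not isinstance(arg, str):
        return arg
    # An exact schema match wins, so a column name that is not valid SQL
    # never reaches the parser.
    if arg in column_names:
        return ColumnRef(arg)
    # Anything else is treated as a SQL expression string.
    return parse_sql(arg)


# "a - b total" is a legal column name here but not a sensible SQL expression.
schema = ["a", "b", "c", "a - b total"]
resolved = [resolve_arg(s, schema) for s in ["a", "a - b", "a - b total"]]
```

With this ordering, "a - b" is parsed as an expression only because no column has that exact name, while "a - b total" is still treated as a column.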

Describe alternatives you've considered

Status quo

Additional context

DataFrame functions to update:

  • select
  • remove select_exprs
  • with_column
  • with_columns
  • aggregate
  • repartition_by_hash

We do not want to apply this treatment to joins because there is no easy way to know which DataFrame to perform the SQL parsing against.
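One way to see the difficulty with joins is that both inputs may expose the same column names, so a bare name inside a SQL string cannot be attributed to one side. The snippet below is a minimal sketch of that ambiguity using plain lists of column names. `ambiguous_columns` is a hypothetical helper, not part of any library.

```python
def ambiguous_columns(left_columns, right_columns):
    """Return names present on both sides, in left-hand order."""
    right = set(right_columns)
    return [name for name in left_columns if name in right]


# Two hypothetical schemas that share "id" and "a".
left = ["id", "a", "b"]
right = ["id", "a", "d"]
shared = ambiguous_columns(left, right)
```

A string such as "a > 1" mentions only a shared name, so there is no obvious DataFrame to parse it against.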

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
