Merged
Changes from 2 commits
20 changes: 18 additions & 2 deletions python/datafusion/dataframe.py
@@ -409,16 +409,32 @@
     def drop(self, *columns: str) -> DataFrame:
         """Drop arbitrary amount of columns.

+        Column names are case-sensitive and do not require double quotes like
+        other operations such as `select`. Leading and trailing double quotes
+        are allowed and will be automatically stripped if present.
+
         Args:

Check failure on line 416 in python/datafusion/dataframe.py (GitHub Actions / build, Ruff W291):
python/datafusion/dataframe.py:416:78: W291 Trailing whitespace
-            columns: Column names to drop from the dataframe.
+            columns: Column names to drop from the dataframe. Both 'column_name'

Check failure on line 417 in python/datafusion/dataframe.py (GitHub Actions / build, Ruff W291):
python/datafusion/dataframe.py:417:78: W291 Trailing whitespace
+                and '"column_name"' are accepted.

         Returns:
             DataFrame with those columns removed in the projection.

Check failure on line 421 in python/datafusion/dataframe.py (GitHub Actions / build, Ruff W291):
python/datafusion/dataframe.py:421:81: W291 Trailing whitespace

+        Example Usage:
+            df.drop('ID_For_Students')  # Works
+            df.drop('"ID_For_Students"')  # Also works (quotes stripped)
         """

Check failure on line 426 in python/datafusion/dataframe.py (GitHub Actions / build, Ruff W293):
python/datafusion/dataframe.py:426:1: W293 Blank line contains whitespace
-        return DataFrame(self.df.drop(*columns))
+        normalized_columns = []
+        for col in columns:
+            if col.startswith('"') and col.endswith('"'):
+                normalized_columns.append(col.strip('"'))  # Removes quotes from both sides of col
+            else:
+                normalized_columns.append(col)
+
+        return DataFrame(self.df.drop(*normalized_columns))

Check failure on line 434 in python/datafusion/dataframe.py (GitHub Actions / build, Ruff E501):
python/datafusion/dataframe.py:434:89: E501 Line too long (97 > 88)

     def filter(self, *predicates: Expr) -> DataFrame:
         """Return a DataFrame for which ``predicate`` evaluates to ``True``.

Check failure on line 437 in python/datafusion/dataframe.py (GitHub Actions / build, Ruff W293):
python/datafusion/dataframe.py:437:1: W293 Blank line contains whitespace

         Rows for which ``predicate`` evaluates to ``False`` or ``None`` are filtered
         out. If more than one predicate is provided, these predicates will be
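
Review note on the normalization loop in `drop`: the guard checks `startswith('"')` and `endswith('"')`, but `str.strip('"')` removes every leading and trailing double quote, not just one pair, and a lone `"` passes both checks and normalizes to an empty string. The sketch below contrasts that behavior with a slice-based variant that removes exactly one enclosing pair. Both helper names (`strip_all_quotes`, `strip_outer_quotes`) are hypothetical and are not part of datafusion; whether the stricter behavior is wanted for `drop` is an assumption, not something this PR states.

```python
def strip_all_quotes(name: str) -> str:
    """Mirror the loop body in the diff: strip() removes all outer quotes."""
    if name.startswith('"') and name.endswith('"'):
        return name.strip('"')
    return name


def strip_outer_quotes(name: str) -> str:
    """Hypothetical alternative: remove exactly one enclosing pair of quotes."""
    # Require at least two characters so a lone '"' is left untouched.
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name[1:-1]
    return name


if __name__ == "__main__":
    for candidate in ['ID_For_Students', '"ID_For_Students"', '""a""', '"']:
        print(
            repr(candidate),
            repr(strip_all_quotes(candidate)),
            repr(strip_outer_quotes(candidate)),
        )
```

For the two inputs exercised by `test_drop_quoted_columns`, both helpers return the same name; they differ only on inputs such as `'""a""'` or a single `'"'`.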
11 changes: 10 additions & 1 deletion python/tests/test_dataframe.py
@@ -216,10 +216,19 @@
     assert result.column(0) == pa.array([4, 5, 6])
     assert result.column(1) == pa.array([1, 2, 3])


+def test_drop_quoted_columns():
+    ctx = SessionContext()
+    batch = pa.RecordBatch.from_arrays([pa.array([1, 2, 3])], names=["ID_For_Students"])
+    df = ctx.create_dataframe([[batch]])
+
+    # Both should work
+    assert df.drop('"ID_For_Students"').schema().names == []
+    assert df.drop('ID_For_Students').schema().names == []

Check failure on line 226 in python/tests/test_dataframe.py (GitHub Actions / build, Ruff W293):
python/tests/test_dataframe.py:226:1: W293 Blank line contains whitespace


 def test_select_mixed_expr_string(df):

Check failure on line 229 in python/tests/test_dataframe.py (GitHub Actions / build, Ruff Q000):
python/tests/test_dataframe.py:229:20: Q000 Single quotes found but double quotes preferred
     df = df.select(column("b"), "a")

Check failure on line 231 in python/tests/test_dataframe.py (GitHub Actions / build, Ruff W293):
python/tests/test_dataframe.py:231:1: W293 Blank line contains whitespace
     # execute and collect the first (and only) batch
     result = df.collect()[0]
