Commit abc8109

Merge pull request #678 from realpython/how-to-drop-null-values-in-pandas
how-to-drop-null-values-in-pandas
2 parents ff86f52 + 1a64778 commit abc8109

7 files changed: 106 additions and 0 deletions

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+The materials contained in this download are designed to complement the Real Python tutorial [How to Drop Null Values in pandas](https://realpython.com/how-to-drop-null-values-in-pandas/).
+
+You should create a new folder named `pandas_nulls` on your computer and place each file inside it. You may also consider creating a [Python virtual environment](https://realpython.com/python-virtual-environments-a-primer/) within this folder.
+
+Your download bundle contains the following four files. The first three files contain the code from different tutorial sections, while the fourth contains the solutions to the exercise.
+
+`drop_null_rows.py`
+`drop_null_columns.py`
+`drop_a_subset.py`
+`exercise_solutions.py`
+
+There are also two data files containing the data used throughout the tutorial:
+
+`sales_data_with_missing_values.csv`
+`grades.csv`
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+import pandas as pd
+
+pd.set_option("display.max_columns", None)
+
+sales_data = pd.read_csv(
+    "sales_data_with_missing_values.csv",
+    parse_dates=["order_date"],
+    date_format="%d/%m/%Y",
+).convert_dtypes(dtype_backend="pyarrow")
+
+
+sales_data.dropna(axis=0, subset=["discount", "sale_price"])
+
+sales_data.dropna(how="all")
+
+sales_data.dropna(thresh=5)
+
+sales_data.dropna(thresh=5, ignore_index=True)
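
As a minimal sketch of how the `subset`, `how`, and `thresh` arguments used in this file differ, the following runs them against a small in-memory DataFrame instead of the CSV file. The `orders` frame and its values are hypothetical and not part of the commit; the default NumPy-backed dtypes are used so that `pyarrow` is not required.

```python
import pandas as pd

# Hypothetical miniature of the sales data; the values are illustrative only.
orders = pd.DataFrame(
    {
        "order_number": [70041, 70042, None, 70044],
        "discount": [None, False, None, True],
        "sale_price": [150.00, 78.00, None, None],
    }
)

# Keep only rows where both "discount" and "sale_price" are present.
both_present = orders.dropna(subset=["discount", "sale_price"])

# Drop only rows in which every value is missing.
not_all_null = orders.dropna(how="all")

# Keep rows that have at least two non-null values.
at_least_two = orders.dropna(thresh=2)

print(len(both_present), len(not_all_null), len(at_least_two))
```

Only the second row survives the `subset` filter, while `how="all"` and `thresh=2` each remove just the fully empty row in this particular frame.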
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+import pandas as pd
+
+sales_data = pd.read_csv(
+    "sales_data_with_missing_values.csv",
+    parse_dates=["order_date"],
+    date_format="%d/%m/%Y",
+).convert_dtypes(dtype_backend="pyarrow")
+
+sales_data.dropna(axis="columns")
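
To illustrate what `axis="columns"` does, here is a small sketch on a hypothetical frame (the `orders` name and its values are assumptions made for illustration). It also shows that `thresh` can be combined with the column axis, which this file does not do.

```python
import pandas as pd

# Hypothetical data: only "order_number" is fully populated.
orders = pd.DataFrame(
    {
        "order_number": [70046, 70047, 70048],
        "customer_name": ["Far Pow", None, "Devlin Nock"],
        "sale_price": [150.00, 135.00, None],
    }
)

# Drop every column that contains at least one null value.
complete_columns = orders.dropna(axis="columns")

# Keep columns that have at least two non-null values.
mostly_complete = orders.dropna(axis="columns", thresh=2)

print(list(complete_columns.columns))
print(list(mostly_complete.columns))
```

With the default `how="any"`, a single missing value is enough to remove a whole column, so a threshold is often the gentler choice on sparse data.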
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+import pandas as pd
+
+pd.set_option("display.max_columns", None)
+
+sales_data = pd.read_csv(
+    "sales_data_with_missing_values.csv",
+    parse_dates=["order_date"],
+    date_format="%d/%m/%Y",
+).convert_dtypes(dtype_backend="pyarrow")
+
+sales_data
+
+sales_data.isna().sum()
+
+sales_data.dropna()
+
+clean_sales_data = sales_data.dropna()
+
+clean_sales_data = sales_data.dropna(inplace=True)  # Pitfall: inplace=True makes .dropna() return None.
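
The last two statements of this file behave very differently. The sketch below contrasts them on a hypothetical one-column frame (`sales` and its values are illustrative, not taken from the commit), assuming the documented behavior that `.dropna()` returns `None` when `inplace=True`.

```python
import pandas as pd

# Hypothetical one-column frame with a single missing value.
sales = pd.DataFrame({"sale_price": [135.00, None, 78.00]})

# Without inplace, dropna() returns a new DataFrame and leaves `sales` untouched.
clean_sales = sales.dropna()
rows_before = len(sales)

# With inplace=True, `sales` itself is modified and the call returns None.
result = sales.dropna(inplace=True)

print(len(clean_sales))
print(rows_before, len(sales))
print(result is None)
```

Assigning the result of an `inplace=True` call to a name therefore binds that name to `None`, not to a cleaned DataFrame.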
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+import pandas as pd
+
+grades = pd.read_csv(
+    "grades.csv",
+).convert_dtypes(dtype_backend="pyarrow")
+
+# 1. Use `.dropna()` in such a way that it permanently drops the row in the DataFrame containing only null values.
+
+grades.dropna(how="all", inplace=True)
+
+# 2. Display the rows for the exams that all students have completed.
+
+grades.dropna()
+
+# 3. Display any columns with no missing data.
+
+grades.dropna(axis=1)
+
+# 4. Display the exams sat by at least five students.
+
+grades.dropna(axis=0, thresh=6)  # Remember there are seven columns.
+
+# 5. Who else was in every exam that both S2 and S4 sat?
+
+grades.dropna(subset=["S2", "S4"]).dropna(axis=1, ignore_index=True)
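
Solutions 4 and 5 rely on two details that are easy to miss: `thresh` counts every non-null cell in a row, including the `Subject` label, and chaining a row-wise `.dropna()` with a column-wise one narrows the result in two steps. The sketch below shows both on a hypothetical mini gradebook (its shape and values are assumptions and differ from `grades.csv`).

```python
import pandas as pd

# Hypothetical mini gradebook: one label column plus three student columns.
grades = pd.DataFrame(
    {
        "Subject": ["math", "art", "music"],
        "S1": [18, 15, 14],
        "S2": [None, None, 20],
        "S3": [15, None, None],
    }
)

# thresh counts all non-null cells in a row, "Subject" included,
# so "at least two students" translates to thresh=3 here.
two_or_more = grades.dropna(thresh=3)

# First keep the exams S2 sat, then keep only the columns with no gaps
# in those remaining rows.
shared = grades.dropna(subset=["S2"]).dropna(axis=1)

print(list(two_or_more["Subject"]))
print(list(shared.columns))
```

The same off-by-one reasoning explains `thresh=6` in solution 4: six non-null cells correspond to the subject name plus five students.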
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+Subject,S1,S2,S3,S4,S5,S6
+math,18,,15,20,17,18
+science,26,35,19,,33,
+art,15,,9,17,18,14
+music,14,20,12,20,13,18
+history,18,19,,17,,18
+sport,20,17,20,17,18
+,,,,,
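
The last two rows of this file have one field fewer than the header. As a hedged sketch of how `pd.read_csv()` treats such rows with its default settings, the following parses a hypothetical CSV string (the `csv_text` content is illustrative, not the file above) and counts the resulting nulls.

```python
import io

import pandas as pd

# Hypothetical CSV text: the "sport" row has one field fewer than the header.
csv_text = "Subject,S1,S2,S3\nmath,18,,15\nsport,20,17\n"

short_rows = pd.read_csv(io.StringIO(csv_text))

# The missing trailing field is read as a null value rather than raising an error.
print(short_rows.isna().sum())
```

Rows with too few fields are padded with nulls, so they take part in `.dropna()` like any other incomplete row.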
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+order_number,order_date,customer_name,product_purchased,discount,sale_price
+,09/02/2025,Skipton Fealty,Chili Extra Virgin Olive Oil,TRUE,135.00
+70041,,Carmine Priestnall,,,150.00
+70042,09/02/2025,,Rosemary Olive Oil Candle,FALSE,78.00
+70043,10/02/2025,Lanni D'Ambrogi,,TRUE,19.50
+70044,10/02/2025,Tann Angear,Vanilla and Olive Oil Candle,,13.98
+70045,10/02/2025,Skipton Fealty,Basil Extra Virgin Olive Oil,TRUE,
+70046,11/02/2025,Far Pow,Chili Extra Virgin Olive Oil,FALSE,150.00
+70047,11/02/2025,Hill Group,Chili Extra Virgin Olive Oil,TRUE,135.00
+70048,11/02/2025,Devlin Nock,Lavender and Olive Oil Lotion,FALSE,39.96
+,,,,,
+70049,12/02/2025,Swift Inc,Garlic Extra Virgin Olive Oil,TRUE,936.00
