Pandas Checks adds .check methods to Pandas so you can inspect method chains without cutting them up.
As Fleetwood Mac says, you would never break the chain.
import pandas_checks

iris_processed = (
    iris
    .dropna()
    .check.assert_positive(subset=["petal_length", "sepal_length"])  # 🐼🩺 Validate assumptions
    .check.hist(column='petal_length')  # 🐼🩺 Plot the distribution of a column after cleaning
    .query("species=='setosa'")
    .check.head(3)  # 🐼🩺 Display the first few rows after more cleaning
    .check.write("iris_processed.parquet")  # 🐼🩺 Export the interim data, with type inferred from name
)

The .check methods didn't modify how iris data got processed. That's the difference between .head() and .check.head().
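To make the difference concrete, here is a minimal sketch in plain pandas of the pass-through idea. It does not use Pandas Checks itself: `show_head` is a hypothetical helper and the three-row DataFrame is made-up sample data, so treat it as an illustration of the concept rather than the library's implementation.

```python
import pandas as pd


def show_head(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Hypothetical helper: print the first n rows, then return df unchanged."""
    print(df.head(n))
    return df


# Made-up sample data, standing in for the iris dataset.
df = pd.DataFrame(
    {
        "species": ["setosa", "virginica", "setosa"],
        "petal_length": [1.4, 5.1, 1.3],
    }
)

# .head(2) truncates the data that flows on to the next step of the chain.
truncated = df.head(2).query("species == 'setosa'")

# A pass-through check displays 2 rows but hands all 3 rows to the next step.
passed_through = df.pipe(show_head, 2).query("species == 'setosa'")

print(len(truncated), len(passed_through))
```

Only the first chain loses a setosa row, because `.head(2)` returned a shorter DataFrame while the pass-through helper returned the original one.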
💡 See the docs for details and configuration options.
# With uv
uv add pandas-checks
# Or with pip
pip install pandas-checks

Here's what's in the doctor's bag.
General:
.check.assert_data() - Check that data passes an arbitrary condition, expressed as a lambda function - DataFrame | Series
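As a rough illustration of what "an arbitrary condition, expressed as a lambda function" can look like, the sketch below builds a chainable assertion in plain pandas. `assert_condition` is a hypothetical helper, not the signature of `.check.assert_data()`, and the choice of `ValueError` is an assumption made for the example; see the docs for the real arguments.

```python
from typing import Callable

import pandas as pd


def assert_condition(
    df: pd.DataFrame,
    condition: Callable[[pd.DataFrame], bool],
    message: str = "Data check failed",
) -> pd.DataFrame:
    """Hypothetical helper: raise if condition(df) is falsy, else return df unchanged."""
    if not bool(condition(df)):
        raise ValueError(message)
    return df


# Made-up sample data.
df = pd.DataFrame({"petal_length": [1.4, 5.1, 1.3]})

# Passing condition: the chain continues with the same DataFrame.
checked = df.pipe(assert_condition, lambda d: (d["petal_length"] > 0).all())

# Failing condition: the chain stops with an error.
try:
    df.pipe(assert_condition, lambda d: d["petal_length"].max() < 5, "petal_length too large")
except ValueError as err:
    failure = str(err)
    print(failure)
```

The lambda receives the whole DataFrame, so the condition can combine columns or aggregate however it needs to before reducing to a single true or false.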
Type assertions:
.check.assert_datetime() - DataFrame | Series
.check.assert_float() - DataFrame | Series
.check.assert_int() - DataFrame | Series
.check.assert_str() - DataFrame | Series
.check.assert_timedelta() - DataFrame | Series
.check.assert_type() - DataFrame | Series
Value assertions:
.check.assert_all_nulls() - DataFrame | Series
.check.assert_less_than() - DataFrame | Series
.check.assert_greater_than() - DataFrame | Series
.check.assert_negative() - DataFrame | Series
.check.assert_no_nulls() - DataFrame | Series
.check.assert_nrows() - DataFrame | Series
.check.assert_positive() - DataFrame | Series
.check.assert_same_nrows() - Confirm that the DataFrame/Series has the same number of rows as another DataFrame/Series - DataFrame | Series
.check.assert_unique() - DataFrame | Series
.check.columns() - DataFrame
.check.describe() - DataFrame | Series
.check.dtype() - Series
.check.dtypes() - DataFrame
.check.function() - Apply an arbitrary lambda function to your data and see the result - DataFrame | Series
.check.head() - DataFrame | Series
.check.info() - DataFrame | Series
.check.memory_usage() - DataFrame | Series
.check.ncols() - Count columns - DataFrame | Series
.check.ndups() - Count rows with duplicate values - DataFrame | Series
.check.nnulls() - Count rows with null values - DataFrame | Series
.check.nrows() - Count rows - DataFrame | Series
.check.nunique() - DataFrame | Series
.check.print() - Print a string, a variable, or the current dataframe - DataFrame | Series
.check.shape() - DataFrame | Series
.check.tail() - DataFrame | Series
.check.unique() - DataFrame | Series
.check.value_counts() - DataFrame | Series
These methods can disable Pandas Checks methods, temporarily or permanently.
.check.disable_checks() - Don't run checks. By default, still runs assertions. - DataFrame | Series
.check.enable_checks() - Run checks again. - DataFrame | Series
.check.print_time_elapsed(start_time) - Print the execution time since you called start_time = pdc.start_timer() - DataFrame | Series
💡 Tip: You can use this stopwatch anywhere in your Python code.
from pandas_checks import print_elapsed_time, start_timer

start_time = start_timer()
...
print_elapsed_time(start_time)
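For readers curious what a stopwatch like this boils down to, here is a minimal standard-library sketch. `start_stopwatch` and `elapsed_seconds` are hypothetical stand-ins, not the Pandas Checks functions, and the use of `time.perf_counter()` is an assumption about one reasonable way to measure elapsed time.

```python
import time


def start_stopwatch() -> float:
    """Hypothetical stand-in: return a reference reading from a monotonic clock."""
    return time.perf_counter()


def elapsed_seconds(start: float) -> float:
    """Hypothetical stand-in: seconds elapsed since the reference reading."""
    return time.perf_counter() - start


start = start_stopwatch()
time.sleep(0.05)  # stand-in for the work being timed
elapsed = elapsed_seconds(start)
print(f"Time elapsed: {elapsed:.3f} seconds")
```

Because the start time is just a value you hold on to, the same reading can be reused to report several intermediate timings along a long pipeline.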
.check.hist() - A histogram - DataFrame | Series
.check.plot() - An arbitrary plot you can customize - DataFrame | Series
If you run into trouble or have questions, I'd love to know. Please open an issue.
Contributions are appreciated! Please see more details.
Pandas Checks is licensed under the BSD-3 License.
🐼🩺

