cparmet/pandas-checks

Pandas Checks

Pandas Checks adds .check methods to Pandas so you can inspect method chains without cutting them up.

As Fleetwood Mac says, you would never break the chain.

import pandas_checks

iris_processed = (
    iris
    .dropna()
    .check.assert_positive(subset=["petal_length", "sepal_length"]) # 🐼🩺 Validate assumptions
    .check.hist(column='petal_length') # 🐼🩺 Plot the distribution of a column after cleaning

    .query("species=='setosa'")
    .check.head(3)  # 🐼🩺 Display the first few rows after more cleaning
    .check.write("iris_processed.parquet") # 🐼🩺 Export the interim data, with type inferred from name
)

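The .check methods don't change how the iris data is processed. They show you something about the data, then pass it along the chain untouched. That's the difference between .head() and .check.head(): .head() cuts the data down to its first rows, while .check.head() only displays them.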
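To make the distinction concrete, here is a minimal sketch of the same idea using only plain Pandas and `.pipe()`. The helper `peek_head` and the toy data are hypothetical, and this is not how Pandas Checks is implemented. It only illustrates a step that displays rows and hands the full DataFrame on.

```python
import pandas as pd


def peek_head(df: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Hypothetical helper: print the first n rows, then return df unchanged."""
    print(df.head(n))
    return df


# Toy data standing in for the iris dataset (values are made up).
toy = pd.DataFrame({"petal_length": [1.4, 1.3, 4.7, 5.1, 6.0]})

truncated = toy.head(3)          # .head() replaces the data with 3 rows
inspected = toy.pipe(peek_head)  # the peek prints 3 rows but keeps all 5
```

In a real chain, anything after the `peek_head` step would still see every row, which is the behavior the .check methods aim for.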

💡 See the docs for details and configuration options.

Installation

# With uv
uv add pandas-checks

# Or with pip
pip install pandas-checks

.check methods

Here's what's in the doctor's bag.

Assertions

General:

  • .check.assert_data() - Check that data passes an arbitrary condition, expressed as a lambda function - DataFrame | Series
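As a rough illustration of what "an arbitrary condition, expressed as a lambda function" means, the sketch below builds a stand-in with plain Pandas and `.pipe()`. The function `assert_condition`, its parameters, and the exception type are assumptions made for this example. They are not the signature of `.check.assert_data()`, so see the docs for the real arguments.

```python
from typing import Callable

import pandas as pd


def assert_condition(
    df: pd.DataFrame,
    condition: Callable[[pd.DataFrame], bool],
    message: str = "Data condition failed",
) -> pd.DataFrame:
    """Hypothetical stand-in: raise if condition(df) is falsy, else return df."""
    if not condition(df):
        raise ValueError(message)
    return df


# Toy data (values are made up).
toy = pd.DataFrame({"petal_length": [1.4, 1.3, 4.7]})

# The lambda receives the whole DataFrame and returns a single True/False.
checked = toy.pipe(assert_condition, lambda df: (df["petal_length"] > 0).all())
```

Because the function returns the DataFrame it was given, the assertion can sit in the middle of a chain without changing what later steps receive.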

Type assertions:

Value assertions:

Describe data

Disable Pandas Checks

These methods can disable Pandas Checks methods, temporarily or permanently.

  • .check.disable_checks() - Don't run checks. By default, still runs assertions. - DataFrame | Series
  • .check.enable_checks() - Run checks again. - DataFrame | Series

Export interim files

  • .check.write() - Export the current data, inferring file format from the name - DataFrame | Series
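One plausible way to infer a file format from a name is to look at the file extension. The sketch below is an assumption about the general technique, not the library's code. The `WRITERS` table and `infer_writer` are hypothetical, and the set of formats Pandas Checks actually supports may differ.

```python
from pathlib import Path

# Assumed mapping from file suffix to the name of a Pandas DataFrame writer method.
WRITERS = {
    ".csv": "to_csv",
    ".parquet": "to_parquet",
    ".xlsx": "to_excel",
    ".feather": "to_feather",
    ".pkl": "to_pickle",
}


def infer_writer(path: str) -> str:
    """Return the DataFrame writer method name implied by the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in WRITERS:
        raise ValueError(f"Cannot infer file format from {path!r}")
    return WRITERS[suffix]


writer_name = infer_writer("iris_processed.parquet")
```

A caller could then run `getattr(df, writer_name)(path)` to write the file with the matching Pandas method.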

Time your code

  • .check.print_time_elapsed(start_time) - Print the time elapsed since you called start_time = pdc.start_timer(), where pdc is the alias from import pandas_checks as pdc - DataFrame | Series

💡 Tip: You can use this stopwatch anywhere in your Python code.

from pandas_checks import print_elapsed_time, start_timer

start_time = start_timer()
...
print_elapsed_time(start_time)
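For readers curious about what a stopwatch like this involves, here is a minimal standard-library sketch. The names `start_stopwatch` and `format_elapsed` are hypothetical, and the output format is an assumption. The library's own functions may measure and report time differently.

```python
import time


def start_stopwatch() -> float:
    """Return a reference point from a monotonic, high-resolution clock."""
    return time.perf_counter()


def format_elapsed(start: float, label: str = "Time elapsed") -> str:
    """Describe how many seconds have passed since `start`."""
    elapsed = time.perf_counter() - start
    return f"{label}: {elapsed:.3f} seconds"


start = start_stopwatch()
total = sum(range(100_000))  # stand-in for the work being timed
print(format_elapsed(start))
```

`time.perf_counter()` is used here because it is intended for measuring short durations and is not affected by changes to the system clock.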

Visualize data

Giving feedback and contributing

If you run into trouble or have questions, I'd love to know. Please open an issue.

Contributions are appreciated! Please see more details.

License

Pandas Checks is licensed under the BSD-3 License.

🐼🩺
