Releases: unionai-oss/pandera

Release 0.29.0: support list, dict, and tuple of dataframes

29 Jan 02:48
7614754

⭐️ Highlight

Pandera now supports collection types containing dataframes. Shoutout to @garethellis0 for an amazing first contribution!

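import pandas as pd
import pandera.pandas as pa
from pandera.typing import DataFrame


# Schema definitions are not shown in the release notes; these are assumed.
class OnlyZeroesSchema(pa.DataFrameModel):
    a: int = pa.Field(eq=0)


class OnlyOnesSchema(pa.DataFrameModel):
    a: int = pa.Field(eq=1)


@pa.check_types
def process_tuple_and_return_dict(
    dfs: tuple[DataFrame[OnlyZeroesSchema], DataFrame[OnlyOnesSchema]],
) -> dict[str, DataFrame[OnlyZeroesSchema]]:
    return {
        "foo": dfs[0],
        "bar": dfs[0],
    }


result = process_tuple_and_return_dict((
    pd.DataFrame({"a": [0, 0]}),
    pd.DataFrame({"a": [1, 1]}),
))
print(result)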

Full Changelog: v0.28.1...v0.29.0

v0.28.1: Fix regressions in Check behavior

08 Jan 14:10
71f860a

Full Changelog: v0.28.0...v0.28.1

Release 0.28.0: Add support for Pyspark 4

06 Jan 20:37
82096dd

⭐️ Highlight

Pandera now supports Pyspark 4 🚀

What's Changed

  • refactor(pyspark): restructure pyspark components by @ELC in #2007
  • add support for pyspark 4 by @cosmicBboy in #2193
  • Decouple import dependencies for io serialization formats by @cosmicBboy in #2195
  • Use get_annotations instead of direct __annotations__ access by @amerberg in #2196
  • Re-implement improvements to str_length check by @cosmicBboy in #2198
  • Support the Decimal data type in the Ibis engine by @deepyaman in #2194
  • Update .git-blame-ignore-revs to add Ruff refactor by @deepyaman in #2199
  • Avoid full materialization of levels in failing MultiIndex validations by @amerberg in #2187
  • schema descriptor should raise AttributeError if build_schema_ is not implemented by @amerberg in #2197

New Contributors

  • @ELC made their first contribution in #2007

Full Changelog: v0.27.1...v0.28.0

Release v0.27.1: bugfix related to numpy==2.4.0

22 Dec 19:01
70abc5c

Full Changelog: v0.27.0...v0.27.1

v0.27.0: Support Python 3.14

25 Nov 16:11
ff8674a

⭐️ Highlight

Pandera now supports Python 3.14! We also dropped support for Python 3.9.

Full Changelog: v0.26.1...v0.27.0

v0.27.0b0: beta release, add Python 3.14

23 Nov 13:51
b48e0e3

Full Changelog: v0.26.1...v0.27.0b0

v0.26.1: Multi-index, `@check_types` Bugfixes

26 Aug 16:48
f8384ae

Full Changelog: v0.26.0...v0.26.1

v0.26.0: Add support for Python 3.13

13 Aug 01:12
24fe938

⭐️ Highlight

📣 Pandera now supports Python 3.13! Now go forth and use bare forward reference types to your heart's content 🤗

Full Changelog: v0.25.0...v0.26.0

v0.25.0: 🦩 Support Ibis table validation

08 Jul 19:19
c49b18f

⭐️ Highlight

Pandera now supports Ibis 🦩! You can now validate data on all available ibis backends using the pandera.ibis module.

In-memory table example:

import ibis
import pandera.ibis as pa

class Schema(pa.DataFrameModel):
    state: str
    city: str
    price: int = pa.Field(in_range={"min_value": 5, "max_value": 20})

t = ibis.memtable(
    {
        'state': ['FL','FL','FL','CA','CA','CA'],
        'city': [
            'Orlando',
            'Miami',
            'Tampa',
            'San Francisco',
            'Los Angeles',
            'San Diego',
        ],
        'price': [8, 12, 10, 16, 20, 18],
    }
)
Schema.validate(t).execute()

SQLite example:

con = ibis.sqlite.connect()
t = con.create_table(
    "table",
    schema=ibis.schema(dict(state="string", city="string", price="int64"))
)

con.insert(
    "table",
    obj=[
        ("FL", "Orlando", 8),
        ("FL", "Miami", 12),
        ("FL", "Tampa", 10),
        ("CA", "San Francisco", 16),
        ("CA", "Los Angeles", 20),
        ("CA", "San Diego", 18),
    ]
)

Schema.validate(t).execute()

What does this mean?

This release unlocks in-database validation on some of the most widely used data platforms, including PostgreSQL, Snowflake, BigQuery, MySQL, and more ✨. You can validate data at scale, on the database or data framework of your choice, before fetching it for downstream analysis or modeling work.

Naturally, this also means that you can develop your schemas locally on a DuckDB or SQLite backend and then use the same schemas in production on a remote database like Postgres.

Learn more about the integration here.

What's Changed

  • Add Polars pydantic integration with format support and native JSON schema generation by @halicki in #1979
  • exclude python 3.12 and pyspark combo in ci by @cosmicBboy in #2005
  • Delete previously-added foo.txt and new_example.py by @deepyaman in #2013
  • Pin PySpark due to test failures/incompatibilities by @deepyaman in #2010
  • Temporarily pin polars due to test failure in CI by @deepyaman in #2011
  • Replace event_loop removed in pytest-asyncio 1.0 by @deepyaman in #2014
  • Fix typehint in unique_values_eq (issue #1492) by @AhmetZamanis in #2015
  • fix pyarrow string issue, fix docs failing issues by @cosmicBboy in #2026
  • bugfix: PANDERA_VALIDATION_ENABLED=False should disable validation by @cosmicBboy in #2028
  • Expect Python slice index errors after Python 3.10 by @deepyaman in #2033
  • Ibis dev by @deepyaman in #2040
  • handle dataframe-level failure cases: convert row to dict by @cosmicBboy in #2050
  • bugfix/1927 by @Jarek-Rolski in #2019
  • [🐻‍❄️ polars] Limit reported failure cases if Check.n_failure_cases is defined. by @cosmicBboy in #2055
  • [🦩 ibis] Limit reported failure cases if Check.n_failure_cases is defined. by @cosmicBboy in #2056
  • Add link to the documentation about Ibis datatypes by @deepyaman in #2057
  • Test column presence, mark other features not impl by @deepyaman in #2060
  • Run pre-commit on all files to fix linter issues by @deepyaman in #2063
  • Implement regex option and add additional checks by @deepyaman in #2061
  • Implement binary and boolean types (and test them) by @deepyaman in #2064
  • Add unit test suite for Ibis components, fix a bug by @deepyaman in #2065
  • bugfix: fix format_vectorized_error_message to properly format nested pyarrow failed cases by @AndrejIring in #2036
  • handle empty dataframes with PydanticModel: show warning by @cosmicBboy in #2066
  • bugfix/2031: Allow strict='filter' and coerce='True' at the same time for PySpark schemas by @gfilaci in #2032
  • Set validation scope for pandas run_checks methods by @amerberg in #2003
  • DataFrameSchema.update_index correctly sets title, description, and metadata by @cosmicBboy in #2067
  • [ibis 🦩] remove inplace=True in column validate call by @cosmicBboy in #2068
  • [ibis 🦩] check backend: use positional join for duckdb and polars, fix ibis DataFrameModel.validate types by @cosmicBboy in #2071

Full Changelog: v0.24.0...v0.25.0

v0.25.0rc0: Support ibis table validation

07 Jul 00:34
ad8f08d

What's Changed

  • Add Polars pydantic integration with format support and native JSON schema generation by @halicki in #1979
  • exclude python 3.12 and pyspark combo in ci by @cosmicBboy in #2005
  • Delete previously-added foo.txt and new_example.py by @deepyaman in #2013
  • Pin PySpark due to test failures/incompatibilities by @deepyaman in #2010
  • Temporarily pin polars due to test failure in CI by @deepyaman in #2011
  • Replace event_loop removed in pytest-asyncio 1.0 by @deepyaman in #2014
  • Fix typehint in unique_values_eq (issue #1492) by @AhmetZamanis in #2015
  • fix pyarrow string issue, fix docs failing issues by @cosmicBboy in #2026
  • bugfix: PANDERA_VALIDATION_ENABLED=False should disable validation by @cosmicBboy in #2028
  • Expect Python slice index errors after Python 3.10 by @deepyaman in #2033
  • Ibis dev by @deepyaman in #2040
  • handle dataframe-level failure cases: convert row to dict by @cosmicBboy in #2050
  • bugfix/1927 by @Jarek-Rolski in #2019
  • [🐻‍❄️ polars] Limit reported failure cases if Check.n_failure_cases is defined. by @cosmicBboy in #2055
  • [🦩 ibis] Limit reported failure cases if Check.n_failure_cases is defined. by @cosmicBboy in #2056
  • Add link to the documentation about Ibis datatypes by @deepyaman in #2057
  • Test column presence, mark other features not impl by @deepyaman in #2060
  • Run pre-commit on all files to fix linter issues by @deepyaman in #2063
  • Implement regex option and add additional checks by @deepyaman in #2061
  • Implement binary and boolean types (and test them) by @deepyaman in #2064
  • Add unit test suite for Ibis components, fix a bug by @deepyaman in #2065
  • bugfix: fix format_vectorized_error_message to properly format nested pyarrow failed cases by @AndrejIring in #2036
  • handle empty dataframes with PydanticModel: show warning by @cosmicBboy in #2066
  • bugfix/2031: Allow strict='filter' and coerce='True' at the same time for PySpark schemas by @gfilaci in #2032
  • Set validation scope for pandas run_checks methods by @amerberg in #2003
  • DataFrameSchema.update_index correctly sets title, description, and metadata by @cosmicBboy in #2067
  • [ibis 🦩] remove inplace=True in column validate call by @cosmicBboy in #2068

Full Changelog: v0.24.0...v0.25.0rc0