Releases: unionai-oss/pandera
Release 0.29.0: support list, dict, and tuple of dataframes
⭐️ Highlight
Pandera now supports collection types containing dataframes, shoutout to @garethellis0 for an amazing first contribution!
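```python
import pandas as pd
import pandera.pandas as pa
from pandera.typing import DataFrame

# Hypothetical schemas for illustration: column "a" must be all 0s or all 1s
class OnlyZeroesSchema(pa.DataFrameModel):
    a: int = pa.Field(eq=0)

class OnlyOnesSchema(pa.DataFrameModel):
    a: int = pa.Field(eq=1)

# check_types validates each dataframe inside the tuple and the returned dict
@pa.check_types
def process_tuple_and_return_dict(
    dfs: tuple[DataFrame[OnlyZeroesSchema], DataFrame[OnlyOnesSchema]],
) -> dict[str, DataFrame[OnlyZeroesSchema]]:
    return {
        "foo": dfs[0],
        "bar": dfs[0],
    }

result = process_tuple_and_return_dict((
    pd.DataFrame({"a": [0, 0]}),
    pd.DataFrame({"a": [1, 1]}),
))
print(result)
```

What's Changed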
- feature/1078: Added Support For List, Dict, And Tuples Of Dataframes by @garethellis0 in #2204
- pin sphinx version by @cosmicBboy in #2208
- Add map datatype to the Ibis engine implementation by @deepyaman in #2206
New Contributors
- @garethellis0 made their first contribution in #2204
Full Changelog: v0.28.1...v0.29.0
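The 0.29.0 highlight comes down to `check_types` looking inside `tuple`, `list`, and `dict` annotations and validating each element. The following is a minimal stdlib-only sketch of that idea, assuming a simple recursive walk over `typing.get_origin`/`get_args`; it is not pandera's implementation, and the `check_types` decorator, `validate_value` helper, and `OnlyZeroes`/`OnlyOnes` leaf classes are hypothetical (plain lists stand in for dataframes).

```python
import functools
import inspect
from typing import Any, get_args, get_origin


def validate_value(value: Any, hint: Any) -> None:
    """Recursively check `value` against `hint`, descending into collections."""
    # Leaf: any class exposing a `check` callable stands in for DataFrame[Schema].
    if isinstance(hint, type) and callable(getattr(hint, "check", None)):
        if not hint.check(value):
            raise ValueError(f"{value!r} failed {hint.__name__}")
        return
    origin, args = get_origin(hint), get_args(hint)
    if origin is tuple:
        if len(args) == 2 and args[1] is Ellipsis:  # variadic tuple[X, ...]
            args = (args[0],) * len(value)
        if len(args) != len(value):
            raise ValueError(f"expected {len(args)} items, got {len(value)}")
        for item, item_hint in zip(value, args):
            validate_value(item, item_hint)
    elif origin is list:
        for item in value:
            validate_value(item, args[0])
    elif origin is dict:
        for item in value.values():
            validate_value(item, args[1])
    # Any other annotation (or a missing one) is left unchecked in this sketch.


def check_types(fn):
    """Toy decorator: validate annotated arguments and the return value."""
    sig = inspect.signature(fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            validate_value(value, sig.parameters[name].annotation)
        result = fn(*args, **kwargs)
        validate_value(result, sig.return_annotation)
        return result

    return wrapper


class OnlyZeroes:
    """Hypothetical leaf schema: every element must equal 0."""

    @staticmethod
    def check(data) -> bool:
        return all(v == 0 for v in data)


class OnlyOnes:
    """Hypothetical leaf schema: every element must equal 1."""

    @staticmethod
    def check(data) -> bool:
        return all(v == 1 for v in data)


@check_types
def process(dfs: tuple[OnlyZeroes, OnlyOnes]) -> dict[str, OnlyZeroes]:
    return {"foo": dfs[0], "bar": dfs[0]}


print(process(([0, 0], [1, 1])))  # {'foo': [0, 0], 'bar': [0, 0]}
```

Calling `process(([0, 0], [0, 1]))` raises `ValueError`, because the second tuple element fails `OnlyOnes`. Recursing on the annotation structure keeps the leaf check independent of how deeply it is nested.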
v0.28.1: Fix regressions in Check behavior
What's Changed
- fix bugs in Check interface and Field by @cosmicBboy in #2203
Full Changelog: v0.28.0...v0.28.1
Release 0.28.0: Add support for Pyspark 4
⭐️ Highlight
Pandera now supports PySpark 4 🚀
What's Changed
- refactor(pyspark): restructure pyspark components by @ELC in #2007
- add support for pyspark 4 by @cosmicBboy in #2193
- Decouple import dependencies for io serialization formats by @cosmicBboy in #2195
- Use `get_annotations` instead of direct `__annotations__` access by @amerberg in #2196
- Re-implement improvements to `str_length` check by @cosmicBboy in #2198
- Support the `Decimal` data type in the Ibis engine by @deepyaman in #2194
- Update .git-blame-ignore-revs to add Ruff refactor by @deepyaman in #2199
- Avoid full materialization of levels in failing MultiIndex validations by @amerberg in #2187
- schema descriptor should raise AttributeError if build_schema_ is not implemented by @amerberg in #2197
New Contributors
Full Changelog: v0.27.1...v0.28.0
Release v0.27.1: bugfix related to numpy==2.4.0
What's Changed
- enhancement #2122 by @Jarek-Rolski in #2177
- Fix failure_cases index value for MultiIndex schema errors by @amerberg in #2186
- handle new numpy 2.4.0 ValueError when type is not recognized by @cosmicBboy in #2191
Full Changelog: v0.27.0...v0.27.1
v0.27.0: Support Python 3.14
⭐️ Highlight
Pandera now supports Python 3.14! We also dropped support for Python 3.9.
What's Changed
- `scipy-stubs` by @jorenham in #2121
- bugfix: set `SPARK_LOCAL_IP` to `127.0.0.1` if not set. by @cosmicBboy in #2123
- fix: collect `failure_cases` in `check_column_values_are_unique` by @MikeEvansLarah in #2120
- Adding import to code example in data_synthesis_strategies.md by @OwenLund in #2126
- Pin DuckDB<1.4.0 in dev env due to breaking change by @deepyaman in #2140
- fix mypy polars issues by @cosmicBboy in #2142
- Add descriptors to DataFrameModel by @lundybernard in #2136
- Support nonnullable equivalents for all data types by @deepyaman in #2146
- Fix failure case count for Ibis tables and strings by @deepyaman in #2145
- Implement `check_nullable` checks for Ibis backend by @deepyaman in #2149
- feat: create empty dataframe with index (and multiindex) when present… by @davidkleiven in #2133
- Bugfix/1994 Error loading frictionless schema by @Jarek-Rolski in #2159
- Do not pass removed `name` argument to `memtable`s by @deepyaman in #2162
- Fix: Add enum.Enum serialization support for to_json() by @chris-wright-nl in #2163
- Implement `drop_invalid_rows` for the Ibis backend by @deepyaman in #2151
- Support Python 3.14 by @glatterf42 in #2158
- optimize pandas MultiIndex validation by avoiding materializing level values when possible by @amerberg in #2118
- fix: remove pandas.concat signature hook by @kitagry in #2173
- add codecov token to ci by @cosmicBboy in #2175
- Use Ruff instead of Black, `pyupgrade` and `isort` by @deepyaman in #2171
New Contributors
- @jorenham made their first contribution in #2121
- @MikeEvansLarah made their first contribution in #2120
- @OwenLund made their first contribution in #2126
- @chris-wright-nl made their first contribution in #2163
- @glatterf42 made their first contribution in #2158
- @kitagry made their first contribution in #2173
Full Changelog: v0.26.1...v0.27.0
v0.27.0b0: beta release, add Python 3.14
What's Changed
- `scipy-stubs` by @jorenham in #2121
- bugfix: set `SPARK_LOCAL_IP` to `127.0.0.1` if not set. by @cosmicBboy in #2123
- fix: collect `failure_cases` in `check_column_values_are_unique` by @MikeEvansLarah in #2120
- Adding import to code example in data_synthesis_strategies.md by @OwenLund in #2126
- Pin DuckDB<1.4.0 in dev env due to breaking change by @deepyaman in #2140
- fix mypy polars issues by @cosmicBboy in #2142
- Add descriptors to DataFrameModel by @lundybernard in #2136
- Support nonnullable equivalents for all data types by @deepyaman in #2146
- Fix failure case count for Ibis tables and strings by @deepyaman in #2145
- Implement `check_nullable` checks for Ibis backend by @deepyaman in #2149
- feat: create empty dataframe with index (and multiindex) when present… by @davidkleiven in #2133
- Bugfix/1994 Error loading frictionless schema by @Jarek-Rolski in #2159
- Do not pass removed `name` argument to `memtable`s by @deepyaman in #2162
- Fix: Add enum.Enum serialization support for to_json() by @chris-wright-nl in #2163
- Implement `drop_invalid_rows` for the Ibis backend by @deepyaman in #2151
- Support Python 3.14 by @glatterf42 in #2158
- optimize pandas MultiIndex validation by avoiding materializing level values when possible by @amerberg in #2118
- fix: remove pandas.concat signature hook by @kitagry in #2173
New Contributors
- @jorenham made their first contribution in #2121
- @MikeEvansLarah made their first contribution in #2120
- @OwenLund made their first contribution in #2126
- @chris-wright-nl made their first contribution in #2163
- @glatterf42 made their first contribution in #2158
- @kitagry made their first contribution in #2173
Full Changelog: v0.26.1...v0.27.0b0
v0.26.1: Multi-index, `@check_types` Bugfixes
What's Changed
- fix MultiIndex check regression by @amerberg in #2116
- implement multiindex_strict and multiindex_unique add test cases by @amerberg in #2114
- Bugfix: #2058 Check_types for callable by @ybressler in #2069
New Contributors
- @ybressler made their first contribution in #2069
Full Changelog: v0.26.0...v0.26.1
v0.26.0: Add support for Python 3.13
⭐️ Highlight
📣 Pandera now supports Python 3.13! Now go forth and use bare forward reference types to your heart's content 🤗
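Why forward references need explicit support: under `from __future__ import annotations`, every annotation is stored as a string, so any library that inspects type hints has to resolve those strings back into real objects first. The stdlib sketch below illustrates that resolution step with quoted annotations, which is what the future import produces; it is not pandera's code, and the `Order` class and `merge` function are hypothetical.

```python
import typing


class Order:
    """Hypothetical model class used only for this illustration."""


# Quoted annotations mimic what `from __future__ import annotations` stores:
# plain strings that still have to be resolved to real objects.
def merge(left: "Order", right: "Order") -> "Order":
    return left


raw = merge.__annotations__                # the unresolved strings
resolved = typing.get_type_hints(merge)    # evaluated in merge's module globals

print(raw)  # {'left': 'Order', 'right': 'Order', 'return': 'Order'}
print(resolved["left"] is Order)  # True
```

`typing.get_type_hints` evaluates each string in the function's global namespace, which is why the referenced class must exist by the time the hints are inspected, not when the function is defined.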
What's Changed
- Enh/future annotations py3.13 by @cosmicBboy in #1980
- fix pyspark check registration by @cosmicBboy in #2087
- remove top-level pandera init import warning by @cosmicBboy in #2088
- Bugfix 2075: Polar dataframe default values - fill_nan AND fill_null for float columns by @cmsommerville in #2076
- Remove pylint by @cosmicBboy in #2086
- Upgrade `pyupgrade` hook and target Python version by @deepyaman in #2093
- Fix passing an empty column list to check duplicates by @rush4ratio in #2092
- Replace `Literal` imports from `typing_extensions` by @deepyaman in #2100
- Add `.git-blame-ignore-revs` to avoid bulk changes by @deepyaman in #2101
- limit polars version on Mac OS by @amerberg in #2105
- delete monthly downloads, not available by @cosmicBboy in #2112
- Implement parser machinery and the `strict` parser by @deepyaman in #2096
- Support checking joint uniqueness of table columns by @deepyaman in #2097
- Reimplement pandas MultiIndex backend without inheriting from DataFrame backend by @amerberg in #2103
- fix(doc): clarify check_fn signature by @Farley-Chen in #2107
- Fix missing tests core directory by @rush4ratio in #2102
- fix polars Categorical bug by @cosmicBboy in #2113
New Contributors
- @cmsommerville made their first contribution in #2076
- @rush4ratio made their first contribution in #2092
- @Farley-Chen made their first contribution in #2107
Full Changelog: v0.25.0...v0.26.0
v0.25.0: 🦩 Support Ibis table validation
⭐️ Highlight
Pandera now supports Ibis 🦩! You can now validate data on all available Ibis backends using the `pandera.ibis` module.
In-memory table example:

```python
import ibis
import pandera.ibis as pa


class Schema(pa.DataFrameModel):
    state: str
    city: str
    price: int = pa.Field(in_range={"min_value": 5, "max_value": 20})


t = ibis.memtable(
    {
        'state': ['FL','FL','FL','CA','CA','CA'],
        'city': [
            'Orlando',
            'Miami',
            'Tampa',
            'San Francisco',
            'Los Angeles',
            'San Diego',
        ],
        'price': [8, 12, 10, 16, 20, 18],
    }
)
Schema.validate(t).execute()
```

SQLite example:

```python
con = ibis.sqlite.connect()
t = con.create_table(
    "table",
    schema=ibis.schema(dict(state="string", city="string", price="int64"))
)
con.insert(
    "table",
    obj=[
        ("FL", "Orlando", 8),
        ("FL", "Miami", 12),
        ("FL", "Tampa", 10),
        ("CA", "San Francisco", 16),
        ("CA", "Los Angeles", 20),
        ("CA", "San Diego", 18),
    ]
)
Schema.validate(t).execute()
```

What does this mean?
This release unlocks in-database validation on some of the most widely used data platforms, including Postgres, Snowflake, BigQuery, MySQL, and more ✨. It means that you can validate data at scale, on the database or data framework of your choice, before fetching it for downstream analysis or modeling work.
Naturally, this also means that you can develop your schemas locally on a DuckDB or SQLite backend and then use the same schemas in production on a remote database like Postgres.
Learn more about the integration here.
What's Changed
- Add Polars pydantic integration with format support and native JSON schema generation by @halicki in #1979
- exclude python 3.12 and pyspark combo in ci by @cosmicBboy in #2005
- Delete previously-added foo.txt and new_example.py by @deepyaman in #2013
- Pin PySpark due to test failures/incompatibilities by @deepyaman in #2010
- Temporarily pin `polars` due to test failure in CI by @deepyaman in #2011
- Replace `event_loop` removed in pytest-asyncio 1.0 by @deepyaman in #2014
- Fix typehint in unique_values_eq (issue #1492) by @AhmetZamanis in #2015
- fix pyarrow string issue, fix docs failing issues by @cosmicBboy in #2026
- bugfix: PANDERA_VALIDATION_ENABLED=False should disable validation by @cosmicBboy in #2028
- Expect Python slice index errors after Python 3.10 by @deepyaman in #2033
- Ibis dev by @deepyaman in #2040
- handle dataframe-level failure cases: convert row to dict by @cosmicBboy in #2050
- bugfix/1927 by @Jarek-Rolski in #2019
- [🐻❄️ polars] Limit reported failure cases if Check.n_failure_cases is defined. by @cosmicBboy in #2055
- [🦩 ibis] Limit reported failure cases if Check.n_failure_cases is defined. by @cosmicBboy in #2056
- Add link to the documentation about Ibis datatypes by @deepyaman in #2057
- Test column presence, mark other features not impl by @deepyaman in #2060
- Run `pre-commit` on all files to fix linter issues by @deepyaman in #2063
- Implement `regex` option and add additional checks by @deepyaman in #2061
- Implement binary and boolean types (and test them) by @deepyaman in #2064
- Add unit test suite for Ibis components, fix a bug by @deepyaman in #2065
- bugfix: fix `format_vectorized_error_message` to properly format nested pyarrow failed cases by @AndrejIring in #2036
- handle empty dataframes with PydanticModel: show warning by @cosmicBboy in #2066
- bugfix/2031: Allow strict='filter' and coerce='True' at the same time for PySpark schemas by @gfilaci in #2032
- Set validation scope for pandas run_checks methods by @amerberg in #2003
- DataFrameSchema.update_index correctly sets title, description, and metadata by @cosmicBboy in #2067
- [ibis 🦩] remove inplace=True in column validate call by @cosmicBboy in #2068
- [ibis 🦩] check backend: use positional join for duckdb and polars, fix ibis DataFrameModel.validate types by @cosmicBboy in #2071
New Contributors
- @halicki made their first contribution in #1979
- @AhmetZamanis made their first contribution in #2015
- @AndrejIring made their first contribution in #2036
- @gfilaci made their first contribution in #2032
- @amerberg made their first contribution in #2003
Full Changelog: v0.24.0...v0.25.0
v0.25.0rc0: Support ibis table validation
What's Changed
- Add Polars pydantic integration with format support and native JSON schema generation by @halicki in #1979
- exclude python 3.12 and pyspark combo in ci by @cosmicBboy in #2005
- Delete previously-added foo.txt and new_example.py by @deepyaman in #2013
- Pin PySpark due to test failures/incompatibilities by @deepyaman in #2010
- Temporarily pin `polars` due to test failure in CI by @deepyaman in #2011
- Replace `event_loop` removed in pytest-asyncio 1.0 by @deepyaman in #2014
- Fix typehint in unique_values_eq (issue #1492) by @AhmetZamanis in #2015
- fix pyarrow string issue, fix docs failing issues by @cosmicBboy in #2026
- bugfix: PANDERA_VALIDATION_ENABLED=False should disable validation by @cosmicBboy in #2028
- Expect Python slice index errors after Python 3.10 by @deepyaman in #2033
- Ibis dev by @deepyaman in #2040
- handle dataframe-level failure cases: convert row to dict by @cosmicBboy in #2050
- bugfix/1927 by @Jarek-Rolski in #2019
- [🐻❄️ polars] Limit reported failure cases if Check.n_failure_cases is defined. by @cosmicBboy in #2055
- [🦩 ibis] Limit reported failure cases if Check.n_failure_cases is defined. by @cosmicBboy in #2056
- Add link to the documentation about Ibis datatypes by @deepyaman in #2057
- Test column presence, mark other features not impl by @deepyaman in #2060
- Run `pre-commit` on all files to fix linter issues by @deepyaman in #2063
- Implement `regex` option and add additional checks by @deepyaman in #2061
- Implement binary and boolean types (and test them) by @deepyaman in #2064
- Add unit test suite for Ibis components, fix a bug by @deepyaman in #2065
- bugfix: fix `format_vectorized_error_message` to properly format nested pyarrow failed cases by @AndrejIring in #2036
- handle empty dataframes with PydanticModel: show warning by @cosmicBboy in #2066
- bugfix/2031: Allow strict='filter' and coerce='True' at the same time for PySpark schemas by @gfilaci in #2032
- Set validation scope for pandas run_checks methods by @amerberg in #2003
- DataFrameSchema.update_index correctly sets title, description, and metadata by @cosmicBboy in #2067
- [ibis 🦩] remove inplace=True in column validate call by @cosmicBboy in #2068
New Contributors
- @halicki made their first contribution in #1979
- @AhmetZamanis made their first contribution in #2015
- @AndrejIring made their first contribution in #2036
- @gfilaci made their first contribution in #2032
- @amerberg made their first contribution in #2003
Full Changelog: v0.24.0...v0.25.0rc0