Inconsistent Pandas DataFrame validation with DataFrameSchema with both strict and ordered set to True. #2213

@jonasboecquaert

Describe the bug
When validating a DataFrame that has an extra column and two columns out of order against a DataFrameSchema with both strict and ordered set to True, only a COLUMN_NOT_ORDERED schema error is reported. I would also expect a COLUMN_NOT_IN_SCHEMA error, because the DataFrame contains an unexpected extra column.

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera.
  • (optional) I have confirmed this bug exists on the main branch of pandera.

Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

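import pandas as pd
import pandera as pa

schema = pa.DataFrameSchema(
    columns={
        "id": pa.Column(pa.Int64, nullable=False),
        "name": pa.Column(pa.String, nullable=True),
    },
    strict=True,
    ordered=True,
)

# This dataframe incorrectly raises only a COLUMN_NOT_ORDERED schema error,
# even though it also contains an unexpected extra column.
df = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Charlie"],
        "id": [1, 2, 3],
        "extra_column": ["extra1", "extra2", "extra3"],
    },
)

# This dataframe correctly raises only the COLUMN_NOT_IN_SCHEMA schema error.
_df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "name": ["Alice", "Bob", "Charlie"],
        "extra_column": ["extra1", "extra2", "extra3"],
    },
)

# The original sample omits the validation call. lazy=True is assumed here so
# that pandera collects schema errors instead of raising on the first one.
for frame in (df, _df):
    try:
        schema.validate(frame, lazy=True)
    except pa.errors.SchemaErrors as exc:
        print(exc.failure_cases)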

Expected behavior

When my DataFrameSchema has both strict and ordered set to True, I'd expect the validation to raise both COLUMN_NOT_ORDERED and COLUMN_NOT_IN_SCHEMA schema errors, when the validated DataFrame contains both issues.
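To make the expected behavior concrete, here is a minimal pure-Python sketch of a column check that reports both kinds of error in one pass. It is not pandera's implementation: the function name `collect_column_errors` is hypothetical, and it assumes that "ordered" means the schema columns present in the DataFrame appear in the same relative order as in the schema, ignoring extra columns.

```python
def collect_column_errors(schema_columns, frame_columns):
    """Return (reason, column) pairs for every column-level problem found.

    Hypothetical helper for illustration only; the reason strings mirror
    the error names used in this issue.
    """
    errors = []
    known = set(schema_columns)

    # strict=True: any column not declared in the schema is an error.
    for column in frame_columns:
        if column not in known:
            errors.append(("COLUMN_NOT_IN_SCHEMA", column))

    # ordered=True: compare the declared columns, in DataFrame order,
    # against the schema order restricted to the columns that are present.
    present = [c for c in frame_columns if c in known]
    present_set = set(present)
    expected = [c for c in schema_columns if c in present_set]
    for actual, wanted in zip(present, expected):
        if actual != wanted:
            errors.append(("COLUMN_NOT_ORDERED", actual))

    return errors


schema_columns = ["id", "name"]
both_issues = collect_column_errors(schema_columns, ["name", "id", "extra_column"])
extra_only = collect_column_errors(schema_columns, ["id", "name", "extra_column"])
```

With the two DataFrames from the code sample, `both_issues` contains entries for both reasons, while `extra_only` contains only the extra-column entry. The ordering check deliberately skips unknown columns so that an extra column never hides an ordering problem, or the other way around.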

Desktop (please complete the following information):

  • OS: iOS
  • Browser: N/A
  • Version: [e.g. 22]

Screenshots

N/A

Additional context

N/A


Labels: bug
