Description
For CSV and Parquet files read via DuckDB (from S3, GCS, Azure, or the local filesystem), the `field_is_present` check always passes, regardless of whether the column is actually present in the file.
Cause: In `create_view_with_schema_union(...)`, we create an empty table based on the data contract and then insert the actual data, e.g., from a CSV file. If a column is missing from the CSV file, it still exists in the checked table, filled with NULLs, so a presence check against that table cannot fail.
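A minimal sketch of why the check cannot fail, assuming the loading step behaves as described above. Python's built-in `sqlite3` stands in for DuckDB so the example runs without extra dependencies, and `load_with_contract_schema`, `field_is_present`, the contract fields, and the CSV content are all hypothetical illustrations, not the project's actual code.

```python
import csv
import io
import sqlite3

# Hypothetical data contract: field names and SQL types.
CONTRACT_FIELDS = {"id": "INTEGER", "name": "TEXT", "email": "TEXT"}

# Hypothetical CSV content that lacks the "email" column.
CSV_TEXT = "id,name\n1,alice\n2,bob\n"


def load_with_contract_schema(conn, table, fields, csv_text):
    """Create an empty table from the contract, then insert only the CSV's columns.

    sqlite3 stands in for DuckDB; the real create_view_with_schema_union(...)
    may differ in detail.
    """
    cols = ", ".join(f'"{name}" {sql_type}' for name, sql_type in fields.items())
    conn.execute(f'CREATE TABLE "{table}" ({cols})')

    reader = csv.DictReader(io.StringIO(csv_text))
    present = [c for c in reader.fieldnames if c in fields]
    col_list = ", ".join(f'"{c}"' for c in present)
    placeholders = ", ".join("?" for _ in present)
    rows = [tuple(row[c] for c in present) for row in reader]
    conn.executemany(
        f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})', rows
    )
    # Columns absent from the CSV remain in the table, filled with NULL.


def field_is_present(conn, table, field):
    """Hypothetical naive presence check against the loaded table's columns."""
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    return field in columns


conn = sqlite3.connect(":memory:")
load_with_contract_schema(conn, "orders", CONTRACT_FIELDS, CSV_TEXT)
print(field_is_present(conn, "orders", "email"))  # True, although the CSV has no "email" column
print(conn.execute('SELECT COUNT(*) FROM "orders" WHERE "email" IS NULL').fetchone()[0])  # 2
```

Because the table schema comes from the contract rather than from the file, any check that inspects the table's columns sees every contract field.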
Possible solutions:
- Remove the `field_is_present` check (for non-required fields). However, checking for field presence is useful, as discussed in #1018 (Support for historical data validation causes error in CSV files without headers).
- Fix the check and enforce field presence in the header even for non-required fields. This would break, e.g., `test_csv_optional_field_missing_from_old_data()` in `test_test_schema_evolution.py`, which assumes that a CSV file missing a non-required field should pass all tests.
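A minimal sketch of the second option, assuming the check reads the CSV header directly instead of the table built by `create_view_with_schema_union(...)`. `Field`, `missing_fields_in_header`, and the `enforce_optional` flag are hypothetical names, not part of the project's code. With `enforce_optional=True`, old data missing the non-required `email` column is reported, which is exactly the behavior that would conflict with `test_csv_optional_field_missing_from_old_data()`.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Field:
    """Hypothetical, simplified stand-in for a data-contract field."""
    name: str
    required: bool = False


def missing_fields_in_header(csv_text, fields, enforce_optional):
    """Return contract fields missing from the CSV header (not from the loaded table).

    enforce_optional=True corresponds to the second option: every field,
    required or not, must appear in the header.
    """
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return [
        f.name
        for f in fields
        if f.name not in header and (f.required or enforce_optional)
    ]


fields = [Field("id", required=True), Field("name", required=True), Field("email")]
old_data = "id,name\n1,alice\n"

print(missing_fields_in_header(old_data, fields, enforce_optional=False))  # []
print(missing_fields_in_header(old_data, fields, enforce_optional=True))   # ['email']
```

Reading the header detects absent columns that the NULL-filled table hides; whether to apply it to non-required fields is the trade-off between the two options.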