Description
We have encountered an issue with the export and validation of Databricks physical types when upgrading from version 0.10.x to 0.11.x. The physical types written into the data contract YAML have changed, and this change breaks the validation process.
In version 0.11.x, the following changes in type representation were noted:
- string is now represented as StringType()
- integer is now represented as IntegerType()
- etc.
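For illustration, a minimal sketch of mapping the new repr-style type strings back to the plain lowercase names that 0.10.x emitted. This is a hypothetical workaround, not part of datacontract-cli; the function name and regex are assumptions:

```python
import re

def normalize_physical_type(type_str: str) -> str:
    """Map Spark type object reprs like 'StringType()' back to plain
    lowercase names like 'string'. Purely illustrative workaround,
    not an API of datacontract-cli."""
    m = re.fullmatch(r"(\w+?)Type\(\)", type_str)
    return m.group(1).lower() if m else type_str

# normalize_physical_type("StringType()")  -> "string"
# normalize_physical_type("IntegerType()") -> "integer"
# normalize_physical_type("string")        -> "string" (already plain)
```

A post-processing step like this could restore the 0.10.x names in the exported YAML until the exporter or validator is fixed.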
Validation checks still confirm that the fields are present, but the type checks report the expected type as None, so no actual type validation takes place:
Validation Results:
│ passed │ Check that field 'string_test_1' is present
│ passed │ Check that field string_test_1 has type None
│ passed │ Check that field 'bool_test' is present
│ passed │ Check that field bool_test has type None
│ passed │ Check that field 'date_test_1' is present
│ passed │ Check that field date_test_1 has type None
│ passed │ Check that field 'num_test_1' is present
│ passed │ Check that field num_test_1 has type None
Expected Behavior:
Databricks data types should be exported as they were in 0.10.x. Alternatively, validation should recognize the updated type representations and validate the fields accordingly rather than defaulting to None.
Attachments:
Detailed testing results: datacontract-databricks-datatypes-export-and-test.xlsx
Code Snippet for Data Contract Generation:
import yaml
from datacontract.data_contract import DataContract

# Import the schema from a Spark table (spark is the active SparkSession)
data_contract_specification = DataContract().import_from_source("spark", "abc.def.table")
data_contract = DataContract(data_contract=data_contract_specification, spark=spark)
# Replace non-breaking spaces in the export so yaml.safe_load can parse it
contract_yaml = yaml.safe_load(data_contract.export("odcs").replace("\xa0", " "))