Issue with Databricks Type Export and Validation in CLI Version 0.11.x #1048

@IchEssBlumen

Description

We have encountered an issue with the export and validation of Databricks physical types when upgrading from version 0.10.x to 0.11.x. The types that are written into the data contract YAML format have changed, and this change affects the validation process.

In version 0.11.x, the following changes in type representation were noted:

  1. string is now represented as StringType()
  2. integer is now represented as IntegerType()
  3. Other types follow the same pattern.
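To illustrate the difference between the two representations, here is a minimal sketch that maps the 0.11.x-style strings back to the lowercase names listed above. The helper `normalize_spark_type` is hypothetical and not part of datacontract-cli. It assumes the new values always look like a Spark class name followed by parentheses, and it drops any parameters inside them.

```python
import re

# Matches strings such as "StringType()" or "DecimalType(10,2)".
_SPARK_REPR = re.compile(r"^(?P<name>[A-Za-z]+)Type\(.*\)$")


def normalize_spark_type(type_repr: str) -> str:
    """Hypothetical helper: turn 'StringType()' into 'string'.

    Values that do not look like a Spark type repr are returned unchanged.
    Parameters inside the parentheses (precision, scale) are discarded.
    """
    match = _SPARK_REPR.match(type_repr.strip())
    if match is None:
        return type_repr
    return match.group("name").lower()


for value in ["StringType()", "IntegerType()", "string"]:
    print(value, "->", normalize_spark_type(value))
```

This only reproduces the naming seen in this report. It is not a claim about how 0.10.x derived its type names.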

Validation confirms that each field is present, but the type checks report the expected type as None and pass without actually validating anything:

Validation Results:
│ passed │ Check that field 'string_test_1' is present
│ passed │ Check that field string_test_1 has type None
│ passed │ Check that field 'bool_test' is present
│ passed │ Check that field bool_test has type None
│ passed │ Check that field 'date_test_1' is present
│ passed │ Check that field date_test_1 has type None
│ passed │ Check that field 'num_test_1' is present
│ passed │ Check that field num_test_1 has type None
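As a stopgap, the vacuous type checks can be detected by scanning the report text. The sketch below assumes the report lines have exactly the wording shown above. The function name `fields_with_unvalidated_type` is hypothetical, and the code parses plain text rather than using any datacontract-cli API.

```python
import re

# A type check that compares against None validates nothing.
_VACUOUS_CHECK = re.compile(r"Check that field (\S+) has type None")


def fields_with_unvalidated_type(report_lines):
    """Return the field names whose type check was run against None."""
    return [
        match.group(1)
        for line in report_lines
        if (match := _VACUOUS_CHECK.search(line))
    ]


report = [
    "│ passed │ Check that field 'string_test_1' is present",
    "│ passed │ Check that field string_test_1 has type None",
    "│ passed │ Check that field 'bool_test' is present",
    "│ passed │ Check that field bool_test has type None",
]
print(fields_with_unvalidated_type(report))
```

A non-empty result means the corresponding "passed" entries should not be read as successful type validation.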

Expected Behavior:
Databricks datatypes should be exported the same way as in 0.10.x, or validation should recognize the new type representations and validate the fields accordingly instead of defaulting to None.

Attachments:
Detailed testing results: datacontract-databricks-datatypes-export-and-test.xlsx

Code Snippet for Data Contract Generation:
import yaml
from datacontract.data_contract import DataContract
# `spark` is assumed to be an existing SparkSession.
data_contract_specification = DataContract().import_from_source("spark", 'abc.def.table')
data_contract = DataContract(data_contract=data_contract_specification, spark=spark)
# Export to ODCS and replace non-breaking spaces before parsing the YAML.
contract_yaml = yaml.safe_load(data_contract.export("odcs").replace("\xa0", " "))
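To check whether an exported contract contains the new representation, the parsed YAML can be searched for Spark-style type strings. The following is a rough sketch: `find_spark_type_reprs` is a hypothetical helper, and the sample dictionary is invented for illustration and may not match the real ODCS layout produced by the export. The walker makes no assumption about where types are stored; it simply inspects every string value.

```python
import re

# Strings such as "StringType()" or "IntegerType()".
_SPARK_REPR = re.compile(r"^[A-Za-z]+Type\(.*\)$")


def find_spark_type_reprs(node, path=""):
    """Recursively collect (path, value) pairs for Spark-style type strings."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            hits.extend(find_spark_type_reprs(value, f"{path}/{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits.extend(find_spark_type_reprs(value, f"{path}/{index}"))
    elif isinstance(node, str) and _SPARK_REPR.match(node):
        hits.append((path, node))
    return hits


# Hypothetical stand-in for the parsed contract (contract_yaml above).
sample_contract = {
    "schema": [
        {
            "name": "table",
            "properties": [
                {"name": "string_test_1", "physicalType": "StringType()"},
                {"name": "num_test_1", "physicalType": "int"},
            ],
        }
    ]
}

for location, value in find_spark_type_reprs(sample_contract):
    print(location, value)
```

Running the same function on `contract_yaml` from the snippet above would list every place where a value like `StringType()` was written.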
