diff --git a/README.md b/README.md index c5757c684..efb6f6c43 100644 --- a/README.md +++ b/README.md @@ -9,8 +9,8 @@

The `datacontract` CLI is an open-source command-line tool for working with [data contracts](https://datacontract.com). -It natively supports the [Open Data Contract Standard](https://bitol-io.github.io/open-data-contract-standard/latest/) to lint data contracts, connect to data sources and execute schema and quality tests, and export to different formats. -The tool is written in Python. +It natively supports the [Open Data Contract Standard](https://bitol-io.github.io/open-data-contract-standard/latest/) to lint data contracts, connect to data sources and execute schema and quality tests, and export to different formats. +The tool is written in Python. It can be used as a standalone CLI tool, in a CI/CD pipeline, or directly as a Python library. ![Main features of the Data Contract CLI](datacontractcli.png) @@ -129,7 +129,7 @@ $ datacontract import --format sql --source my-ddl.sql --dialect postgres --outp # import from Excel template $ datacontract import --format excel --source odcs.xlsx --output odcs.yaml -# export to Excel template +# export to Excel template $ datacontract export --format excel --output odcs.xlsx odcs.yaml ``` @@ -266,12 +266,12 @@ Commands ### init ``` - - Usage: datacontract init [OPTIONS] [LOCATION] - - Create an empty data contract. - - + + Usage: datacontract init [OPTIONS] [LOCATION] + + Create an empty data contract. + + ╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮ │ location [LOCATION] The location of the data contract file to create. │ │ [default: datacontract.yaml] │ @@ -288,12 +288,12 @@ Commands ### lint ``` - - Usage: datacontract lint [OPTIONS] [LOCATION] - - Validate that the datacontract.yaml is correctly formatted. - - + + Usage: datacontract lint [OPTIONS] [LOCATION] + + Validate that the datacontract.yaml is correctly formatted. 
+ + ╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮ │ location [LOCATION] The location (url or path) of the data contract yaml. │ │ [default: datacontract.yaml] │ @@ -316,12 +316,12 @@ Commands ### test ``` - - Usage: datacontract test [OPTIONS] [LOCATION] - - Run schema and quality tests on configured servers. - - + + Usage: datacontract test [OPTIONS] [LOCATION] + + Run schema and quality tests on configured servers. + + ╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮ │ location [LOCATION] The location (url or path) of the data contract yaml. │ │ [default: datacontract.yaml] │ @@ -926,7 +926,7 @@ models: #### API -Data Contract CLI can test APIs that return data in JSON format. +Data Contract CLI can test APIs that return data in JSON format. Currently, only GET requests are supported. ##### Example @@ -943,9 +943,9 @@ models: my_object: # corresponds to the root element of the JSON response type: object fields: - field1: + field1: type: string - fields2: + fields2: type: number ``` @@ -982,13 +982,13 @@ models: ### export ``` - - Usage: datacontract export [OPTIONS] [LOCATION] - - Convert data contract to a specific format. Saves to file specified by `output` option if present, - otherwise prints to stdout. - - + + Usage: datacontract export [OPTIONS] [LOCATION] + + Convert data contract to a specific format. Saves to file specified by `output` option if present, + otherwise prints to stdout. + + ╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮ │ location [LOCATION] The location (url or path) of the data contract yaml. │ │ [default: datacontract.yaml] │ @@ -1230,7 +1230,7 @@ to limit your contract export to a single model. 
```bash $ datacontract export --format iceberg --model orders https://datacontract.com/examples/orders-latest/datacontract.yaml --output /tmp/orders_iceberg.json - + $ cat /tmp/orders_iceberg.json | jq '.' { "type": "struct", @@ -1287,67 +1287,59 @@ The export function converts the data contract specification into the custom for datacontract export --format custom --template template.txt datacontract.yaml ``` -##### Jinja variables +##### Jinja templates & variables You can directly use the Data Contract Specification as template variables. ```shell $ cat template.txt title: {{ data_contract.info.title }} +models: +{%- for model_name, model in data_contract.models.items() %} + - name: {{ model.name }} +{%- endfor %} $ datacontract export --format custom --template template.txt datacontract.yaml title: Orders Latest ``` -##### Example Jinja Templates +##### Example Jinja Template for a customized dbt model -###### Customized dbt model +You can export a single dbt model containing any logic by adding the `schema-name` parameter. -You can export the dbt models containing any logic. +This passes two additional Jinja variables to your template file: +- `schema_name`: the name of the selected schema (`str`) +- `schema`: the selected `SchemaObject` from the ODCS model Below is an example of a dbt staging layer that converts a field of `type: timestamp` to a `DATETIME` type with time zone conversion.
-template.sql - -{% raw %} -```sql -{%- for model_name, model in data_contract.models.items() %} -{#- Export only the first model #} -{%- if loop.first -%} -SELECT -{%- for field_name, field in model.fields.items() %} - {%- if field.type == "timestamp" %} - DATETIME({{ field_name }}, "Asia/Tokyo") AS {{ field_name }}, - {%- else %} - {{ field_name }} AS {{ field_name }}, - {%- endif %} -{%- endfor %} -FROM - {{ "{{" }} ref('{{ model_name }}') {{ "}}" }} -{%- endif %} -{%- endfor %} -``` -{% endraw %} - -command - -```shell -datacontract export --format custom --template template.sql --output output.sql datacontract.yaml -``` - -output.sql - -```sql -SELECT - order_id AS order_id, - DATETIME(order_timestamp, "Asia/Tokyo") AS order_timestamp, - order_total AS order_total, - customer_id AS customer_id, - customer_email_address AS customer_email_address, - DATETIME(processed_timestamp, "Asia/Tokyo") AS processed_timestamp, -FROM - {{ ref('orders') }} -``` +- `template.sql` + ```sql + SELECT + {%- for field in schema.properties %} + {%- if field.physicalType == "timestamp" %} + DATETIME({{ field.name }}, "Asia/Tokyo") AS {{ field.name }}, + {%- else %} + {{ field.name }} AS {{ field.name }}, + {%- endif %} + {%- endfor %} + FROM {{ "{{" }} ref('{{ schema_name }}') {{ "}}" }} + ``` +- export command + ```shell + datacontract export datacontract.odcs.yaml --format custom --template template.sql --schema-name orders + ``` +- `output.sql` + ```sql + SELECT + order_id AS order_id, + DATETIME(order_timestamp, "Asia/Tokyo") AS order_timestamp, + order_total AS order_total, + customer_id AS customer_id, + customer_email_address AS customer_email_address, + DATETIME(processed_timestamp, "Asia/Tokyo") AS processed_timestamp, + FROM {{ ref('orders') }} + ``` #### ODCS Excel Template @@ -1367,13 +1359,13 @@ For more information about the Excel template structure, visit the [ODCS Excel T ### import ``` - - Usage: datacontract import [OPTIONS] - - Create a data contract from the 
given source location. Saves to file specified by `output` option - if present, otherwise prints to stdout. - - + + Usage: datacontract import [OPTIONS] + + Create a data contract from the given source location. Saves to file specified by `output` option + if present, otherwise prints to stdout. + + ╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮ │ * --format [sql|avro|dbt|dbml|glue| The format of the source │ │ jsonschema|json|bigquery file. │ @@ -1598,7 +1590,7 @@ DataContract.import_from_source("spark", "users", dataframe = df_user) DataContract.import_from_source(format = "spark", source = "users", dataframe = df_user) # Example: Import Spark table + table description -DataContract.import_from_source("spark", "users", description = "description") +DataContract.import_from_source("spark", "users", description = "description") DataContract.import_from_source(format = "spark", source = "users", description = "description") # Example: Import Spark dataframe + table description @@ -1612,7 +1604,7 @@ Importing from DBML Documents. **NOTE:** Since DBML does _not_ have strict requirements on the types of columns, this import _may_ create non-valid datacontracts, as not all types of fields can be properly mapped. In this case you will have to adapt the generated document manually. We also assume, that the description for models and fields is stored in a Note within the DBML model. -You may give the `dbml-table` or `dbml-schema` parameter to enumerate the tables or schemas that should be imported. +You may give the `dbml-table` or `dbml-schema` parameter to enumerate the tables or schemas that should be imported. If no tables are given, _all_ available tables of the source will be imported. Likewise, if no schema is given, _all_ schemas are imported. Examples: @@ -1659,7 +1651,7 @@ datacontract import --format csv --source "test.csv" #### protobuf -Importing from protobuf File. Specify file in `source` parameter. 
+Importing from protobuf File. Specify file in `source` parameter. Example: @@ -1670,12 +1662,12 @@ datacontract import --format protobuf --source "test.proto" ### catalog ``` - - Usage: datacontract catalog [OPTIONS] - - Create a html catalog of data contracts. - - + + Usage: datacontract catalog [OPTIONS] + + Create a html catalog of data contracts. + + ╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮ │ --files TEXT Glob pattern for the data contract files to include in the │ │ catalog. Applies recursively to any subfolders. │ @@ -1701,12 +1693,12 @@ datacontract catalog --files "*.odcs.yaml" ### publish ``` - - Usage: datacontract publish [OPTIONS] [LOCATION] - - Publish the data contract to the Entropy Data. - - + + Usage: datacontract publish [OPTIONS] [LOCATION] + + Publish the data contract to the Entropy Data. + + ╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────╮ │ location [LOCATION] The location (url or path) of the data contract yaml. │ │ [default: datacontract.yaml] │ @@ -1726,22 +1718,22 @@ datacontract catalog --files "*.odcs.yaml" ### api ``` - - Usage: datacontract api [OPTIONS] - - Start the datacontract CLI as server application with REST API. - - The OpenAPI documentation as Swagger UI is available on http://localhost:4242. You can execute the - commands directly from the Swagger UI. - To protect the API, you can set the environment variable DATACONTRACT_CLI_API_KEY to a secret API - key. To authenticate, requests must include the header 'x-api-key' with the correct API key. This - is highly recommended, as data contract tests may be subject to SQL injections or leak sensitive - information. 
- To connect to servers (such as a Snowflake data source), set the credentials as environment - variables as documented in https://cli.datacontract.com/#test - It is possible to run the API with extra arguments for `uvicorn.run()` as keyword arguments, e.g.: - `datacontract api --port 1234 --root_path /datacontract`. - + + Usage: datacontract api [OPTIONS] + + Start the datacontract CLI as server application with REST API. + + The OpenAPI documentation as Swagger UI is available on http://localhost:4242. You can execute the + commands directly from the Swagger UI. + To protect the API, you can set the environment variable DATACONTRACT_CLI_API_KEY to a secret API + key. To authenticate, requests must include the header 'x-api-key' with the correct API key. This + is highly recommended, as data contract tests may be subject to SQL injections or leak sensitive + information. + To connect to servers (such as a Snowflake data source), set the credentials as environment + variables as documented in https://cli.datacontract.com/#test + It is possible to run the API with extra arguments for `uvicorn.run()` as keyword arguments, e.g.: + `datacontract api --port 1234 --root_path /datacontract`. + ╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮ │ --port INTEGER Bind socket to this port. [default: 4242] │ │ --host TEXT Bind socket to this host. Hint: For running in docker, set it │ @@ -2016,7 +2008,7 @@ uv run pytest Run in wsl. (We need to fix the paths in the tests so that normal Windows will work, contributions are appreciated) -#### PyCharm does not pick up the `.venv` +#### PyCharm does not pick up the `.venv` This [uv issue](https://github.com/astral-sh/uv/issues/12545) might be relevant. 
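The README changes above introduce two schema-scoped template variables, `schema_name` and `schema`, that only exist when a single schema is selected. As a quick illustration of the mechanics, here is a minimal, hypothetical sketch of how such a render context can be assembled — simplified from the exporter change in this PR, not the exact CLI implementation:

```python
# Sketch only: the custom exporter exposes `schema`/`schema_name` to the
# template solely when a single schema was selected via --schema-name;
# otherwise the template sees the whole contract. The dict arguments below
# are stand-ins for the real ODCS model objects.

def build_render_context(data_contract, schema_name=None, schema=None):
    """Assemble the variables handed to Jinja's Template.render()."""
    context = {"data_contract": data_contract}
    if schema is not None:
        context["schema"] = schema
        context["schema_name"] = schema_name
    return context


# Whole-contract export: the template can only use `data_contract`.
whole = build_render_context({"id": "orders-latest"})

# Schema-scoped export: the template can additionally use the selected schema.
scoped = build_render_context(
    {"id": "orders-latest"},
    schema_name="orders",
    schema={"name": "orders", "physicalType": "table"},
)
```

With this split, a schema-scoped template can iterate `schema.properties` directly, while whole-contract templates keep working unchanged.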
diff --git a/datacontract/export/custom_exporter.py b/datacontract/export/custom_exporter.py index bc8a03c25..993d9b28c 100644 --- a/datacontract/export/custom_exporter.py +++ b/datacontract/export/custom_exporter.py @@ -1,9 +1,9 @@ from pathlib import Path from jinja2 import Environment, FileSystemLoader -from open_data_contract_standard.model import OpenDataContractStandard +from open_data_contract_standard.model import OpenDataContractStandard, SchemaObject -from datacontract.export.exporter import Exporter +from datacontract.export.exporter import Exporter, _check_schema_name_for_export class CustomExporter(Exporter): @@ -22,16 +22,28 @@ def export( if template is None: raise RuntimeError("Export to custom requires template argument.") - return to_custom(data_contract, template) + if schema_name and schema_name != "all": + schema_name, model_obj = _check_schema_name_for_export(data_contract, schema_name, self.export_format) + return to_custom(data_contract, template, schema_name=schema_name, schema=model_obj) + else: + return to_custom(data_contract, template) -def to_custom(data_contract: OpenDataContractStandard, template_path: Path) -> str: +def to_custom( + data_contract: OpenDataContractStandard, + template_path: Path, + schema_name: str | None = None, + schema: SchemaObject | None = None, +) -> str: template = get_template(template_path) - rendered_sql = template.render(data_contract=data_contract) - return rendered_sql + context = {"data_contract": data_contract} + if schema is not None: + context["schema"] = schema + context["schema_name"] = schema_name + return template.render(**context) def get_template(path: Path): - abosolute_path = Path(path).resolve() - env = Environment(loader=FileSystemLoader(str(abosolute_path.parent))) + absolute_path = Path(path).resolve() + env = Environment(loader=FileSystemLoader(str(absolute_path.parent))) return env.get_template(path.name) diff --git a/tests/fixtures/custom/export_model/datacontract.odcs.yaml 
b/tests/fixtures/custom/export_model/datacontract.odcs.yaml new file mode 100644 index 000000000..a354b3393 --- /dev/null +++ b/tests/fixtures/custom/export_model/datacontract.odcs.yaml @@ -0,0 +1,86 @@ +kind: DataContract +apiVersion: v3.1.0 +id: orders-unit-test +name: Orders Unit Test +version: 1.0.0 +status: active +description: + purpose: The orders data contract +team: + name: checkout + description: Checkout team +servers: + - server: production + type: bigquery + environment: production + account: my-account + project: my-database + dataset: my-schema +schema: + - name: orders + businessName: orders + physicalType: table + description: The orders model + properties: + - name: order_id + businessName: Order ID + logicalType: string + physicalType: varchar + unique: true + required: true + classification: sensitive + tags: + - order_id + logicalTypeOptions: + minLength: 8 + maxLength: 10 + pattern: ^B[0-9]+$ + customProperties: + - property: pii + value: "true" + examples: + - B12345678 + - B12345679 + - name: order_total + logicalType: integer + physicalType: bigint + required: true + description: The order_total field + logicalTypeOptions: + minimum: 0 + maximum: 1000000 + quality: + - type: sql + description: 95% of all order total values are expected to be between 10 and 499 EUR. 
+ query: | + SELECT quantile_cont(order_total, 0.95) AS percentile_95 + FROM orders + mustBeBetween: + - 1000 + - 49900 + - name: order_status + logicalType: string + physicalType: text + required: true + customProperties: + - property: enum + value: "[\"pending\", \"shipped\", \"delivered\"]" + - name: user_id + businessName: User ID + logicalType: string + physicalType: varchar + required: true + relationships: + - type: foreignKey + to: users.user_id + - name: users + businessName: users + physicalType: table + description: The users model + properties: + - name: user_id + businessName: User ID + logicalType: string + physicalType: varchar + unique: true + required: true \ No newline at end of file diff --git a/tests/fixtures/custom/export_model/datacontract.yaml b/tests/fixtures/custom/export_model/datacontract.yaml new file mode 100644 index 000000000..62c347ab8 --- /dev/null +++ b/tests/fixtures/custom/export_model/datacontract.yaml @@ -0,0 +1,216 @@ +dataContractSpecification: 1.2.1 +id: urn:datacontract:checkout:orders-latest +info: + title: Orders Latest + version: 2.0.0 + description: | + Successful customer orders in the webshop. + All orders since 2020-01-01. + Orders with their line items are in their current state (no history included). + owner: Checkout Team + contact: + name: John Doe (Data Product Owner) + url: https://teams.microsoft.com/l/channel/example/checkout +servers: + production: + type: s3 + environment: prod + location: s3://datacontract-example-orders-latest/v2/{model}/*.json + format: json + delimiter: new_line + description: "One folder per model. One file per day." + roles: + - name: analyst_us + description: Access to the data for US region + - name: analyst_cn + description: Access to the data for China region +terms: + usage: | + Data can be used for reports, analytics and machine learning use cases. + Order may be linked and joined by other tables + limitations: | + Not suitable for real-time use cases. 
+ Data may not be used to identify individual customers. + Max data processing per day: 10 TiB + policies: + - name: privacy-policy + url: https://example.com/privacy-policy + - name: license + description: External data is licensed under agreement 1234. + url: https://example.com/license/1234 + billing: 5000 USD per month + noticePeriod: P3M +models: + orders: + description: One record per order. Includes cancelled and deleted orders. + type: table + fields: + order_id: + $ref: "#/definitions/order_id" + required: true + unique: true + primaryKey: true + order_timestamp: + description: The business timestamp in UTC when the order was successfully registered in the source system and the payment was successful. + type: timestamp + required: true + examples: + - "2024-09-09T08:30:00Z" + tags: ["business-timestamp"] + order_total: + description: Total amount in the smallest monetary unit (e.g., cents). + type: long + required: true + examples: + - 9999 + quality: + - type: sql + description: 95% of all order total values are expected to be between 10 and 499 EUR. + query: | + SELECT quantile_cont(order_total, 0.95) AS percentile_95 + FROM orders + mustBeBetween: [1000, 49900] + customer_id: + description: Unique identifier for the customer. + type: text + minLength: 10 + maxLength: 20 + customer_email_address: + description: The email address, as entered by the customer. + type: text + format: email + required: true + pii: true + classification: sensitive + quality: + - type: text + description: The email address is not verified and may be invalid. + lineage: + inputFields: + - namespace: com.example.service.checkout + name: checkout_db.orders + field: email_address + processed_timestamp: + description: The timestamp when the record was processed by the data platform.
+ type: timestamp + required: true + config: + jsonType: string + jsonFormat: date-time + quality: + - type: sql + description: The maximum duration between two orders should be less than 3600 seconds + query: | + SELECT MAX(duration) AS max_duration FROM (SELECT EXTRACT(EPOCH FROM (order_timestamp - LAG(order_timestamp) + OVER (ORDER BY order_timestamp))) AS duration FROM orders) + mustBeLessThan: 3600 + - type: sql + description: Row Count + query: | + SELECT count(*) as row_count + FROM orders + mustBeGreaterThan: 5 + examples: + - | + order_id,order_timestamp,order_total,customer_id,customer_email_address,processed_timestamp + "1001","2030-09-09T08:30:00Z",2500,"1000000001","mary.taylor82@example.com","2030-09-09T08:31:00Z" + "1002","2030-09-08T15:45:00Z",1800,"1000000002","michael.miller83@example.com","2030-09-09T08:31:00Z" + "1003","2030-09-07T12:15:00Z",3200,"1000000003","michael.smith5@example.com","2030-09-09T08:31:00Z" + "1004","2030-09-06T19:20:00Z",1500,"1000000004","elizabeth.moore80@example.com","2030-09-09T08:31:00Z" + "1005","2030-09-05T10:10:00Z",4200,"1000000004","elizabeth.moore80@example.com","2030-09-09T08:31:00Z" + "1006","2030-09-04T14:55:00Z",2800,"1000000005","john.davis28@example.com","2030-09-09T08:31:00Z" + "1007","2030-09-03T21:05:00Z",1900,"1000000006","linda.brown67@example.com","2030-09-09T08:31:00Z" + "1008","2030-09-02T17:40:00Z",3600,"1000000007","patricia.smith40@example.com","2030-09-09T08:31:00Z" + "1009","2030-09-01T09:25:00Z",3100,"1000000008","linda.wilson43@example.com","2030-09-09T08:31:00Z" + "1010","2030-08-31T22:50:00Z",2700,"1000000009","mary.smith98@example.com","2030-09-09T08:31:00Z" + line_items: + description: A single article that is part of an order.
+ type: table + fields: + line_item_id: + type: text + description: Primary key of the lines_item_id table + required: true + order_id: + $ref: "#/definitions/order_id" + references: orders.order_id + sku: + description: The purchased article number + $ref: "#/definitions/sku" + primaryKey: ["order_id", "line_item_id"] + examples: + - | + line_item_id,order_id,sku + "LI-1","1001","5901234123457" + "LI-2","1001","4001234567890" + "LI-3","1002","5901234123457" + "LI-4","1002","2001234567893" + "LI-5","1003","4001234567890" + "LI-6","1003","5001234567892" + "LI-7","1004","5901234123457" + "LI-8","1005","2001234567893" + "LI-9","1005","5001234567892" + "LI-10","1005","6001234567891" +definitions: + order_id: + title: Order ID + type: text + format: uuid + description: An internal ID that identifies an order in the online shop. + examples: + - 243c25e5-a081-43a9-aeab-6d5d5b6cb5e2 + pii: true + classification: restricted + tags: + - orders + sku: + title: Stock Keeping Unit + type: text + pattern: ^[A-Za-z0-9]{8,14}$ + examples: + - "96385074" + description: | + A Stock Keeping Unit (SKU) is an internal unique identifier for an article. + It is typically associated with an article's barcode, such as the EAN/GTIN. + links: + wikipedia: https://en.wikipedia.org/wiki/Stock_keeping_unit + tags: + - inventory +servicelevels: + availability: + description: The server is available during support hours + percentage: 99.9% + retention: + description: Data is retained for one year + period: P1Y + unlimited: false + latency: + description: Data is available within 25 hours after the order was placed + threshold: 25h + sourceTimestampField: orders.order_timestamp + processedTimestampField: orders.processed_timestamp + freshness: + description: The age of the youngest row in a table. 
+ threshold: 25h + timestampField: orders.order_timestamp + frequency: + description: Data is delivered once a day + type: batch # or streaming + interval: daily # for batch, either or cron + cron: 0 0 * * * # for batch, either or interval + support: + description: The data is available during typical business hours at headquarters + time: 9am to 5pm in EST on business days + responseTime: 1h + backup: + description: Data is backed up once a week, every Sunday at 0:00 UTC. + interval: weekly + cron: 0 0 * * 0 + recoveryTime: 24 hours + recoveryPoint: 1 week +tags: + - checkout + - orders + - s3 +links: + datacontractCli: https://cli.datacontract.com diff --git a/tests/fixtures/custom/export_model/expected.sql b/tests/fixtures/custom/export_model/expected.sql new file mode 100644 index 000000000..4a2512697 --- /dev/null +++ b/tests/fixtures/custom/export_model/expected.sql @@ -0,0 +1,6 @@ + +SELECT + line_item_id AS line_item_id, + order_id AS order_id, + sku AS sku, +FROM {{ ref('line_items') }} \ No newline at end of file diff --git a/tests/fixtures/custom/export_model/expected_odcs_stg_users.sql b/tests/fixtures/custom/export_model/expected_odcs_stg_users.sql new file mode 100644 index 000000000..fe8c8dc0f --- /dev/null +++ b/tests/fixtures/custom/export_model/expected_odcs_stg_users.sql @@ -0,0 +1,4 @@ + +select + try_cast(user_id as varchar) as user_id +from {{ source('orders-unit-test', 'users') }} \ No newline at end of file diff --git a/tests/fixtures/custom/export_model/expected_odcs_users.sql b/tests/fixtures/custom/export_model/expected_odcs_users.sql new file mode 100644 index 000000000..e50de2585 --- /dev/null +++ b/tests/fixtures/custom/export_model/expected_odcs_users.sql @@ -0,0 +1,4 @@ + +SELECT + user_id AS user_id, +FROM {{ ref('users') }} \ No newline at end of file diff --git a/tests/fixtures/custom/export_model/expected_stg.sql b/tests/fixtures/custom/export_model/expected_stg.sql new file mode 100644 index 000000000..20dec2432 --- /dev/null 
+++ b/tests/fixtures/custom/export_model/expected_stg.sql @@ -0,0 +1,6 @@ + +select + try_cast(line_item_id as text) as line_item_id + try_cast(order_id as text) as order_id + try_cast(sku as text) as sku +from {{ source('orders-latest', 'line_items') }} \ No newline at end of file diff --git a/tests/fixtures/custom/export_model/template.sql b/tests/fixtures/custom/export_model/template.sql new file mode 100644 index 000000000..a004ce938 --- /dev/null +++ b/tests/fixtures/custom/export_model/template.sql @@ -0,0 +1,10 @@ + +SELECT +{%- for field in schema.properties %} + {%- if field.physicalType == "timestamp" %} + DATETIME({{ field.name }}, "Asia/Tokyo") AS {{ field.name }}, + {%- else %} + {{ field.name }} AS {{ field.name }}, + {%- endif %} +{%- endfor %} +FROM {{ "{{" }} ref('{{ schema_name }}') {{ "}}" }} diff --git a/tests/fixtures/custom/export_model/template_stg.sql b/tests/fixtures/custom/export_model/template_stg.sql new file mode 100644 index 000000000..2706d0137 --- /dev/null +++ b/tests/fixtures/custom/export_model/template_stg.sql @@ -0,0 +1,6 @@ + +select +{%- for field in schema.properties %} + try_cast({{ field.name }} as {{ field.physicalType | lower }}) as {{ field.name }} +{%- endfor %} +from {{ "{{" }} source('{{ data_contract.id.split(':')[-1] }}', '{{ schema_name }}') {{ "}}" }} diff --git a/tests/test_export_custom_model.py b/tests/test_export_custom_model.py new file mode 100644 index 000000000..646bd8d4a --- /dev/null +++ b/tests/test_export_custom_model.py @@ -0,0 +1,88 @@ +from pathlib import Path + +from typer.testing import CliRunner + +from datacontract.cli import app +from datacontract.data_contract import DataContract + +# logging.basicConfig(level=logging.DEBUG, force=True) + + +def test_cli(): + runner = CliRunner() + result = runner.invoke( + app, + [ + "export", + "./fixtures/custom/export/datacontract.yaml", + "--format", + "custom", + "--template", + "./fixtures/custom/export/template.sql", + "--schema-name", + "line_items", 
+ ], + ) + assert result.exit_code == 0 + + +# -------------------------------------------------------------------------------------------------------- +# test simple template.sql +# -------------------------------------------------------------------------------------------------------- +def test_export_custom_schema_name(): + """test old-style datacontract.yaml with simple template.sql""" + path_fixtures = Path("fixtures/custom/export_model") + + data_contract = DataContract(data_contract_file=str(path_fixtures / "datacontract.yaml")) + template = path_fixtures / "template.sql" + + result = data_contract.export(export_format="custom", schema_name="line_items", template=template) + + with open(path_fixtures / "expected.sql", "r") as file: + assert result == file.read() + + +def test_export_odcs_custom_schema_name(): + """test ODCS datacontract.odcs.yaml with simple template.sql""" + path_fixtures = Path("fixtures/custom/export_model") + + data_contract = DataContract(data_contract_file=str(path_fixtures / "datacontract.odcs.yaml")) + template = path_fixtures / "template.sql" + + result = data_contract.export(export_format="custom", schema_name="users", template=template) + + with open(path_fixtures / "expected_odcs_users.sql", "r") as file: + assert result == file.read() + + +# -------------------------------------------------------------------------------------------------------- +# test staging template_stg.sql +# -------------------------------------------------------------------------------------------------------- + + +def test_export_custom_schema_name_stg(): + """test old-style datacontract.yaml with staging template_stg.sql""" + path_fixtures = Path("fixtures/custom/export_model") + + data_contract = DataContract(data_contract_file=str(path_fixtures / "datacontract.yaml")) + template = path_fixtures / "template_stg.sql" + + result = data_contract.export(export_format="custom", schema_name="line_items", template=template) + print(result) + + with open(path_fixtures /
"expected_stg.sql", "r") as file: + assert result == file.read() + + +def test_export_odcs_custom_schema_name_stg(): + """test ODCS datacontract.odcs.yaml with staging template_stg.sql""" + path_fixtures = Path("fixtures/custom/export_model") + + data_contract = DataContract(data_contract_file=str(path_fixtures / "datacontract.odcs.yaml")) + template = path_fixtures / "template_stg.sql" + + result = data_contract.export(export_format="custom", schema_name="users", template=template) + print(result) + + with open(path_fixtures / "expected_odcs_stg_users.sql", "r") as file: + assert result == file.read()