This Jupyter notebook converts a project written in dbt to a Google Dataform project. In this spreadsheet you can see details about the objects converted by the Python code, important notes about possible limitations, and the roadmap for future implementations.
- Make sure you already have Dataform installed on your machine; if it isn't, just follow this walkthrough.
- Make sure you have your dbt project's source repository on your local machine;
- Clone this repository into the same directory that contains your dbt project:
  - `gh repo clone datalakehouse/dbt-to-dataform`
- Copy the `.df_credentials.json` file that you generated during Dataform configuration to the same path. The directory structure should look like this:
  - dbt_project/
  - .df_credentials.json
  - dbt-to-dataform/notebook
- Make sure you have Python and Jupyter Notebook (or JupyterHub) installed on your machine;
- Read the spreadsheets below to make sure that each part of your code will be converted as expected:
- dbt-to-dataform - dataform install and config basics
- dbt-to-dataform - dbt and dataform syntax differences
- dbt-to-dataform conversion concept roadmap
Execute the `jupyter-notebook` command in your CLI to start Jupyter Notebook.
After starting Jupyter Notebook on your local machine, navigate to the web server page, typically http://localhost:8888/.
Navigate to the `dbt_dataform_converter.ipynb` file.
In this part of the code (see image), set the variables described below.
- `dbt_source_project_path`: the path of your source dbt project;
- `dataform_root_path`: the target Dataform path to be generated;
- `target_schema`: the name of the schema Dataform will create on the target data warehouse platform, e.g. Snowflake;
- `conversion_type`: whether the code will be converted to JS or SQLX in Dataform. If you want to create a Dataform package, you must use JS; otherwise, SQLX;
- `dlh_timestamp_field`: if your code has SCD snapshot files, Dataform requires a timestamp field to check when generating the snapshot. It must be a field on your model that tracks the last-update datetime for each record.
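For reference, the configuration cell might look something like the sketch below; all values are placeholders you should replace with your own paths and names.

```python
# Example configuration cell (placeholder values -- adjust to your environment)
dbt_source_project_path = "../dbt_project"    # path of the source dbt project
dataform_root_path = "../dataform_project"    # target Dataform project to generate
target_schema = "ANALYTICS"                   # schema Dataform creates on the warehouse (e.g. Snowflake)
conversion_type = "SQLX"                      # "JS" for a Dataform package, otherwise "SQLX"
dlh_timestamp_field = "updated_at"            # last-update timestamp column, required for SCD snapshots
```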
Run each cell of the `dbt_dataform_converter.ipynb` file.
The last cell should produce a return similar to this.
Make sure you have read the spreadsheet to understand the current limitations of the converter as they apply to your dbt code.
In the case of unit testing, based on the dlh_square_analytics project, below are the changes that were needed prior to running the Dataform code.
- The Square analytics project uses a full_name macro, which had to be rewritten for Dataform. Write your macro in a .js file and put that file inside the includes folder;
- Prefix each macro call with the name of the macro file it lives in;
- Execute the `dataform compile` command to make sure nothing will break at runtime;
- Change the default schema in the `dataform.json` file (see the sketch after this list);
- Execute `dataform run`.
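If you'd rather script that default-schema change than edit the file by hand, a minimal sketch is below. It assumes the standard `defaultSchema` key in the Dataform CLI's `dataform.json`; the path and schema name are placeholders.

```python
import json
from pathlib import Path

# Placeholder path: point this at the generated Dataform project.
config_path = Path("../dataform_project/dataform.json")

config = json.loads(config_path.read_text())
config["defaultSchema"] = "MY_TARGET_SCHEMA"  # the schema your models should build into
config_path.write_text(json.dumps(config, indent=2))
```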
The output for Dataform will be generated at the path set in the `dataform_root_path` variable.
When the code runs, it checks whether this directory already exists. If it does not exist, it is created; otherwise, it is deleted and created again.
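Conceptually, that check behaves like the following sketch (the function name and shutil-based implementation are illustrative, not the notebook's exact code):

```python
import shutil
from pathlib import Path

def reset_target_dir(dataform_root_path: str) -> None:
    """Delete the target Dataform directory if it exists, then recreate it empty."""
    target = Path(dataform_root_path)
    if target.exists():
        shutil.rmtree(target)  # wipe any previous conversion output
    target.mkdir(parents=True)
```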
Below are the steps run in sequence by the `dbt_dataform_converter` function:
- deletes the target repository if it exists;
- runs `dataform init` to create the new repository;
- copies the `.df_credentials.json` file to the new repository;
- edits the `packages.json` file with the target Dataform version and adds the dataform-scd package;
- runs `dataform install` to set up the target version and the SCD package;
- gets all yml files that contain sources in the models folder of the dbt source project;
- generates one .js file under definitions/sources in the Dataform project for each source table contained in the yml files;
- gets all .sql files in the dbt models directory (for the SQLX conversion);
- copies all files to the target Dataform definitions directory, changing the extension to .sqlx;
- replaces each header with Dataform's syntax using replace and regex substitution functions (see the sketch after this list);
- replaces dbt syntax patterns with Dataform's;
- removes unsupported dbt config header features;
- replaces the syntax pattern of incremental model macros for Dataform;
- gets all .sql files in the dbt models directory (for the JS conversion);
- copies all files to the target Dataform definitions directory, changing the extension to .js;
- replaces each header with Dataform's syntax using replace and regex substitution functions;
- replaces dbt syntax patterns with Dataform's;
- removes unsupported dbt config header features;
- replaces the syntax pattern of incremental model macros for Dataform;
- gets all .sql files in the dbt snapshots directory;
- copies all files to the target Dataform definitions/snapshots directory, changing the extension to .js;
- gets the name of the table used in the FROM clause of each snapshot file;
- replaces each file with Dataform's SCD JS pattern;
- gets all .yml test and schema definition files in the dbt models directory;
- gets the unique and not_null tests and their corresponding tables and columns;
- gets the descriptions for tables and/or columns present in the yml files;
- creates a Python dictionary with those tests and descriptions;
- writes assertions (tests) and descriptions into each model already present in Dataform's definitions folder.
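To illustrate the syntax-replacement steps above, here is a minimal sketch of the kind of regex substitution involved. It covers only the rewrite of dbt's `{{ ref('...') }}` into Dataform's `${ref("...")}`; the notebook's actual rules are more extensive.

```python
import re

def convert_refs(sql: str) -> str:
    """Rewrite dbt-style {{ ref('model') }} calls to Dataform's ${ref("model")}."""
    return re.sub(
        r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}",
        r'${ref("\1")}',
        sql,
    )

print(convert_refs("select * from {{ ref('stg_orders') }}"))
# -> select * from ${ref("stg_orders")}
```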
If you have any comments or questions, please consider joining our DataLakeHouse Slack Channel Community, where we discuss this project and other data engineering and analytics engineering topics: https://datalakehouse.slack.com/
We welcome any and all feedback and contributions to further the project. Please take a look at this project for how to contribute; we think their guidelines are pretty darn good: https://github.com/firstcontributions/first-contributions