---
title: "DAB Demo"
date: 2024-02-26T14:25:26-04:00
weight: 28
draft: false
---

### DAB Demo

## Overview
This demo showcases how to use Databricks Asset Bundles (DABs) with DLT-Meta.

The demo performs the following steps:
- Create DLT-Meta schemas for the dataflowspec and the bronze/silver layers
- Upload the necessary resources to a Unity Catalog volume
- Create DAB files with the catalog, schema, and file locations populated
- Deploy the DAB to a Databricks workspace
- Run onboarding using DAB commands
- Run the Bronze/Silver pipelines using DAB commands
- Showcase a fan-out pattern in the silver layer
- Showcase custom transformations for the bronze/silver layers
- Add custom columns and metadata to Bronze tables
- Implement SCD Type 1 for Silver tables
- Apply expectations to filter data in Silver tables

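SCD Type 1, mentioned above, keeps no history: an incoming record simply overwrites the existing attribute values for the same key. A minimal, DLT-Meta-independent sketch of that merge semantics (the function and field names here are illustrative, not part of the DLT-Meta API):

```python
def scd_type1_merge(target, updates, key="id"):
    """Apply SCD Type 1 semantics: latest values overwrite, no history kept."""
    merged = {row[key]: row for row in target}
    for row in updates:
        merged[row[key]] = row  # last write wins; prior values are discarded
    return list(merged.values())


silver = [{"id": 1, "city": "Boston"}, {"id": 2, "city": "Austin"}]
incoming = [{"id": 2, "city": "Dallas"}, {"id": 3, "city": "Denver"}]
print(scd_type1_merge(silver, incoming))
```

In the actual demo this merge is handled declaratively by the pipeline; the snippet only illustrates why no history table is needed for Type 1.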
### Steps:
1. Launch a command prompt

2. Install the [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html)
    - Once the Databricks CLI is installed, authenticate your machine to a Databricks workspace:

    ```commandline
    databricks auth login --host WORKSPACE_HOST
    ```

3. Install Python package requirements:
    ```commandline
    # Core requirements
    pip install "PyYAML>=6.0" setuptools databricks-sdk

    # Development requirements (quote the >= specifiers so the shell
    # doesn't treat > as output redirection)
    pip install flake8==6.0 delta-spark==3.0.0 "pytest>=7.0.0" "coverage>=7.0.0" pyspark==3.5.5
    ```

4. Clone dlt-meta:
    ```commandline
    git clone https://github.com/databrickslabs/dlt-meta.git
    ```

5. Navigate to the project directory:
    ```commandline
    cd dlt-meta
    ```

6. Set the `PYTHONPATH` environment variable in your terminal:
    ```commandline
    dlt_meta_home=$(pwd)
    export PYTHONPATH=$dlt_meta_home
    ```

7. Generate DAB resources and set up schemas. This command will:
    - Generate DAB configuration files
    - Create DLT-Meta schemas
    - Upload necessary files to volumes

    ```commandline
    python demo/generate_dabs_resources.py --source=cloudfiles --uc_catalog_name=<your_catalog_name> --profile=<your_profile>
    ```
    > Note: If you don't specify `--profile`, you'll be prompted for your Databricks workspace URL and access token.

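The generated bundle files are YAML, so you can sanity-check them with the PyYAML package installed in step 3. A minimal sketch (the inline sample mirrors a typical `databricks.yml`; in practice you would parse the generated file under `demo/dabs` instead):

```python
import yaml

# Illustrative bundle snippet; a real check would read demo/dabs/databricks.yml.
sample = """
bundle:
  name: dlt_meta_demo
targets:
  dev:
    mode: development
"""

config = yaml.safe_load(sample)
print(config["bundle"]["name"])  # parses cleanly -> dlt_meta_demo
```

A parse error here usually means the generator's placeholders (catalog, schema, profile) were not filled in correctly.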
8. Deploy and run the DAB bundle:
    - Navigate to the DAB directory:
    ```commandline
    cd demo/dabs
    ```

    - Validate the bundle configuration:
    ```commandline
    databricks bundle validate --profile=<your_profile>
    ```

    - Deploy the bundle to the dev environment:
    ```commandline
    databricks bundle deploy --target dev --profile=<your_profile>
    ```

    - Run the onboarding job:
    ```commandline
    databricks bundle run onboard_people -t dev --profile=<your_profile>
    ```

    - Execute the pipelines:
    ```commandline
    databricks bundle run execute_pipelines_people -t dev --profile=<your_profile>
    ```