 ---
-title: "Running Onboarding"
+title: "Run Onboarding"
 date: 2021-08-04T14:25:26-04:00
 weight: 17
 draft: false
 ---
 
-#### Option#1: Python whl job
-1. Go to your Databricks landing page and do one of the following:
+#### Option#1: Databricks Labs CLI
+##### Prerequisites:
+- [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/tutorial.html)
+- Python 3.8.0+ (both prerequisites can be verified as shown below)
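+
+A quick way to confirm both prerequisites from a terminal (a minimal check; it assumes the CLI is already on your PATH):
+```shell
+databricks --version   # Labs commands require the unified (Go) CLI
+python3 --version      # should report 3.8.0 or later
+```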
+##### Steps:
+1. ```git clone https://github.com/databrickslabs/dlt-meta.git```
+2. ```cd dlt-meta```
+3. ```python -m venv .venv```
+4. ```source .venv/bin/activate```
+5. ```pip install databricks-sdk``` (then register the Labs project as shown below)
 
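+The dlt-meta Labs project also needs to be registered with your CLI before the `databricks labs dlt-meta` command below is available; per the project README this is done with the Labs installer:
+```shell
+databricks labs install dlt-meta
+```
+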
-2. In the sidebar, click Workflows and click the Create Job button.
+##### Run the dlt-meta CLI command:
+```shell
+databricks labs dlt-meta onboard
+```
+- The command above will prompt you to provide the onboarding details.
+- If you have cloned the dlt-meta git repo, accepting the defaults will launch the config from the [demo/conf](https://github.com/databrickslabs/dlt-meta/tree/main/demo/conf) folder.
+- You can also create your own onboarding files, e.g. onboarding.json plus data quality and silver transformation files, and put them in the conf folder as shown in [demo/conf](https://github.com/databrickslabs/dlt-meta/tree/main/demo/conf); a sketch of a minimal onboarding file follows below.
 
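+A minimal onboarding entry looks roughly like the sketch below, adapted from the demo templates. The field values are illustrative, and the `_dev`-suffixed keys correspond to the environment name given at the prompts:
+```json
+[
+  {
+    "data_flow_id": "100",
+    "data_flow_group": "A1",
+    "source_system": "MYSQL",
+    "source_format": "cloudFiles",
+    "source_details": {
+      "source_path_dev": "tests/resources/data/customers"
+    },
+    "bronze_database_dev": "dltmeta_bronze",
+    "bronze_table": "customers",
+    "bronze_reader_options": {
+      "cloudFiles.format": "json",
+      "cloudFiles.rescuedDataColumn": "_rescued_data"
+    }
+  }
+]
+```
+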
-3. In the sidebar, click New and select Job from the menu.
-
-4. In the task dialog box that appears on the Tasks tab, replace Add a name for your job… with your job name, for example, Python wheel example.
-
-5. In Task name, enter a name for the task, for example, ```dlt_meta_onboarding_pythonwheel_task```.
-
-6. In Type, select Python wheel.
-
-7. In Package name, enter ```dlt_meta```.
-
-8. In Entry point, enter ```run```.
-
-9. Click Add under Dependent Libraries. In the Add dependent library dialog, under Library Type, click PyPI and enter Package = ```dlt-meta```.
-
-10. Click Add.
-
-11. In Parameters, select keyword arguments, then select JSON. Paste the JSON parameters below:
-```json
-{
-    "onboard_layer": "bronze_silver",
-    "database": "dlt_demo",
-    "onboarding_file_path": "dbfs:/onboarding_files/users_onboarding.json",
-    "silver_dataflowspec_table": "silver_dataflowspec_table",
-    "silver_dataflowspec_path": "dbfs:/onboarding_tables_cdc/silver",
-    "bronze_dataflowspec_table": "bronze_dataflowspec_table",
-    "import_author": "Ravi",
-    "version": "v1",
-    "bronze_dataflowspec_path": "dbfs:/onboarding_tables_cdc/bronze",
-    "uc_enabled": "False",
-    "overwrite": "True",
-    "env": "dev"
-}
+```shell
+Provide onboarding file path (default: demo/conf/onboarding.template):
+Provide onboarding files local directory (default: demo/):
+Provide dbfs path (default: dbfs:/dlt-meta_cli_demo):
+Provide databricks runtime version (default: 14.2.x-scala2.12):
+Run onboarding with unity catalog enabled?
+[0] False
+[1] True
+Enter a number between 0 and 1: 1
+Provide unity catalog name: uc_catalog_name
+Provide dlt meta schema name (default: dlt_meta_dataflowspecs_203b9):
+Provide dlt meta bronze layer schema name (default: dltmeta_bronze_cf595):
+Provide dlt meta silver layer schema name (default: dltmeta_silver_5afa2):
+Provide dlt meta layer
+[0] bronze
+[1] bronze_silver
+[2] silver
+Enter a number between 0 and 2: 1
+Provide bronze dataflow spec table name (default: bronze_dataflowspec):
+Provide silver dataflow spec table name (default: silver_dataflowspec):
+Overwrite dataflow spec?
+[0] False
+[1] True
+Enter a number between 0 and 1: 1
+Provide dataflow spec version (default: v1):
+Provide environment name (default: prod): prod
+Provide import author name (default: ravi.gawai):
+Provide cloud provider name
+[0] aws
+[1] azure
+[2] gcp
+Enter a number between 0 and 2: 0
+Do you want to update ws paths, catalog, schema details to your onboarding file?
+[0] False
+[1] True
 ```
 
-Alternately, you can enter keyword arguments: click + Add and enter a key and value. Click + Add again to enter more arguments.
-
-12. Click Save task.
-
-13. Click Run now.
-
-14. Make sure the job runs successfully. Verify the metadata in the dataflow spec tables you entered in step 11, e.g. ```dlt_demo.bronze_dataflowspec_table```, ```dlt_demo.silver_dataflowspec_table```.
+- Go to your Databricks workspace and locate the onboarding job under Workflows -> Job runs; the spec tables can also be inspected directly, as sketched below.
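+
+Once the run finishes, you can also list the dataflow spec tables the job created (a sketch, assuming the Unity Catalog `tables list` command of recent CLI versions; the catalog and schema names below match the defaults accepted at the prompts above, so substitute your own):
+```shell
+databricks tables list uc_catalog_name dlt_meta_dataflowspecs_203b9
+```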