
Commit 84945c1

Merge pull request #2 from databrickslabs/feature/pp-apply-changes-support
## [v0.0.2] - 2023-05-11

### Added
- Table properties support for bronze, quarantine and silver tables using create_streaming_live_table api call
- Support for track history column using apply_changes api
- Support for delta as source
- Validation for bronze/silver onboarding

### Fixed
- Input schema parsing issue in onboarding

### Modified
- Readme and docs to include above features
2 parents 0c85fd3 + e90c329 commit 84945c1
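For orientation, the two headline features correspond to options on the Delta Live Tables Python API that the framework calls. The sketch below is illustrative only: table names, keys and column lists are placeholders, not code generated by dlt-meta.

```python
import dlt  # available only inside a Delta Live Tables pipeline

# Table properties can be attached when the target streaming live table is declared.
dlt.create_streaming_live_table(
    name="bronze_customers",  # placeholder table name
    table_properties={"pipelines.autoOptimize.zOrderCols": "customer_id"},
)

# Track-history columns are passed to apply_changes; with SCD type 2,
# history rows are kept only when the listed columns change.
dlt.apply_changes(
    target="bronze_customers",
    source="customers_cdc_raw",  # placeholder CDC source view
    keys=["customer_id"],
    sequence_by="sequence_ts",
    stored_as_scd_type="2",
    track_history_column_list=["address", "email"],
)
```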


42 files changed, +754 -501 lines changed

CHANGELOG.md

Lines changed: 10 additions & 0 deletions
@@ -8,6 +8,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
**NOTE:** For CLI interfaces, we support SemVer approach. However, for API components we don't use SemVer as of now. This may lead to instability when using dbx API methods directly.

[Please read through the Keep a Changelog (~5min)](https://keepachangelog.com/en/1.0.0/).
+## [v0.0.2] - 2023-05-11
+### Added
+- Table properties support for bronze, quarantine and silver tables using create_streaming_live_table api call
+- Support for track history column using apply_changes api
+- Support for delta as source
+- Validation for bronze/silver onboarding
+### Fixed
+- Input schema parsing issue in onboarding
+### Modified
+- Readme and docs to include above features

## [v0.0.1] - 2023-03-22
### Added

README.md

Lines changed: 6 additions & 158 deletions
@@ -20,7 +20,7 @@
alt="GitHub Workflow Status (branch)"/>
</a>
<a href="https://codecov.io/gh/databrickslabs/dlt-meta">
-<img src="https://img.shields.io/codecov/c/github/databrickslabs/dlt-meta?style=for-the-badge&amp;token=KI3HFZQWF0"
+<img src="https://img.shields.io/codecov/c/github/databrickslabs/dlt-meta?style=for-the-badge&amp;token=2CxLj3YBam"
alt="codecov"/>
</a>
<a href="https://lgtm.com/projects/g/databrickslabs/dlt-meta/alerts">
@@ -63,167 +63,15 @@ With this framework you need to record the source and target metadata in an onbo
## High-Level Process Flow:
![DLT-META High-Level Process Flow](./docs/static/images/solutions_overview.png)

-## More questions
-
-Refer to the [FAQ](https://databrickslabs.github.io/dlt-meta/faq)
-and DLT-META [documentation](https://databrickslabs.github.io/dlt-meta/)
-
## Steps
![DLT-META Stages](./docs/static/images/dlt-meta_stages.png)

+## Getting Started
+Refer to the [Getting Started](https://databrickslabs.github.io/dlt-meta/getting_started)

-## 1. Metadata preparation
-1. Create ```onboarding.json``` metadata file and save to s3/adls/dbfs e.g.[onboarding file](https://github.com/databrickslabs/dlt-meta/blob/main/examples/onboarding.json)
-2. Create ```silver_transformations.json``` and save to s3/adls/dbfs e.g [Silver transformation file](https://github.com/databrickslabs/dlt-meta/blob/main/examples/silver_transformations.json)
-3. Create data quality rules json and store to s3/adls/dbfs e.g [Data Quality Rules](https://github.com/databrickslabs/dlt-meta/tree/main/examples/dqe/customers/bronze_data_quality_expectations.json)
-
-## 2. Onboarding job
-
-1. Go to your Databricks landing page and do one of the following:
-
-2. In the sidebar, click Jobs Icon Workflows and click Create Job Button.
-
-3. In the sidebar, click New Icon New and select Job from the menu.
-
-4. In the task dialog box that appears on the Tasks tab, replace Add a name for your job… with your job name, for example, Python wheel example.
-
-5. In Task name, enter a name for the task, for example, ```dlt_meta_onboarding_pythonwheel_task```.
-
-6. In Type, select Python wheel.
-
-5. In Package name, enter ```dlt_meta```.
-
-6. In Entry point, enter ``run``.
-
-7. Click Add under Dependent Libraries. In the Add dependent library dialog, under Library Type, click PyPI. Enter Package: ```dlt-meta```
-
-
-8. Click Add.
-
-9. In Parameters, select keyword argument then select JSON. Past below json parameters with :
-```
-{
-"database": "dlt_demo",
-"onboarding_file_path": "dbfs:/onboarding_files/users_onboarding.json",
-"silver_dataflowspec_table": "silver_dataflowspec_table",
-"silver_dataflowspec_path": "dbfs:/onboarding_tables_cdc/silver",
-"bronze_dataflowspec_table": "bronze_dataflowspec_table",
-"import_author": "Ravi",
-"version": "v1",
-"bronze_dataflowspec_path": "dbfs:/onboarding_tables_cdc/bronze",
-"overwrite": "True",
-"env": "dev"
-}
-```
-Alternatly you can enter keyword arguments, click + Add and enter a key and value. Click + Add again to enter more arguments.
-
-10. Click Save task.
-
-11. Run now
-
-12. Make sure job run successfully. Verify metadata in your dataflow spec tables entered in step: 9 e.g ```dlt_demo.bronze_dataflowspec_table``` , ```dlt_demo.silver_dataflowspec_table```
-
-## 3. Launch Dataflow DLT Pipeline
-### Create a dlt launch notebook
-
-1. Go to your Databricks landing page and select Create a notebook, or click New Icon New in the sidebar and select Notebook. The Create Notebook dialog appears.
-
-2. In the Create Notebook dialogue, give your notebook a name e.g ```dlt_meta_pipeline``` and select Python from the Default Language dropdown menu. You can leave Cluster set to the default value. The Delta Live Tables runtime creates a cluster before it runs your pipeline.
-
-3. Click Create.
-
-4. You can add the [example dlt pipeline](https://github.com/databrickslabs/dlt-meta/blob/main/examples/dlt_meta_pipeline.ipynb) code or import iPython notebook as is.
-
-### Create a DLT pipeline
-
-1. Click Jobs Icon Workflows in the sidebar, click the Delta Live Tables tab, and click Create Pipeline.
-
-2. Give the pipeline a name e.g. DLT_META_BRONZE and click File Picker Icon to select a notebook ```dlt_meta_pipeline``` created in step: ```Create a dlt launch notebook```.
-
-3. Optionally enter a storage location for output data from the pipeline. The system uses a default location if you leave Storage location empty.
-
-4. Select Triggered for Pipeline Mode.
-
-5. Enter Configuration parameters e.g.
-```
-"layer": "bronze",
-"bronze.dataflowspecTable": "dataflowspec table name",
-"bronze.group": "enter group name from metadata e.g. G1",
-```
-
-6. Enter target schema where you wants your bronze/silver tables to be created
-
-7. Click Create.
-
-8. Start pipeline: click the Start button on in top panel. The system returns a message confirming that your pipeline is starting
-
-
-
-# Additional
-You can run integration tests from you local with dlt-meta.
-## Run Integration Tests
-1. Clone [DLT-META](https://github.com/databrickslabs/dlt-meta)
-
-2. Open terminal and Goto root folder ```DLT-META```
-
-3. Create environment variables.
-
-```
-export DATABRICKS_HOST=<DATABRICKS HOST>
-export DATABRICKS_TOKEN=<DATABRICKS TOKEN> # Account needs permission to create clusters/dlt pipelines.
-```
-
-4. Run itegration tests for different supported input sources: cloudfiles, eventhub, kafka
-
-4a. Run the command for cloudfiles ```python integration-tests/run-integration-test.py --cloud_provider_name=aws --dbr_version=11.3.x-scala2.12 --source=cloudfiles --dbfs_path=dbfs:/tmp/DLT-META/```
-
-4b. Run the command for eventhub ```python integration-tests/run-integration-test.py --cloud_provider_name=azure --dbr_version=11.3.x-scala2.12 --source=eventhub --dbfs_path=dbfs:/tmp/DLT-META/ --eventhub_name=iot --eventhub_secrets_scope_name=eventhubs_creds --eventhub_namespace=int_test-standard --eventhub_port=9093 --eventhub_producer_accesskey_name=producer ----eventhub_consumer_accesskey_name=consumer```
-
-For eventhub integration tests, the following are the prerequisites:
-1. Needs eventhub instance running
-2. Using Databricks CLI, Create databricks secrets scope for eventhub keys
-3. Using Databricks CLI, Create databricks secrets to store producer and consumer keys using the scope created in step 2
-
-Following are the mandatory arguments for running EventHubs integration test
-1. Provide your eventhub topic name : ```--eventhub_name```
-2. Provide eventhub namespace using ```--eventhub_namespace```
-3. Provide eventhub port using ```--eventhub_port```
-4. Provide databricks secret scope name using ```----eventhub_secrets_scope_name```
-5. Provide eventhub producer access key name using ```--eventhub_producer_accesskey_name```
-6. Provide eventhub access key name using ```--eventhub_consumer_accesskey_name```
-
-
-4c. Run the command for kafka ```python3 integration-tests/run-integration-test.py --cloud_provider_name=aws --dbr_version=11.3.x-scala2.12 --source=kafka --dbfs_path=dbfs:/tmp/DLT-META/ --kafka_topic_name=dlt-meta-integration-test --kafka_broker=host:9092```
-
-For kafka integration tests, the following are the prerequisites:
-1. Needs kafka instance running
-
-Following are the mandatory arguments for running EventHubs integration test
-1. Provide your kafka topic name : ```--kafka_topic_name```
-2. Provide kafka_broker ```--kafka_broker```
-
-
-
-Once finished integration output file will be copied locally to ```integration-test-output_<run_id>.csv```
-
-5. Output of a successful run should have the following in the file
-
-```
-,0
-0,Completed Bronze DLT Pipeline.
-1,Completed Silver DLT Pipeline.
-2,Validating DLT Bronze and Silver Table Counts...
-3,Validating Counts for Table bronze_7b866603ab184c70a66805ac8043a03d.transactions_cdc.
-4,Expected: 10002 Actual: 10002. Passed!
-5,Validating Counts for Table bronze_7b866603ab184c70a66805ac8043a03d.transactions_cdc_quarantine.
-6,Expected: 9842 Actual: 9842. Passed!
-7,Validating Counts for Table bronze_7b866603ab184c70a66805ac8043a03d.customers_cdc.
-8,Expected: 98928 Actual: 98928. Passed!
-9,Validating Counts for Table silver_7b866603ab184c70a66805ac8043a03d.transactions.
-10,Expected: 8759 Actual: 8759. Passed!
-11,Validating Counts for Table silver_7b866603ab184c70a66805ac8043a03d.customers.
-12,Expected: 87256 Actual: 87256. Passed!
-```
+## More questions
+Refer to the [FAQ](https://databrickslabs.github.io/dlt-meta/faq)
+and DLT-META [documentation](https://databrickslabs.github.io/dlt-meta/)

# Project Support
Please note that all projects released under [`Databricks Labs`](https://www.databricks.com/learn/labs)
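The pipeline-launch walkthrough removed above points to the example notebook dlt_meta_pipeline.ipynb and the `layer`/`bronze.*` pipeline configuration. A rough sketch of what such a launch notebook amounts to, assuming the `src` package prefix used elsewhere in this commit and a `DataflowPipeline.invoke_dlt_pipeline` entry point (the linked notebook is the authoritative version):

```python
# Sketch only; assumes the src.dataflow_pipeline module and the
# DataflowPipeline.invoke_dlt_pipeline entry point exist as in the
# linked dlt_meta_pipeline.ipynb example.
from src.dataflow_pipeline import DataflowPipeline

# "layer" comes from the DLT pipeline configuration, e.g. {"layer": "bronze"}.
layer = spark.conf.get("layer", None)

# Generates the bronze/silver DLT tables for the configured dataflowspec group.
DataflowPipeline.invoke_dlt_pipeline(spark, layer)
```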

docs/content/faq/execution.md

Lines changed: 22 additions & 0 deletions
@@ -24,3 +24,25 @@ DLT-META translates input metadata into Delta table as DataflowSpecs
**Q. How many DLT pipelines will be launched using DLT-META?**

DLT-META uses data_flow_group to launch DLT pipelines, so all the tables belongs to same group will be executed under single DLT pipeline.
+
+**Q. Can we run onboarding for bronze layer only?**
+
+Yes! Remove silver related attributes from onboarding file and call `onboard_bronze_dataflow_spec()` API from ```OnboardDataflowspec```. Similarly you can run silver layer onboarding separately using `onboard_silver_dataflow_spec()`API from `OnboardDataflowspec` with silver parameters included in `onboarding_params_map`
+
+```
+onboarding_params_map = {
+"onboarding_file_path":onboarding_file_path,
+"database":bronze_database,
+"env":"dev",
+"bronze_dataflowspec_table":"bronze_dataflowspec_tablename",
+"bronze_dataflowspec_path": bronze_dataflowspec_path,
+"overwrite":"True",
+"version":"v1",
+"import_author":"Ravi",
+}
+print(onboarding_params_map)
+
+from src.onboard_dataflowspec import OnboardDataflowspec
+OnboardDataflowspec(spark, onboarding_params_map).onboard_bronze_dataflow_spec()
+
+```
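As the answer above notes, the silver layer can be onboarded on its own with `onboard_silver_dataflow_spec()`. A minimal sketch that mirrors the bronze example, assuming silver counterparts of the bronze parameters (`silver_dataflowspec_table`, `silver_dataflowspec_path`, as seen in the onboarding job parameters) and the same `spark` session and variables:

```python
# Silver-only onboarding; mirrors the bronze example above with silver_* keys.
onboarding_params_map = {
    "onboarding_file_path": onboarding_file_path,   # same onboarding file as bronze
    "database": silver_database,                    # assumed silver target database
    "env": "dev",
    "silver_dataflowspec_table": "silver_dataflowspec_tablename",
    "silver_dataflowspec_path": silver_dataflowspec_path,
    "overwrite": "True",
    "version": "v1",
    "import_author": "Ravi",
}

from src.onboard_dataflowspec import OnboardDataflowspec
OnboardDataflowspec(spark, onboarding_params_map).onboard_silver_dataflow_spec()
```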

docs/content/faq/general.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ DLT-META is a solution/framework using Databricks Delta Live Tables aka DLT whic

**Q. What different types of reader are supported using DLT-META ?**

-DLT-META uses Databricks [Auto Loader](https://docs.databricks.com/ingestion/auto-loader/index.html) to read from s3/adls/blog stroage.
+DLT-META uses Databricks [Auto Loader](https://docs.databricks.com/ingestion/auto-loader/index.html), DELTA, KAFKA, EVENTHUB to read from s3/adls/blog stroage.

**Q. Can DLT-META support any other readers?**

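For context on what those reader types mean in practice, here is a rough, generic Spark Structured Streaming illustration; this is not dlt-meta's internal code, paths, table and topic names are placeholders, and Event Hubs is typically consumed through the same Kafka connector over its Kafka-compatible endpoint (port 9093, as in the integration-test command):

```python
# Generic Structured Streaming reads for the supported source types; assumes an
# active Databricks/Spark session bound to `spark`. All names are placeholders.

# Auto Loader (cloudFiles) reading files from cloud object storage
raw_cloudfiles = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .load("s3://my-bucket/landing/customers/")
)

# A Delta table as a streaming source
raw_delta = spark.readStream.table("landing.customers_cdc")

# Kafka (and Event Hubs via its Kafka-compatible endpoint)
raw_kafka = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker-host:9092")
    .option("subscribe", "customers_cdc")
    .load()
)
```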
Lines changed: 35 additions & 39 deletions
@@ -1,9 +1,10 @@
---
title: "Additionals"
date: 2021-08-04T14:25:26-04:00
-weight: 19
+weight: 21
draft: false
---
+This is easist way to launch dlt-meta to your databricks workspace with following steps.

## Run Integration Tests
1. Launch Terminal/Command promt
@@ -22,55 +23,50 @@ export DATABRICKS_TOKEN=<DATABRICKS TOKEN> # Account needs permission to create
5. Run integration test against cloudfile or eventhub or kafka using below options:
5a. Run the command for cloudfiles ```python integration-tests/run-integration-test.py --cloud_provider_name=aws --dbr_version=11.3.x-scala2.12 --source=cloudfiles --dbfs_path=dbfs:/tmp/DLT-META/```

-5b. Run the command for eventhub ```python integration-tests/run-integration-test.py --cloud_provider_name=azure --dbr_version=11.3.x-scala2.12 --source=eventhub --dbfs_path=dbfs:/tmp/DLT-META/ --eventhub_name=iot --eventhub_secrets_scope_name=eventhubs_creds --eventhub_namespace=int_test-standard --eventhub_port=9093 --eventhub_producer_accesskey_name=producer ----eventhub_consumer_accesskey_name=consumer```
+5b. Run the command for eventhub ```python integration-tests/run-integration-test.py --cloud_provider_name=azure --dbr_version=11.3.x-scala2.12 --source=eventhub --dbfs_path=dbfs:/tmp/DLT-META/ --eventhub_name=iot --eventhub_secrets_scope_name=eventhubs_creds --eventhub_namespace=int_test-standard --eventhub_port=9093 --eventhub_producer_accesskey_name=producer --eventhub_consumer_accesskey_name=consumer```

-For eventhub integration tests, the following are the prerequisites:
-1. Needs eventhub instance running
-2. Using Databricks CLI, Create databricks secrets scope for eventhub keys
-3. Using Databricks CLI, Create databricks secrets to store producer and consumer keys using the scope created in step 2
+For eventhub integration tests, the following are the prerequisites:
+1. Needs eventhub instance running
+2. Using Databricks CLI, Create databricks secrets scope for eventhub keys
+3. Using Databricks CLI, Create databricks secrets to store producer and consumer keys using the scope created in step 2

-Following are the mandatory arguments for running EventHubs integration test
-1. Provide your eventhub topic name : ```--eventhub_name```
-2. Provide eventhub namespace using ```--eventhub_namespace```
-3. Provide eventhub port using ```--eventhub_port```
-4. Provide databricks secret scope name using ```----eventhub_secrets_scope_name```
-5. Provide eventhub producer access key name using ```--eventhub_producer_accesskey_name```
-6. Provide eventhub access key name using ```--eventhub_consumer_accesskey_name```
+Following are the mandatory arguments for running EventHubs integration test
+1. Provide your eventhub topic : --eventhub_name
+2. Provide eventhub namespace : --eventhub_namespace
+3. Provide eventhub port : --eventhub_port
+4. Provide databricks secret scope name : --eventhub_secrets_scope_name
+5. Provide eventhub producer access key name : --eventhub_producer_accesskey_name
+6. Provide eventhub access key name : --eventhub_consumer_accesskey_name


5c. Run the command for kafka ```python3 integration-tests/run-integration-test.py --cloud_provider_name=aws --dbr_version=11.3.x-scala2.12 --source=kafka --dbfs_path=dbfs:/tmp/DLT-META/ --kafka_topic_name=dlt-meta-integration-test --kafka_broker=host:9092```

-For kafka integration tests, the following are the prerequisites:
-1. Needs kafka instance running
+For kafka integration tests, the following are the prerequisites:
+1. Needs kafka instance running

-Following are the mandatory arguments for running EventHubs integration test
-1. Provide your kafka topic name : ```--kafka_topic_name```
-2. Provide kafka_broker ```--kafka_broker```
+Following are the mandatory arguments for running EventHubs integration test
+1. Provide your kafka topic name : --kafka_topic_name
+2. Provide kafka_broker : --kafka_broker

6. Once finished integration output file will be copied locally to
```integration-test-output_<run_id>.txt```

7. Output of a successful run should have the following in the file
```
-Generating Onboarding Json file for Integration Test.
-Successfully Generated Onboarding Json file for Integration Test.
-Setting up dlt-meta metadata tables.
-Successfully setup dlt-meta metadata tables.
-Completed Bronze DLT Pipeline.
-Completed Silver DLT Pipeline.
-Validating DLT Bronze and Silver Table Counts...
-Validating Counts for Table bronze_f7d4934efe494de987f364e8d93acaba.transactions_cdc.
-Expected: 10002 Actual: 10002. Passed!
-Validating Counts for Table bronze_f7d4934efe494de987f364e8d93acaba.transactions_cdc_quarantine.
-Expected: 9842 Actual: 9842. Passed!
-Validating Counts for Table bronze_f7d4934efe494de987f364e8d93acaba.customers_cdc.
-Expected: 98928 Actual: 98928. Passed!
-Validating Counts for Table silver_f7d4934efe494de987f364e8d93acaba.transactions.
-Expected: 8759 Actual: 8759. Passed!
-Validating Counts for Table silver_f7d4934efe494de987f364e8d93acaba.customers.
-Expected: 87256 Actual: 87256. Passed!
-DROPPING DB bronze_f7d4934efe494de987f364e8d93acaba
-DROPPING DB silver_f7d4934efe494de987f364e8d93acaba
-DROPPING DB dlt_meta_framework_it_f7d4934efe494de987f364e8d93acaba_f7d4934efe494de987f364e8d93acaba
-Removed Integration test databases
+,0
+0,Completed Bronze DLT Pipeline.
+1,Completed Silver DLT Pipeline.
+2,Validating DLT Bronze and Silver Table Counts...
+3,Validating Counts for Table bronze_7d1d3ccc9e144a85b07c23110ea50133.transactions.
+4,Expected: 10002 Actual: 10002. Passed!
+5,Validating Counts for Table bronze_7d1d3ccc9e144a85b07c23110ea50133.transactions_quarantine.
+6,Expected: 7 Actual: 7. Passed!
+7,Validating Counts for Table bronze_7d1d3ccc9e144a85b07c23110ea50133.customers.
+8,Expected: 98928 Actual: 98923. Failed!
+9,Validating Counts for Table bronze_7d1d3ccc9e144a85b07c23110ea50133.customers_quarantine.
+10,Expected: 1077 Actual: 1077. Passed!
+11,Validating Counts for Table silver_7d1d3ccc9e144a85b07c23110ea50133.transactions.
+12,Expected: 8759 Actual: 8759. Passed!
+13,Validating Counts for Table silver_7d1d3ccc9e144a85b07c23110ea50133.customers.
+14,Expected: 87256 Actual: 87251. Failed!
```

docs/content/getting_started/dltpipeline.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
---
title: "Launch Generic DLT pipeline"
date: 2021-08-04T14:25:26-04:00
-weight: 18
+weight: 20
draft: false
---
