Veracity DataWorkbench is a Python SDK that bridges your Databricks environment with Veracity Data Workbench. It simplifies dataset cataloguing, lineage tracking, and access to the Data Workbench APIs.
- DataCatalogue: Register and manage datasets in the Veracity Data Workbench Data Catalogue.
This package is pre-installed in Veracity-hosted Databricks environments (if analytics features are enabled).
To install the latest version locally:
```
pip install https://github.com/veracity/DataWorkbench/releases/latest/download/dataworkbench-1.0-py3-none-any.whl
```

Make sure you have the required credentials and environment variables set when running outside Databricks.
In Veracity-hosted Databricks, the SDK is ready to use:
```python
import dataworkbench
```

To use it on your local machine, set the environment variables needed to connect to the Veracity Data Workbench API.
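The exact variable names depend on your Veracity deployment; consult the official documentation for the real ones. As an illustrative sketch only (the names below are hypothetical), a startup check that fails fast on missing configuration might look like:

```python
import os

# Hypothetical variable names for illustration; the actual names required
# by the Veracity Data Workbench API may differ.
REQUIRED_VARS = ["VERACITY_API_URL", "VERACITY_CLIENT_ID", "VERACITY_CLIENT_SECRET"]

def missing_settings(env=None):
    """Return the names of required settings that are absent or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Example: check a candidate configuration before creating the SDK client.
example_env = {"VERACITY_API_URL": "https://api.example.com"}
print(missing_settings(example_env))  # the two credential variables are missing
```

Checking configuration up front gives a clearer error than letting an API call fail deep inside the SDK.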
```python
from dataworkbench import DataCatalogue

df = spark.createDataFrame([("a", 1), ("b", 2), ("c", 3)], ["letter", "number"])

datacatalogue = DataCatalogue()  # Naming subject to change
datacatalogue.save(df, "Dataset Name", "Description", tags={"environment": ["test"]})
```

`save(df, name, description=None, tags=None)`: Save a Spark DataFrame to the Data Workbench Data Catalogue.
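In the example above, `tags` maps a tag name to a list of string values, e.g. `{"environment": ["test"]}`. A small validator (illustrative only, not part of the SDK) makes the expected shape explicit:

```python
def validate_tags(tags):
    """Return True if tags is a dict mapping str to a list of str."""
    if not isinstance(tags, dict):
        return False
    return all(
        isinstance(key, str)
        and isinstance(values, list)
        and all(isinstance(v, str) for v in values)
        for key, values in tags.items()
    )

print(validate_tags({"environment": ["test"]}))  # True
print(validate_tags({"environment": "test"}))    # False: values must be a list
```

Validating locally before calling `save` avoids a round trip to the API for a malformed request.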
DataWorkbench is licensed under WHICH LICENSE.