@@ -19,14 +19,8 @@ For example, at the time of writing, a billion row version of the IOT data set e
 can be generated and written to a Delta table in
 [under 2 minutes using a 12 node x 8 core cluster (using DBR 8.3)](#scaling-it-up)
 
-> NOTE: The markup version of this document does not cover all of the classes and methods in the codebase.
-> For further information on classes and methods contained in these modules, and
-> to explore the python documentation for these modules, build the HTML documentation from
-> the main project directory using `make docs`. Use your browser to explore the documentation by
-> starting with the html file `./docs/build/html/index.html`
->
-> If you are viewing the online help version of this document, the classes and methods are already included.
-
+> NOTE: The markup version of this document does not cover all of the classes and methods in the codebase and some links
+> may not work. To see the documentation for the latest release, see the online documentation.
 
 ## General Overview
 
@@ -54,8 +48,9 @@ and [formatting on string columns](textdata)
 
 ## Tutorials and examples
 
-In the [root directory](https://github.com/databrickslabs/dbldatagen) of the project, there are a number of
-examples and tutorials.
+In the
+[Github project directory](https://github.com/databrickslabs/dbldatagen/tree/release/v0.2.1),
+there are a number of examples and tutorials.
 
 The Python examples in the `examples` folder can be run directly or imported into the Databricks runtime environment
 as Python files.
@@ -100,24 +95,22 @@ There is also support for applying arbitrary SQL expressions, and generation of
 ### Getting started
 
 Before using the data generator, you need to install the package in your environment and import it in your code.
-You can install the package from the Github releases as a library on your cluster.
+You can install the package from PyPi as a library on your cluster.
 
 > NOTE: When running in a Databricks notebook environment, you can install directly using
 > the `%pip` command in a notebook cell
 >
 > To install as a notebook scoped library, add a cell with the following text and execute it:
 >
-> `%pip install git+https://github.com/databrickslabs/dbldatagen@current`
+> `%pip install dbldatagen`
 
 The `%pip install` method will work in the Databricks Community Environment and in Delta Live Tables pipelines also.
 
-You can also manually download a wheel file from the releases and install it in your environment.
+You can find more details and alternative installation methods at [Installation notes](installation_notes)
 
-The releases are located at
+The Github based releases are located at
 [Databricks Labs Data Generator releases](https://github.com/databrickslabs/dbldatagen/releases)
 
-You can find more details at [Installation notes](installation_notes)
-
 Once installed, import the framework in your Python code to use it.
 
 For example:
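
The last hunk stops at the import example. As a minimal sketch of that step (not part of the commit), the check below uses only the standard library to confirm that the package installed with `%pip install dbldatagen` is visible to the interpreter before importing it. The helper name `dbldatagen_available` is hypothetical, and the `dg` alias is an assumed convention rather than something shown in this diff.

```python
import importlib.util


def dbldatagen_available() -> bool:
    """Return True if the dbldatagen package can be located by the interpreter."""
    # Hypothetical helper: find_spec returns None when a top-level package is missing
    return importlib.util.find_spec("dbldatagen") is not None


if dbldatagen_available():
    try:
        # Assumed alias; the import can still fail if a dependency is missing
        import dbldatagen as dg
        print("dbldatagen is ready to use")
    except ImportError as exc:
        print(f"dbldatagen was found but could not be imported: {exc}")
else:
    print("dbldatagen not found; install it with: %pip install dbldatagen")
```

Checking with `find_spec` first keeps the failure message specific: a missing package points back to the install step, while an import error points to the environment.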