state this explicitly, by submitting any copyrighted material via pull request, email, or
other means you agree to license the material under the project's Databricks license and
warrant that you have the legal authority to do so.

# Development Setup

## Python Compatibility

The code supports Python 3.9+ and has been tested with Python 3.9.21 and later.

## Quick Start

```bash
# Install development dependencies
make dev

# Format and lint code
make fmt        # Format with ruff and fix issues
make lint       # Check code quality

# Run tests
make test       # Run tests
make test-cov   # Run tests with coverage report

# Build package
make build      # Build with modern build system
```

## Development Tools

All development tools are configured in `pyproject.toml`.

## Dependencies

All dependencies are defined in `pyproject.toml` (see the sketch after this list):

- `[project.dependencies]` lists the dependencies necessary to run the `dbldatagen` library
- `[tool.hatch.envs.default]` lists the default environment necessary to develop, test, and build the `dbldatagen` library
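
As a quick cross-check, the dependency metadata of the installed package can be listed with the standard library. This is a minimal sketch, assuming `dbldatagen` has already been installed in the current environment:

```python
# Inspect the dependency metadata of the installed dbldatagen package
from importlib.metadata import requires, version

print("dbldatagen version:", version("dbldatagen"))
for requirement in requires("dbldatagen") or []:
    print("requires:", requirement)
```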

## Spark Dependencies

The builds have been tested against Spark 3.3.0+. This requires OpenJDK 1.8.56 or a later version of Java 8.
The Databricks runtimes use the Azul Zulu version of OpenJDK 8.
These are not installed automatically by the build process, so you will need to install them separately.
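
As a quick sanity check that a suitable JDK is on the path and that a local Spark session starts, something along these lines can be used (a sketch, assuming `pyspark` is installed; the app name is arbitrary):

```python
# Environment check: print the active JDK version and the local Spark version
import subprocess

from pyspark.sql import SparkSession

subprocess.run(["java", "-version"], check=True)  # the JDK prints its version to stderr

spark = SparkSession.builder.master("local[2]").appName("envCheck").getOrCreate()
print("Spark version:", spark.version)
spark.stop()
```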

## Creating the HTML documentation

The main html document will be in the file (relative to the root of the build directory)
`./docs/docs/build/html/index.html`

## Building the Python wheel

```bash
make build  # Clean and build the package
```

# Testing

Tests obtain a local Spark session through the `SparkSingleton` helper:

```python
spark = dg.SparkSingleton.getLocalInstance("<name to flag spark instance>")
```

The name used to flag the spark instance should be the test module or test class name.
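
For example, a minimal test module might look like the following sketch (the file name, fixture, and column definitions are illustrative, not actual project tests):

```python
# test_example.py - illustrative test module using the SparkSingleton convention
import pytest

import dbldatagen as dg


@pytest.fixture(scope="module")
def spark():
    # Flag the Spark instance with the test module name, per the convention above
    return dg.SparkSingleton.getLocalInstance("test_example")


def test_generated_row_count(spark):
    # Build a small synthetic dataset and verify the requested row count
    dataspec = (dg.DataGenerator(sparkSession=spark, name="test_data", rows=1000, partitions=4)
                .withColumn("code", "integer", minValue=1, maxValue=100)
                .withColumn("score", "float", minValue=0.0, maxValue=1.0))
    df = dataspec.build()
    assert df.count() == 1000
```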

## Running Tests

```bash
# Run all tests
make test

# Run tests with coverage report (generates htmlcov/index.html)
make coverage
```

If using an environment with multiple Python versions, use a virtual environment or a similar mechanism to pick up the correct Python version.

If necessary, set `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` to point to the correct versions of Python.
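
One way to pin both variables to the interpreter that runs the tests is to set them before any Spark session is created, for example in a test bootstrap (a sketch; placing this in a pytest `conftest.py` is one option):

```python
# Ensure the Spark driver and workers use the same Python interpreter as the
# test process; this must run before any SparkSession is created.
import os
import sys

os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable
```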

# Using the Databricks Labs data generator
The recommended method for installation is to install from the PyPi package.

Releases can also be installed directly from a GitHub release URL.
For example, the following code downloads the release V0.2.1:

> '%pip install https://github.com/databrickslabs/dbldatagen/releases/download/v021/dbldatagen-0.2.1-py3-none-any.whl'
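
After installation, a quick import check confirms the package is visible to the interpreter (a minimal sketch; this assumes the package exposes a `__version__` attribute):

```python
# Sanity check after installation
import dbldatagen as dg

print(dg.__version__)
```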

# Code Quality and Style

## Automated Formatting

Code can be automatically formatted and linted with the following commands:

```bash
# Format code and fix issues automatically
make fmt

# Check code quality without making changes
make lint
```

## Coding Conventions

The code follows PySpark coding conventions:
- Python PEP8 standards with some PySpark-specific adaptations
- Method and argument names use mixed case starting with a lowercase letter (following PySpark conventions), as illustrated below
- Line length limit of 120 characters

See the [Python PEP8 Guide](https://peps.python.org/pep-0008/) for general Python style guidelines.
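
For instance, the naming convention looks like this (a hypothetical example, not actual project code):

```python
# Hypothetical illustration: methods and arguments use mixedCase with a
# leading lowercase letter rather than snake_case, following PySpark style.
class RowSpecBuilder:
    def __init__(self):
        self.rowCount = 0

    def withRowCount(self, rowCount):  # rather than: with_row_count(row_count)
        self.rowCount = rowCount
        return self
```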

# GitHub expectations
When running the unit tests on GitHub, the environment should match the latest Databricks
runtime LTS release. While compatibility is preserved on LTS releases from Databricks runtime 13.3 onwards,
unit tests will be run on the environment corresponding to the latest LTS release.

Libraries will use the same versions as the earliest supported LTS release - currently 13.3 LTS.

This means for the current build:

- Use of Ubuntu 22.04 for the test runner
- Use of Java 8
- Use of Python 3.10.12 when testing / building the image

See the following resources for more information:

- https://docs.databricks.com/en/release-notes/runtime/15.4lts.html