
Commit 9e898ab

Initial commit
1 parent 41fc07f commit 9e898ab

File tree: 14 files changed (+166, -2054 lines)

.github/dependabot.yml

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+version: 2
+updates:
+  - package-ecosystem: "pip"
+    directory: "/"
+    schedule:
+      interval: "daily"
+  - package-ecosystem: "github-actions"
+    directory: "/"
+    schedule:
+      interval: "daily"

.github/workflows/push.yml

Lines changed: 25 additions & 20 deletions
@@ -7,12 +7,28 @@ on:
     branches: [master]
 
 jobs:
+  fmt:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4.2.2
+
+      - name: Format files
+        run: make dev fmt
+
+      - name: Fail on differences
+        run: git diff --exit-code
+
   tests:
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: [ '3.10', '3.11', '3.12' ]
     # Ubuntu latest no longer installs Python 3.9 by default so install it
     runs-on: ubuntu-22.04
     steps:
       - name: Checkout
-        uses: actions/checkout@v4
+        uses: actions/checkout@v4.2.2
         with:
           fetch-depth: 0
 
@@ -26,34 +42,23 @@ jobs:
       #     key: ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }}
       #     restore-keys: |
       #       ${{ runner.os }}-go-
+
       - name: Set Java 8
         run: |
          sudo update-alternatives --set java /usr/lib/jvm/temurin-8-jdk-amd64/bin/java
          java -version
 
-      - name: Set up Python 3.9.21
+      - name: Install Python
         uses: actions/setup-python@v5
         with:
-          python-version: '3.9.21'
-          cache: 'pipenv'
+          cache: 'pip'
+          cache-dependency-path: '**/pyproject.toml'
+          python-version: ${{ matrix.python-version }}
 
-      - name: Check Python version
-        run: python --version
-
-      - name: Install pip
-        run: python -m pip install --upgrade pip
-
-      - name: Install
-        run: pip install pipenv
-
-      - name: Install dependencies
-        run: pipenv install --dev
-
-      - name: Lint
-        run: |
-          pipenv run prospector --profile prospector.yaml
+      - name: Install Hatch
+        run: pip install hatch
 
-      - name: Run tests
+      - name: Run unit tests
         run: make test
 
       - name: Publish test coverage to coverage site
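
The new `fmt` job fails the build when `make dev fmt` leaves uncommitted differences. A minimal sketch of reproducing both gates locally before pushing, assuming the `make dev fmt` and `make test` targets behave the same way outside GitHub Actions:

```bash
#!/usr/bin/env bash
# Sketch: approximate the CI gates locally (assumes the Makefile targets
# behave the same outside GitHub Actions).
set -euo pipefail

make dev fmt          # install dev dependencies and auto-format, as in the fmt job
git diff --exit-code  # fail if formatting changed any tracked files
make test             # run the unit tests, as in the tests job
```

If `git diff --exit-code` fails, commit the formatting changes and rerun.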

.gitignore

Lines changed: 12 additions & 0 deletions
@@ -36,3 +36,15 @@ docs/source/reference/api/*.rst
 .coverage
 htmlcov/
 .coverage.xml
+
+# IDE-specific folders - prevent local/editor config files from polluting source control.
+# PyCharm
+# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+# and can be added to the global gitignore or merged into this file. For a more nuclear
+# option (not recommended) you can uncomment the following to ignore the entire idea folder.
+.idea/
+# Cursor IDE
+# Cursor is an AI-powered code editor. The .cursor/ directory contains IDE-specific
+# settings and configurations similar to other IDEs.
+.cursor/

CONTRIBUTING.md

Lines changed: 62 additions & 70 deletions
@@ -11,68 +11,46 @@ state this explicitly, by submitting any copyrighted material via pull request,
 other means you agree to license the material under the project's Databricks license and
 warrant that you have the legal authority to do so.
 
-# Building the code
+# Development Setup
 
-## Package Dependencies
-See the contents of the file `python/require.txt` to see the Python package dependencies.
-Dependent packages are not installed automatically by the `dbldatagen` package.
+## Python Compatibility
 
-## Python compatibility
+The code supports Python 3.9+ and has been tested with Python 3.9.21 and later.
 
-The code has been tested with Python 3.9.21 and later.
+## Quick Start
 
-## Checking your code for common issues
+```bash
+# Install development dependencies
+make dev
 
-Run `make dev-lint` from the project root directory to run various code style checks.
-These are based on the use of `prospector`, `pylint` and related tools.
+# Format and lint code
+make fmt   # Format with ruff and fix issues
+make lint  # Check code quality
 
-## Setting up your build environment
-Run `make buildenv` from the root of the project directory to setup a `pipenv` based build environment.
+# Run tests
+make test      # Run tests
+make test-cov  # Run tests with coverage report
 
-Run `make create-dev-env` from the root of the project directory to
-set up a conda based virtualized Python build environment in the project directory.
-
-You can use alternative build virtualization environments or simply install the requirements
-directly in your environment.
-
-
-## Build steps
+# Build package
+make build  # Build with modern build system
+```
 
-Our recommended mechanism for building the code is to use a `conda` or `pipenv` based development process.
+## Development Tools
 
-But it can be built with any Python virtualization environment.
+All development tools are configured in `pyproject.toml`.
 
-### Spark dependencies
-The builds have been tested against Spark 3.3.0. This requires the OpenJDK 1.8.56 or later version of Java 8.
-The Databricks runtimes use the Azul Zulu version of OpenJDK 8 and we have used these in local testing.
-These are not installed automatically by the build process, so you will need to install them separately.
+## Dependencies
 
-### Building with Conda
-To build with `conda`, perform the following commands:
-- `make create-dev-env` from the main project directory to create your conda environment, if using
-- activate the conda environment - e.g `conda activate dbl_testdatagenerator`
-- install the necessary dependencies in your conda environment via `make install-dev-dependencies`
-
-- use the following to build and run the tests with a coverage report
-  - Run `make dev-test-with-html-report` from the main project directory.
+All dependencies are defined in `pyproject.toml`:
 
-- Use the following command to make the distributable:
-  - Run `make dev-dist` from the main project directory
-  - The resulting wheel file will be placed in the `dist` subdirectory
-
-### Building with Pipenv
-To build with `pipenv`, perform the following commands:
-- `make buildenv` from the main project directory to create your conda environment, if using
-- install the necessary dependencies in your conda environment via `make install-dev-dependencies`
-
-- use the following to build and run the tests with a coverage report
-  - Run `make test-with-html-report` from the main project directory.
+- `[project.dependencies]` lists dependencies necessary to run the `dbldatagen` library
+- `[tool.hatch.envs.default]` lists the default environment necessary to develop, test, and build the `dbldatagen` library
 
-- Use the following command to make the distributable:
-  - Run `make dist` from the main project directory
-  - The resulting wheel file will be placed in the `dist` subdirectory
+## Spark Dependencies
 
-The resulting build has been tested against Spark 3.3.0
+The builds have been tested against Spark 3.3.0+. This requires OpenJDK 1.8.56 or a later version of Java 8.
+The Databricks runtimes use the Azul Zulu version of OpenJDK 8.
+These are not installed automatically by the build process.
 
 ## Creating the HTML documentation
 
@@ -82,7 +60,10 @@ The main html document will be in the file (relative to the root of the build di
 `./docs/docs/build/html/index.html`
 
 ## Building the Python wheel
-Run `make clean dist` from the main project directory.
+
+```bash
+make build  # Clean and build the package
+```
 
 # Testing
 
@@ -102,22 +83,19 @@ spark = dg.SparkSingleton.getLocalInstance("<name to flag spark instance>")
 
 The name used to flag the spark instance should be the test module or test class name.
 
-## Running unit / integration tests
-
-If using an environment with multiple Python versions, make sure to use virtual env or
-similar to pick up correct python versions. The make target `create`
-
-If necessary, set `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` to point to correct versions of Python.
+## Running Tests
 
-To run the tests using a `conda` environment:
-- Run `make dev-test` from the main project directory to run the unit tests.
+```bash
+# Run all tests
+make test
 
-- Run `make dev-test-with-html-report` to generate test coverage report in `htmlcov/inxdex.html`
+# Run tests with coverage report (generates htmlcov/index.html)
+make coverage
+```
 
-To run the tests using a `pipenv` environment:
-- Run `make test` from the main project directory to run the unit tests.
+If using an environment with multiple Python versions, make sure to use a virtual env or similar to pick up the correct Python versions.
 
-- Run `make test-with-html-report` to generate test coverage report in `htmlcov/inxdex.html`
+If necessary, set `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` to point to the correct versions of Python.
 
 # Using the Databricks Labs data generator
 The recommended method for installation is to install from the PyPi package
@@ -147,27 +125,41 @@ For example, the following code downloads the release V0.2.1
 
 > '%pip install https://github.com/databrickslabs/dbldatagen/releases/download/v021/dbldatagen-0.2.1-py3-none-any.whl'
 
-# Coding Style
+# Code Quality and Style
+
+## Automated Formatting
+
+Code can be automatically formatted and linted with the following commands:
+
+```bash
+# Format code and fix issues automatically
+make fmt
+
+# Check code quality without making changes
+make lint
+```
 
-The code follows the Pyspark coding conventions.
+## Coding Conventions
 
-Basically it follows the Python PEP8 coding conventions - but method and argument names used mixed case starting
-with a lower case letter rather than underscores following Pyspark coding conventions.
+The code follows PySpark coding conventions:
+- Python PEP8 standards with some PySpark-specific adaptations
+- Method and argument names use mixed case starting with lowercase (following PySpark conventions)
+- Line length limit of 120 characters
 
-See https://legacy.python.org/dev/peps/pep-0008/
+See the [Python PEP8 Guide](https://peps.python.org/pep-0008/) for general Python style guidelines.
 
 # Github expectations
-When running the unit tests on Github, the environment should use the same environment as the latest Databricks
-runtime latest LTS release. While compatibility is preserved on LTS releases from Databricks runtime 11.3 onwards,
+When running the unit tests on GitHub, the environment should use the same environment as the latest Databricks
+runtime LTS release. While compatibility is preserved on LTS releases from Databricks runtime 13.3 onwards,
 unit tests will be run on the environment corresponding to the latest LTS release.
 
-Libraries will use the same versions as the earliest supported LTS release - currently 11.3 LTS
+Libraries will use the same versions as the earliest supported LTS release - currently 13.3 LTS
 
 This means for the current build:
 
 - Use of Ubuntu 22.04 for the test runner
 - Use of Java 8
-- Use of Python 3.9.21 when testing / building the image
+- Use of Python 3.10.12 when testing / building the image
 
 See the following resources for more information
 - https://docs.databricks.com/en/release-notes/runtime/15.4lts.html
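
The rewritten "Running Tests" section advises pinning `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` when several Python versions are installed. A minimal sketch of that advice in practice; the `python3.10` path is illustrative, and any interpreter from the CI matrix (3.10-3.12) should work:

```bash
#!/usr/bin/env bash
# Sketch: pin PySpark's worker and driver interpreters before running the
# test suite. python3.10 is illustrative; use any locally installed version
# from the CI matrix (3.10-3.12).
set -euo pipefail

export PYSPARK_PYTHON="$(command -v python3.10)"
export PYSPARK_DRIVER_PYTHON="$PYSPARK_PYTHON"

make test      # run the unit tests
make coverage  # coverage report written to htmlcov/index.html
```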

Pipfile

Lines changed: 0 additions & 31 deletions
This file was deleted.
