Commit 577f4f4

Merge branch 'master' into ronanstokes-db-patch-1
2 parents cba5f8e + 7275090 commit 577f4f4

52 files changed: +6095 additions, -765 deletions

.github/workflows/codeql-analysis.yml

Lines changed: 3 additions & 3 deletions

@@ -18,7 +18,7 @@ on:
 jobs:
   analyze:
     name: Analyze
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
 
     strategy:
       fail-fast: false
@@ -31,9 +31,9 @@ jobs:
 
     # Initializes the CodeQL tools for scanning.
     - name: Initialize CodeQL
-      uses: github/codeql-action/init@v2
+      uses: github/codeql-action/init@v3
       with:
         languages: ${{ matrix.language }}
 
     - name: Perform CodeQL Analysis
-      uses: github/codeql-action/analyze@v2
+      uses: github/codeql-action/analyze@v3

.github/workflows/push.yml

Lines changed: 5 additions & 1 deletion

@@ -9,7 +9,7 @@ on:
 jobs:
   tests:
     # Ubuntu latest no longer installs Python 3.9 by default so install it
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
     steps:
       - name: Checkout
         uses: actions/checkout@v4
@@ -26,6 +26,10 @@ jobs:
       #     key: ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }}
       #     restore-keys: |
       #       ${{ runner.os }}-go-
+      - name: Set Java 8
+        run: |
+          sudo update-alternatives --set java /usr/lib/jvm/temurin-8-jdk-amd64/bin/java
+          java -version
 
       - name: Set up Python 3.8
         uses: actions/setup-python@v5
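The `Set Java 8` step above pins the runner's default `java` to a Temurin 8 install via `update-alternatives` and then prints `java -version` as a sanity check. As a hedged illustration (this helper is hypothetical, not part of the repository), a build script could go one step further and parse that output to fail fast when the wrong JVM is active:

```python
import re

def parse_java_major(version_output: str) -> int:
    """Extract the major Java version from `java -version` output.

    Handles both the legacy scheme ("1.8.0_392" -> 8) and the
    modern scheme ("17.0.2" -> 17).
    """
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        raise ValueError("no version string found in output")
    parts = match.group(1).split(".")
    return int(parts[1]) if parts[0] == "1" else int(parts[0])

# Sample stderr text from a Temurin 8 JDK (illustrative, not captured live):
sample = 'openjdk version "1.8.0_392"\nOpenJDK Runtime Environment Temurin'
print(parse_java_major(sample))  # -> 8
```

A guard like this would let the CI job abort before any Spark tests run if `update-alternatives` pointed at an unexpected JDK.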
Lines changed: 12 additions & 12 deletions

@@ -7,17 +7,23 @@ on:
 
 jobs:
   release:
-    runs-on: ${{ matrix.os }}
-    strategy:
-      max-parallel: 1
-      matrix:
-        python-version: [ 3.8 ]
-        os: [ ubuntu-latest ]
+    runs-on: ubuntu-22.04
+    environment: release
+    permissions:
+      # Used to authenticate to PyPI via OIDC and sign the release's artifacts with sigstore-python.
+      id-token: write
+      # Used to attach signing artifacts to the published release.
+      contents: write
 
     steps:
       - name: Checkout
         uses: actions/checkout@v4
 
+      - name: Set Java 8
+        run: |
+          sudo update-alternatives --set java /usr/lib/jvm/temurin-8-jdk-amd64/bin/java
+          java -version
+
       - name: Set up Python 3.8
         uses: actions/setup-python@v5
         with:
@@ -44,9 +50,3 @@ jobs:
 
       - name: Publish a Python distribution to PyPI
         uses: pypa/gh-action-pypi-publish@release/v1
-        with:
-          user: __token__
-          password: ${{ secrets.LABS_PYPI_TOKEN }}
-
-
-
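The release-job change above drops the stored `LABS_PYPI_TOKEN` secret in favor of PyPI trusted publishing: the `id-token: write` permission lets `pypa/gh-action-pypi-publish` obtain a short-lived OIDC token, so no `user`/`password` inputs are needed. A minimal standalone sketch of that model (job and step names are illustrative, and it assumes a matching trusted publisher is configured for the project on PyPI):

```yaml
# Illustrative sketch of an OIDC-based ("trusted publishing") release job.
jobs:
  release:
    runs-on: ubuntu-22.04
    environment: release
    permissions:
      id-token: write   # mints the OIDC token used to authenticate to PyPI
    steps:
      - name: Publish a Python distribution to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1  # no token inputs required
```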

CHANGELOG.md

Lines changed: 29 additions & 3 deletions

@@ -1,12 +1,39 @@
-# Databricks Labs Data Generator Release Notes
+# Databricks Labs Synthetic Data Generator Release Notes
 
 ## Change History
 All notable changes to the Databricks Labs Data Generator will be documented in this file.
 
-### Unreleased
+### unreleased
+
+#### Fixed
+* Updated build scripts to use Ubuntu 22.04 to correspond to the environment in the Databricks runtime
+
+### Version 0.4.0 Hotfix 2
+
+#### Fixed
+* Added basic stock ticker and multi-table sales order standard datasets
+* Added min and max latitude and longitude options for the basic geometries dataset provider
+* Added default max values for numeric data types
+
+### Version 0.4.0 Hotfix 1
+
+#### Fixed
+* Fixed issue with running on serverless environment
+
+
+### Version 0.4.0
 
 #### Changed
+* Updated minimum pyspark version to be 3.2.1, compatible with Databricks runtime 10.4 LTS or later
+* Modified data generator to allow specification of constraints to the data generation process
 * Updated documentation for generating text data.
+* Modified data distributions to use abstract base classes
+* Migrated data distribution tests to use `pytest`
+* Additional standard datasets
+
+#### Added
+* Added classes for constraints on the data generation via new package `dbldatagen.constraints`
+* Added support for standard data sets via the new package `dbldatagen.datasets`
 
 
 ### Version 0.3.6 Post 1
@@ -18,7 +45,6 @@ All notable changes to the Databricks Labs Data Generator will be documented in
 #### Fixed
 * Fixed scenario where `DataAnalyzer` is used on dataframe containing a column named `summary`
 
-
 ### Version 0.3.6
 
 #### Changed
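The changelog above raises the minimum pyspark version to 3.2.1. As a hedged sketch (the helper below is hypothetical, not dbldatagen's actual check), such a version floor has to be enforced by comparing dotted version strings numerically rather than lexically, since a plain string compare would rank "3.10.0" below "3.2.1":

```python
def meets_minimum(version: str, minimum: str = "3.2.1") -> bool:
    """Return True when `version` is at least `minimum`.

    Compares dotted version strings as integer tuples, so "3.10.0"
    correctly ranks above "3.2.1".
    """
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(version) >= as_tuple(minimum)

print(meets_minimum("3.5.0"))  # True: above the new floor
print(meets_minimum("3.0.1"))  # False: the old Spark 3.0.1 baseline no longer qualifies
```

Production code would more likely reach for `packaging.version.Version`, which handles pre-release and post-release suffixes as well.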

CONTRIBUTING.md

Lines changed: 26 additions & 2 deletions

@@ -26,7 +26,7 @@ runtime 9.1 LTS or later.
 
 ## Checking your code for common issues
 
-Run `./lint.sh` from the project root directory to run various code style checks.
+Run `make dev-lint` from the project root directory to run various code style checks.
 These are based on the use of `prospector`, `pylint` and related tools.
 
 ## Setting up your build environment
@@ -45,6 +45,11 @@ Our recommended mechanism for building the code is to use a `conda` or `pipenv`
 
 But it can be built with any Python virtualization environment.
 
+### Spark dependencies
+The builds have been tested against Spark 3.2.1. This requires OpenJDK 1.8.56 or a later version of Java 8.
+The Databricks runtimes use the Azul Zulu version of OpenJDK 8, and we have used these in local testing.
+These are not installed automatically by the build process, so you will need to install them separately.
+
 ### Building with Conda
 To build with `conda`, perform the following commands:
 - `make create-dev-env` from the main project directory to create your conda environment, if using
@@ -70,7 +75,7 @@ To build with `pipenv`, perform the following commands:
 - Run `make dist` from the main project directory
 - The resulting wheel file will be placed in the `dist` subdirectory
 
-The resulting build has been tested against Spark 3.0.1
+The resulting build has been tested against Spark 3.2.1
 
 ## Creating the HTML documentation
 
@@ -153,3 +158,22 @@ Basically it follows the Python PEP8 coding conventions - but method and argumen
 with a lower case letter rather than underscores following Pyspark coding conventions.
 
 See https://legacy.python.org/dev/peps/pep-0008/
+
+## GitHub expectations
+When running the unit tests on GitHub, the environment should match the latest Databricks
+runtime LTS release. While compatibility is preserved on LTS releases from Databricks runtime 10.4 onwards,
+unit tests will be run on the environment corresponding to the latest LTS release.
+
+Libraries will use the same versions as the earliest supported LTS release - currently 10.4 LTS.
+
+This means for the current build:
+
+- Use of Ubuntu 22.04 for the test runner
+- Use of Java 8
+- Use of Python 3.11
+
+See the following resources for more information:
+- https://docs.databricks.com/en/release-notes/runtime/15.4lts.html
+- https://docs.databricks.com/en/release-notes/runtime/10.4lts.html
+- https://github.com/actions/runner-images/issues/10636
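The environment expectations added above (Ubuntu 22.04 runner, Java 8, Python 3.11) can be checked early in a local test session so drift from the CI baseline is flagged immediately. The guard below is an illustrative sketch under those assumptions, not part of the repository:

```python
import sys

# Baseline taken from the "GitHub expectations" section above; the constant
# and helper names are illustrative, not part of dbldatagen.
EXPECTED_PYTHON = (3, 11)

def interpreter_matches(version_info=None, expected=EXPECTED_PYTHON) -> bool:
    """Return True when the interpreter's major.minor matches `expected`."""
    info = sys.version_info if version_info is None else version_info
    return (info[0], info[1]) == expected

# Deterministic examples using explicit tuples:
print(interpreter_matches((3, 11, 4)))   # True: matches the CI baseline
print(interpreter_matches((3, 8, 10)))   # False: a local 3.8 env would be flagged
```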
