Commit c9c4e93

Update tests and documentation
1 parent d749476 commit c9c4e93

5 files changed: +60 -10 lines changed

docs/source/generating_column_data.rst

Lines changed: 45 additions & 0 deletions
@@ -182,3 +182,48 @@ This has several implications:
 SQL expression.
 To enforce the dependency, you must use the `baseColumn` attribute to indicate the dependency.
 
+Creating data generation specs from files
+-----------------------------------------
+
+``DataGenerator.fromFile("file_path")`` will return a ``DataGenerator`` with ``ColumnGenerationSpecs`` from definitions
+in a JSON or YAML file. Use the ``"generator"`` key to specify ``DataGenerator`` options and the ``"columns"`` key to
+specify ``ColumnGenerationSpec`` options.
+
+**JSON Example:**
+
+.. code-block:: JSON
+   {
+     "generator": {
+       "name": "test_data_generator",
+       "rows": 1000,
+       "partitions": 10
+     },
+     "columns": [
+       {"colName": "col1", "colType": "int", "minValue": 0, "maxValue": 100},
+       {"colName": "col2", "colType": "float", "minValue": 0.0, "maxValue": 100.0},
+       {"colName": "col3", "colType": "string", "values": ["a", "b", "c"], "random": true}
+     ]
+   }
+
+**YAML Example:**
+.. code-block:: YAML
+   generator:
+     name: test_data_generator
+     rows: 1000
+     partitions: 10
+   columns:
+     - colName: col1
+       colType: int
+       minValue: 0
+       maxValue: 1000
+     - colName: col2
+       colType: float
+       minValue: -10.0
+       maxValue: 10.0
+     - colName: col3
+       colType: string
+       values:
+         - a
+         - b
+         - c
+       random: true
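
Note on the documented file shape: the added section says a spec file has a ``"generator"`` key and a ``"columns"`` key. The sketch below only illustrates that shape with the standard library; it does not call or reproduce ``DataGenerator.fromFile``. The helper name ``load_spec`` and the rule that every column needs a ``"colName"`` are assumptions made for illustration.

```python
import json
import os
import tempfile

# Assumption for this sketch: each column definition must at least name the column.
REQUIRED_COLUMN_KEYS = ("colName",)


def load_spec(path):
    """Hypothetical helper: read a JSON spec and return (generator_options, column_options)."""
    with open(path, "r", encoding="utf-8") as f:
        spec = json.load(f)
    generator_options = spec.get("generator", {})
    column_options = spec.get("columns", [])
    for column in column_options:
        missing = [key for key in REQUIRED_COLUMN_KEYS if key not in column]
        if missing:
            raise ValueError(f"column definition {column!r} is missing {missing}")
    return generator_options, column_options


# The JSON example from the documentation hunk above.
example = {
    "generator": {"name": "test_data_generator", "rows": 1000, "partitions": 10},
    "columns": [
        {"colName": "col1", "colType": "int", "minValue": 0, "maxValue": 100},
        {"colName": "col2", "colType": "float", "minValue": 0.0, "maxValue": 100.0},
        {"colName": "col3", "colType": "string", "values": ["a", "b", "c"], "random": True},
    ],
}

# Write the example to a temporary file, then read it back through the helper.
with tempfile.TemporaryDirectory() as tmp_dir:
    spec_path = os.path.join(tmp_dir, "spec.json")
    with open(spec_path, "w", encoding="utf-8") as f:
        json.dump(example, f)
    generator_options, column_options = load_spec(spec_path)

print(generator_options)
print([column["colName"] for column in column_options])
```

Splitting the file into the two sections up front keeps generator-level settings (name, rows, partitions) separate from per-column settings, which mirrors how the test in this commit reads ``options.get("generator")`` before comparing results.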

docs/source/options_and_features.rst

Lines changed: 6 additions & 0 deletions
@@ -128,6 +128,12 @@ representing the column - for example "email_0", "email_1" etc.
 If you specify the attribute ``structType="array"``, the multiple columns will be combined into a single array valued
 column.
 
+Generating columns from Python dictionaries
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+You can generate columns from Python dictionaries using ``withColumns(column_options)``. Each dictionary should contain
+keys which match the ``withColumn`` arguments (e.g. ``"colName"``, ``"colType"``).
+
 Generating random values
 ^^^^^^^^^^^^^^^^^^^^^^^^
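
Note on the dictionary form: the added paragraph says each dictionary's keys should match the ``withColumn`` arguments. One way to picture that is keyword-argument unpacking. The sketch below uses hypothetical stand-ins (``with_column_stub``, ``with_columns_stub``) that only record their arguments; it is not dbldatagen's implementation, and the default ``colType`` shown is an assumption.

```python
def with_column_stub(colName, colType="string", **options):
    """Hypothetical stand-in for a withColumn-style method; it only records its arguments."""
    return {"colName": colName, "colType": colType, "options": options}


def with_columns_stub(column_options):
    """Call the stand-in once per dictionary, unpacking its keys as keyword arguments."""
    return [with_column_stub(**column) for column in column_options]


column_options = [
    {"colName": "col1", "colType": "int", "minValue": 0, "maxValue": 100},
    {"colName": "col3", "colType": "string", "values": ["a", "b", "c"], "random": True},
]

recorded = with_columns_stub(column_options)
for entry in recorded:
    print(entry)
```

Under this reading, a dictionary without ``"colName"`` fails at the call site with a ``TypeError``, which is why the keys need to line up with the argument names.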

tests/files/test_generator_spec.json

Lines changed: 2 additions & 2 deletions
@@ -8,8 +8,8 @@
     "random": true
   },
   "columns": [
-    {"colName": "col1", "colType": "int", "min": 0, "max": 100},
-    {"colName": "col2", "colType": "float", "min": 0.0, "max": 100.0},
+    {"colName": "col1", "colType": "int", "minValue": 0, "maxValue": 1000},
+    {"colName": "col2", "colType": "float", "minValue": -10.0, "maxValue": 10.0},
     {"colName": "col3", "colType": "string", "values": ["a", "b", "c"], "random": true}
   ]
 }
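
Note on the key rename: this hunk replaces ``min``/``max`` with ``minValue``/``maxValue`` in the test spec. If other spec files still use the shorter keys, a rename pass along these lines could update them. This is a sketch only; ``LEGACY_KEY_MAP`` and ``rename_legacy_keys`` are hypothetical names, and nothing in the commit says the shorter keys were ever accepted by the library.

```python
# Hypothetical mapping from the old spec keys to the ones used after this commit.
LEGACY_KEY_MAP = {"min": "minValue", "max": "maxValue"}


def rename_legacy_keys(column):
    """Return a copy of a column definition with legacy keys renamed; other keys pass through."""
    return {LEGACY_KEY_MAP.get(key, key): value for key, value in column.items()}


old_column = {"colName": "col1", "colType": "int", "min": 0, "max": 100}
new_column = rename_legacy_keys(old_column)
print(new_column)
```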

tests/files/test_generator_spec.yml

Lines changed: 4 additions & 4 deletions
@@ -8,12 +8,12 @@ generator:
 columns:
   - colName: col1
     colType: int
-    min: 0
-    max: 100
+    minValue: 0
+    maxValue: 1000
   - colName: col2
     colType: float
-    min: 0.0
-    max: 100.0
+    minValue: -10.0
+    maxValue: 10.0
   - colName: col3
     colType: string
     values:

tests/test_quick_tests.py

Lines changed: 3 additions & 4 deletions
@@ -1,7 +1,6 @@
 from datetime import timedelta, datetime
-
-import pytest
 import json
+import pytest
 import yaml
 from pyspark.sql.types import (
     StructType, StructField, IntegerType, StringType, FloatType, DateType, DecimalType, DoubleType, ByteType,
@@ -784,7 +783,7 @@ def test_generation_from_dictionary(self):
 
     def test_generation_from_file(self):
         path = "tests/files/test_generator_spec.json"
-        with open(path, "r") as f:
+        with open(path, "r", encoding="utf-8") as f:
             options = json.load(f)
         gen_options = options.get("generator")
         gen_from_json = DataGenerator.fromFile(path)
@@ -798,7 +797,7 @@ def test_generation_from_file(self):
         assert df_from_json.columns == ["col1", "col2", "col3"]
 
         path = "tests/files/test_generator_spec.yml"
-        with open(path, "r") as f:
+        with open(path, "r", encoding="utf-8") as f:
             options = yaml.safe_load(f)
         gen_options = options.get("generator")
         gen_from_yaml = DataGenerator.fromFile(path)
