Description
To illustrate this problem, consider generating 15 events with the following configuration:
fields:
  - name: a
    range:
      min: 0
      max: 50
    cardinality: 5
  - name: b
    range:
      min: 0
      max: 100
    cardinality: 5
a is a number between 0 and 50, and the generated events contain 5 unique values of a.
b is a number between 0 and 100, and the generated events contain 5 unique values of b.
In this configuration there is no explicit coupling between fields a and b. However, when this is run, the output is as follows:
{
"a-b": 10-51
}
{
"a-b": 21-37
}
{
"a-b": 20-58
}
{
"a-b": 48-16
}
{
"a-b": 49-84
}
{
"a-b": 10-51
}
{
"a-b": 21-37
}
{
"a-b": 20-58
}
{
"a-b": 48-16
}
{
"a-b": 49-84
}
{
"a-b": 10-51
}
{
"a-b": 21-37
}
{
"a-b": 20-58
}
{
"a-b": 48-16
}
{
"a-b": 49-84
}
Notice how there are 5 unique documents here, repeated 3 times. The fact that the fields have the same cardinality causes them to be coupled.
This behaviour is confusing, and can cause unwanted repetition in the generated data.
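The repetition can be reproduced with a small sketch. This is only an illustration of one mechanism that would yield the pattern above (picking each field's value by event index modulo its cardinality); it is an assumption, not a confirmed description of how the generator works. The function names are hypothetical, and the value pools are copied from the output shown in this issue.

```python
import random


def lockstep_events(pools, n_events):
    """Pick each field's value by event index modulo the pool size (assumed mechanism)."""
    return [
        {name: pool[i % len(pool)] for name, pool in pools.items()}
        for i in range(n_events)
    ]


def independent_events(pools, n_events, rng):
    """Pick each field's value with its own independent random draw."""
    return [
        {name: rng.choice(pool) for name, pool in pools.items()}
        for _ in range(n_events)
    ]


def unique_pairs(events):
    """Return the set of distinct (a, b) combinations in the events."""
    return {(e["a"], e["b"]) for e in events}


# 5 unique values per field, taken from the output in this issue.
pools = {"a": [10, 21, 20, 48, 49], "b": [51, 37, 58, 16, 84]}

coupled = lockstep_events(pools, 15)
print(len(unique_pairs(coupled)))  # 5: the same five documents, repeated 3 times

free = independent_events(pools, 15, random.Random(0))
print(len(unique_pairs(free)))  # up to 15 distinct combinations are possible
```

With independent draws each field still has at most 5 unique values, but the pairs are no longer tied together, which is what the configuration suggests should happen.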
The correct behaviour can be observed with enum types, which also have a well-defined cardinality (the number of enum values).
fields:
  - name: region
    enum: ['NASA', 'APAC', 'EMEA']
  - name: team
    enum: ['A', 'B', 'C']
Note that in this configuration both fields have a cardinality of 3. In the generated data there is no coupling. Here are 9 generated data points:
{
"sales-team": EMEA-A
}
{
"sales-team": EMEA-C
}
{
"sales-team": APAC-A
}
{
"sales-team": APAC-C
}
{
"sales-team": APAC-A
}
{
"sales-team": EMEA-B
}
{
"sales-team": NASA-C
}
{
"sales-team": APAC-C
}
{
"sales-team": NASA-C
}
Another strange behaviour appears when the cardinality is written explicitly for these fields (3 and 3 respectively). One would expect this to have no effect, since each enum already has a cardinality of 3. Instead, the output contains only 3 unique values of sales-team, repeated over and over.
fields:
  - name: region
    enum: ['NASA', 'APAC', 'EMEA']
    cardinality: 3
  - name: team
    enum: ['A', 'B', 'C']
    cardinality: 3
->
{
"sales-team": EMEA-A
}
{
"sales-team": APAC-B
}
{
"sales-team": NASA-C
}
{
"sales-team": EMEA-A
}
{
"sales-team": APAC-B
}
{
"sales-team": NASA-C
}
{
"sales-team": EMEA-A
}
{
"sales-team": APAC-B
}
{
"sales-team": NASA-C
}
This implicit coupling of values with the same cardinality should be removed, and replaced with a more explicit way to enable coupling between values (which is often required).
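As a rough sketch of what explicit coupling could look like, the example below gives fields an optional group key: fields sharing a group move together, and all other fields are drawn independently regardless of cardinality. The group key, the generate function, and the dictionary layout are all hypothetical and are not part of the tool's current configuration format.

```python
import random


def generate(fields, n_events, rng):
    """Generate events; fields with the same hypothetical 'group' key are coupled."""
    events = []
    for _ in range(n_events):
        shared = {}  # one random draw per coupling group, made once per event
        event = {}
        for field in fields:
            values = field["values"]
            group = field.get("group")
            if group is None:
                # Ungrouped field: independent draw, unaffected by other fields.
                draw = rng.random()
            else:
                # Grouped field: reuse the group's draw so values move together.
                if group not in shared:
                    shared[group] = rng.random()
                draw = shared[group]
            # Map the draw in [0, 1) to an index, guarding against float rounding.
            idx = min(int(draw * len(values)), len(values) - 1)
            event[field["name"]] = values[idx]
        events.append(event)
    return events


# Explicitly coupled: region and team share the group "sales".
coupled_fields = [
    {"name": "region", "values": ["NASA", "APAC", "EMEA"], "group": "sales"},
    {"name": "team", "values": ["A", "B", "C"], "group": "sales"},
]

# No group: same cardinality, but no coupling.
free_fields = [
    {"name": "region", "values": ["NASA", "APAC", "EMEA"]},
    {"name": "team", "values": ["A", "B", "C"]},
]

for event in generate(coupled_fields, 3, random.Random(0)):
    print(event)
```

The design point is that coupling becomes opt-in: two fields with equal cardinality only repeat together when the configuration asks for it.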