
remove coupling between fields with the same cardinality #128

@tommyers-elastic


To illustrate this problem, consider generating 15 events with the following configuration:

fields:
  - name: a
    range:
      min: 0
      max: 50
    cardinality: 5
  - name: b
    range:
      min: 0
      max: 100
    cardinality: 5

a is a number in the range 0-50, and the generated events contain 5 unique values of a.
b is a number in the range 0-100, and the generated events contain 5 unique values of b.

In this configuration there is no explicit coupling between fields a and b. However, when it is run, the output is as follows:


{
    "a-b": 10-51
}

{
    "a-b": 21-37
}

{
    "a-b": 20-58
}

{
    "a-b": 48-16
}

{
    "a-b": 49-84
}

{
    "a-b": 10-51
}

{
    "a-b": 21-37
}

{
    "a-b": 20-58
}

{
    "a-b": 48-16
}

{
    "a-b": 49-84
}

{
    "a-b": 10-51
}

{
    "a-b": 21-37
}

{
    "a-b": 20-58
}

{
    "a-b": 48-16
}

{
    "a-b": 49-84
}

Notice how there are 5 unique documents here, repeated 3 times. The fact that the fields have the same cardinality causes them to be coupled.
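
The coupling can be checked mechanically. The following Python sketch takes the 15 a-b values listed above and counts the distinct values per field and the distinct pairs. The helper name `coupling_report` is hypothetical and is not part of the tool; it only illustrates the observation.

```python
def coupling_report(pairs):
    """Count distinct values per field and distinct (a, b) combinations."""
    unique_a = {a for a, _ in pairs}
    unique_b = {b for _, b in pairs}
    unique_pairs = set(pairs)
    return {
        "unique_a": len(unique_a),
        "unique_b": len(unique_b),
        "unique_pairs": len(unique_pairs),
        # Number of combinations available if the fields varied independently.
        "max_pairs": len(unique_a) * len(unique_b),
    }


# The five documents from the output above, repeated three times.
observed = [(10, 51), (21, 37), (20, 58), (48, 16), (49, 84)] * 3

report = coupling_report(observed)
print(report)
# {'unique_a': 5, 'unique_b': 5, 'unique_pairs': 5, 'max_pairs': 25}
```

Each field has the expected 5 unique values, but only 5 of the 25 possible combinations ever appear, and each value of a always occurs with the same value of b.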

This behaviour is confusing, and can cause unwanted repetition in the generated data.

The correct behaviour can be observed with enum types, which also have well-defined cardinality (the number of enum values).

fields:
  - name: region
    enum: ['NASA', 'APAC', 'EMEA']
  - name: team
    enum: ['A', 'B', 'C']

Note that in this configuration both fields have a cardinality of 3. In the generated data there is no coupling. Here are 9 generated data points:


{
    "sales-team": EMEA-A
}

{
    "sales-team": EMEA-C
}

{
    "sales-team": APAC-A
}

{
    "sales-team": APAC-C
}

{
    "sales-team": APAC-A
}

{
    "sales-team": EMEA-B
}

{
    "sales-team": NASA-C
}

{
    "sales-team": APAC-C
}

{
    "sales-team": NASA-C
}

Another strange behaviour appears if I explicitly write the cardinality values for these fields (3 and 3 respectively). One would expect this to have no effect, since the cardinality is already 3. Instead, the output contains only 3 unique values of sales-team, repeated over and over.

fields:
  - name: region
    enum: ['NASA', 'APAC', 'EMEA']
    cardinality: 3
  - name: team
    enum: ['A', 'B', 'C']
    cardinality: 3

->

{
    "sales-team": EMEA-A
}

{
    "sales-team": APAC-B
}

{
    "sales-team": NASA-C
}

{
    "sales-team": EMEA-A
}

{
    "sales-team": APAC-B
}

{
    "sales-team": NASA-C
}

{
    "sales-team": EMEA-A
}

{
    "sales-team": APAC-B
}

{
    "sales-team": NASA-C
}

This implicit coupling of values with the same cardinality should be removed, and replaced with a more explicit way to enable coupling between values (which is often required).
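
As a rough illustration of the requested behaviour, here is a minimal Python sketch, not the tool's implementation. It assumes each field gets a fixed pool of `cardinality` distinct values and draws from that pool independently, unless the field is named in an explicit coupling group. The `generate` function and its `coupled` argument are hypothetical names invented for this example.

```python
import random


def generate(fields, n, seed=0, coupled=()):
    """Generate n events; fields vary independently unless listed in `coupled`.

    fields:  {name: (min, max, cardinality)}, mirroring the range config above
    coupled: iterable of tuples of field names that must move together
             (hypothetical option, not an existing setting of the tool)
    """
    rng = random.Random(seed)
    # Fix a pool of `cardinality` distinct values per field up front.
    pools = {
        name: rng.sample(range(lo, hi + 1), card)
        for name, (lo, hi, card) in fields.items()
    }
    # Map each coupled field to its group so the group shares one index.
    group_of = {name: group for group in coupled for name in group}

    events = []
    for _ in range(n):
        shared = {}  # one random index per coupled group, per event
        event = {}
        for name, pool in pools.items():
            group = group_of.get(name)
            if group is None:
                idx = rng.randrange(len(pool))  # independent draw
            else:
                if group not in shared:
                    # Assumes coupled fields share a cardinality; min() keeps
                    # the index in range if they do not.
                    limit = min(len(pools[g]) for g in group)
                    shared[group] = rng.randrange(limit)
                idx = shared[group]
            event[name] = pool[idx]
        events.append(event)
    return events


fields = {"a": (0, 50, 5), "b": (0, 100, 5)}
independent = generate(fields, 200)                   # no implicit coupling
linked = generate(fields, 200, coupled=[("a", "b")])  # explicit coupling
```

In this sketch both calls keep each field at 5 unique values at most. Only the `linked` call ties every value of a to a single value of b, and it does so because the coupling was requested rather than because the cardinalities happen to match.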
