If first 5 values of columns are the same, the seed is the same. Should it be different? #916

@npatki

Problem Description

RDT is supposed to set the random seed based on the first 5 values in the column. In many cases, this is sufficient to let users:
(a) transform/reverse transform in a reproducible way, while also
(b) creating different data for each column

However, this breaks down if you have 2 different columns whose first 5 values are exactly the same. In that case, the random seed for those columns is exactly the same, so whatever randomness they have will be in sync.
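
To make the collision concrete, here is a minimal sketch in plain Python. It is not RDT's actual code: `seed_from_first_values` and `sample_noise` are hypothetical helpers, and hashing the values with SHA-256 is only an assumption about how a seed could be derived from the first 5 values. The point is that any seed that depends only on those values will match for two columns that share them.

```python
import hashlib
import random


def seed_from_first_values(values, n=5):
    """Hypothetical helper: derive an integer seed from the first n values."""
    head = ",".join(repr(v) for v in values[:n])
    digest = hashlib.sha256(head.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def sample_noise(values, count=3):
    """Draw `count` random numbers from a generator seeded by the column."""
    rng = random.Random(seed_from_first_values(values))
    return [rng.random() for _ in range(count)]


# Two different columns whose first 5 values happen to match.
col_a = [1, 2, 3, 4, 5, 60, 70]
col_b = [1, 2, 3, 4, 5, 800, 900]
# A column that differs within the first 5 values.
col_c = [9, 2, 3, 4, 5, 60, 70]

noise_a = sample_noise(col_a)
noise_b = sample_noise(col_b)
noise_c = sample_noise(col_c)

print(noise_a == noise_b)  # same seed, so the random streams are identical
print(noise_a == noise_c)  # different seed, so the streams diverge
```

In this sketch, `col_a` and `col_b` get identical noise even though their later values differ, which is the in-sync behavior described above.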

This is a general issue that we can use to track the problem. If it ends up affecting a lot of users, we may want to consider finding a different way to set the seed.

Additional context

This issue assumes that #906 has already been fixed. (#906 identifies a bug where the first 5 column names are being used rather than the first 5 data values.)
