Faker 40.1.0 Breaks Backward Compatibility in AnonymizedFaker #1057

@pvk-developer

Environment Details

  • RDT version: 1.18.2
  • Python version: 3.x
  • Faker version: 40.1.0

Error Description

Faker v40.1.0 breaks backward compatibility for Faker instances that were created inside RDT's AnonymizedFaker under an earlier Faker version, pickled (or otherwise serialized), and then loaded under Faker 40.1.0.

Faker v40.1.0 changed its uniqueness logic to rely on a new internal attribute, _excluded_types. Faker instances created with versions prior to 40.1.0 never had this attribute set, and unpickling restores the saved state without re-running initialization, so the attribute is still missing after loading. The updated Faker logic assumes the attribute exists and attempts to access it, resulting in runtime errors.

This manifests in RDT when the anonymizer uses faker.unique (e.g., under unique, match, or scale cardinality rules), causing failures when older Faker instances are reused.
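
The failure mode can be illustrated without Faker at all. The following is a minimal sketch, not Faker's actual code: UniqueProxyStandIn, generate, and restore_without_init are hypothetical names, and the restore helper only imitates what default unpickling does (allocate the object, then copy the saved state into it).

```python
class UniqueProxyStandIn:
    """Hypothetical stand-in for a proxy class; not Faker's real implementation."""

    def __init__(self):
        self._seen = {}
        # Attribute introduced by the "new" release of the class.
        self._excluded_types = ()

    def generate(self, value):
        # The new logic assumes _excluded_types always exists.
        if isinstance(value, self._excluded_types):
            return value
        if value in self._seen:
            raise ValueError(f'duplicate value: {value!r}')
        self._seen[value] = True
        return value


def restore_without_init(cls, state):
    """Rebuild an object the way default unpickling does: no __init__ call."""
    obj = cls.__new__(cls)
    obj.__dict__.update(state)
    return obj


# State as saved by the "old" release, before _excluded_types existed.
old_state = {'_seen': {}}
restored = restore_without_init(UniqueProxyStandIn, old_state)

try:
    restored.generate('alice')
except AttributeError as error:
    failure = str(error)
else:
    failure = None

print(failure)
```

An instance built through __init__ works, while the restored one raises AttributeError on the first call that touches the new attribute, which mirrors the behavior described above.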

Steps to reproduce

  1. With a Faker version earlier than 40.1.0 installed, create an AnonymizedFaker instance that enforces uniqueness.
  2. Fit the instance.
  3. Pickle the instance.
  4. Upgrade Faker to 40.1.0.
  5. Unpickle the instance.
  6. Call reverse_transform on the unpickled instance.

Proposed Approach

A fix can be applied in rdt/transformers/pii/anonymizer.py by ensuring the _excluded_types attribute is initialized when missing. The following change resolves the issue by explicitly setting the attribute on the faker.unique proxy:

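from collections.abc import Iterable  # module-level import used by the isinstance check


def _function(self):
    """Return the result of calling the ``faker`` function."""
    try:
        if self.cardinality_rule in {'unique', 'match', 'scale'}:
            faker_attr = self.faker.unique
            # Proxies unpickled from Faker < 40.1.0 lack ``_excluded_types``,
            # which Faker 40.1.0 expects to find, so set it when missing.
            if not hasattr(faker_attr, '_excluded_types'):
                faker_attr._excluded_types = ()
        else:
            faker_attr = self.faker
    except AttributeError:
        # Fallback for instances that rely on ``enforce_uniqueness`` instead.
        faker_attr = self.faker.unique if self.enforce_uniqueness else self.faker

    # Look up the Faker provider method by name, then call it.
    result = getattr(faker_attr, self.function_name)
    result = result()
    # Join non-string iterables into a single comma-separated string.
    if isinstance(result, Iterable) and not isinstance(result, str):
        result = ', '.join(map(str, result))

    return result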
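
The guard in the proposed change can also be factored into a small helper. This is a sketch under assumptions, not part of RDT or Faker: ProxyStandIn and ensure_instance_attr are hypothetical names, and the helper checks the instance's own dictionary via vars() instead of hasattr, a design choice that keeps the check independent of any custom attribute lookup the object might define.

```python
class ProxyStandIn:
    """Hypothetical stand-in for an unpickled proxy object."""


def ensure_instance_attr(obj, name, default):
    """Set ``name`` on ``obj`` only when the instance itself lacks it."""
    if name not in vars(obj):
        setattr(obj, name, default)
    return obj


# Emulates an object restored from an old pickle: the attribute is missing.
stale = ProxyStandIn()
ensure_instance_attr(stale, '_excluded_types', ())

# Emulates an object created by the new release: the existing value is kept.
fresh = ProxyStandIn()
fresh._excluded_types = (int,)
ensure_instance_attr(fresh, '_excluded_types', ())
```

The helper is idempotent, so calling it on every invocation (as the guard in _function effectively does) never overwrites a value that a newer Faker version has already set.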

Labels: bug (Something isn't working), internal (The issue doesn't change the API or functionality)
