Description
Environment Details
Please indicate the following details about the environment in which you found the bug:
- RDT version: 1.18.2
- Python version: 3.x
- Faker version: 40.1.0
Error Description
With the release of Faker v40.1.0, backward compatibility is broken for Faker instances that were created with an earlier Faker version, pickled as part of RDT’s AnonymizedFaker, and then deserialized under Faker 40.1.0.
Faker v40.1.0 introduced a change to its uniqueness logic that relies on a new internal attribute, _excluded_types. Faker instances created with versions prior to 40.1.0 do not have this attribute initialized. However, the updated Faker logic assumes its existence and attempts to access it, resulting in runtime errors.
This manifests in RDT when the anonymizer uses faker.unique (e.g., under unique, match, or scale cardinality rules), causing failures when older Faker instances are reused.
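The mechanism can be sketched without Faker at all. By default, unpickling rebuilds an object from its saved state and does not call `__init__`, so an attribute that only a newer `__init__` sets is absent on restored instances. The classes `ProxyBefore` and `ProxyAfter`, the `is_excluded` method, and the `restore` helper below are hypothetical stand-ins, not Faker's actual code.

```python
class ProxyBefore:
    """Hypothetical stand-in for the proxy class before the upgrade."""

    def __init__(self):
        self._seen = {}


class ProxyAfter:
    """Hypothetical stand-in for the same class after the upgrade."""

    def __init__(self):
        self._seen = {}
        self._excluded_types = ()  # new attribute, only set in __init__

    def is_excluded(self, value):
        # Assumes the attribute exists, as the upgraded logic does.
        return isinstance(value, self._excluded_types)


def restore(cls, state):
    """Rebuild an instance the way default unpickling does: no __init__ call."""
    obj = cls.__new__(cls)
    obj.__dict__.update(state)
    return obj


# State saved under the old class definition, restored under the new one.
old_state = dict(ProxyBefore().__dict__)
restored = restore(ProxyAfter, old_state)

try:
    restored.is_excluded("x")
    outcome = "ok"
except AttributeError as exc:
    outcome = f"AttributeError: {exc}"

print(outcome)
```

The restored object keeps its old state (`_seen`) but fails as soon as the new code path touches `_excluded_types`, which matches the failure described above.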
Steps to reproduce
- Create an instance of AnonymizedFaker with uniqueness, using Faker pre 40.1.0.
- Fit the instance.
- Pickle the instance.
- Upgrade Faker to 40.1.0.
- Unpickle the instance.
- Try to reverse_transform.
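The steps above could be scripted roughly as follows. This is an untested sketch: the provider and function names, the sample data, and the pickle file name are illustrative assumptions, and the two functions are meant to be run in separate sessions, one before and one after the Faker upgrade.

```python
import pickle
from pathlib import Path

PICKLE_PATH = Path("anonymized_faker.pkl")  # hypothetical file name


def save_fitted_transformer(path=PICKLE_PATH):
    """Steps 1-3: run in an environment with Faker older than 40.1.0."""
    # Imported lazily so this file can be loaded without RDT installed.
    import pandas as pd
    from rdt.transformers.pii.anonymizer import AnonymizedFaker

    data = pd.DataFrame({"name": ["a", "b", "c"]})  # illustrative data
    transformer = AnonymizedFaker(
        provider_name="person",      # assumed provider
        function_name="name",        # assumed function
        cardinality_rule="unique",   # makes the transformer use faker.unique
    )
    transformer.fit(data, "name")
    transformed = transformer.transform(data)

    with open(path, "wb") as f:
        pickle.dump({"transformer": transformer, "transformed": transformed}, f)


def load_and_reverse(path=PICKLE_PATH):
    """Steps 5-6: run after upgrading Faker to 40.1.0."""
    with open(path, "rb") as f:
        payload = pickle.load(f)
    # Expected to fail here when the restored faker.unique proxy
    # lacks the _excluded_types attribute.
    return payload["transformer"].reverse_transform(payload["transformed"])
```

Call `save_fitted_transformer()` first, upgrade Faker (step 4), then call `load_and_reverse()` in a fresh interpreter.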
Proposed Approach
A fix can be applied in rdt/transformers/pii/anonymizer.py by ensuring the _excluded_types attribute is initialized when missing. The following change resolves the issue by explicitly setting the attribute on the faker.unique proxy:
def _function(self):
    """Return the result of calling the ``faker`` function."""
    try:
        if self.cardinality_rule in {'unique', 'match', 'scale'}:
            faker_attr = self.faker.unique
            if not hasattr(faker_attr, '_excluded_types'):
                setattr(faker_attr, '_excluded_types', ())
        else:
            faker_attr = self.faker
    except AttributeError:
        faker_attr = self.faker.unique if self.enforce_uniqueness else self.faker

    result = getattr(faker_attr, self.function_name)
    result = result()
    if isinstance(result, Iterable) and not isinstance(result, str):
        result = ', '.join(map(str, result))

    return result
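For a regression check, the guard can be pulled out and exercised on its own. The helper name `ensure_excluded_types` and the `SimpleNamespace` stand-ins are hypothetical; only the attribute name and the empty-tuple default come from the change above.

```python
from types import SimpleNamespace


def ensure_excluded_types(unique_proxy):
    """Set _excluded_types on a proxy restored from an older pickle, if missing."""
    if not hasattr(unique_proxy, "_excluded_types"):
        unique_proxy._excluded_types = ()
    return unique_proxy


# Stand-in for a proxy unpickled from an older Faker: no _excluded_types.
stale_proxy = SimpleNamespace(_seen={})
ensure_excluded_types(stale_proxy)

# Stand-in for a proxy created under the new Faker: its value is left alone.
fresh_proxy = SimpleNamespace(_seen={}, _excluded_types=(str,))
ensure_excluded_types(fresh_proxy)

print(stale_proxy._excluded_types, fresh_proxy._excluded_types)
```

The guard is idempotent and only fills in the default when the attribute is absent, so instances created under Faker 40.1.0 keep whatever value they already hold.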