Commit 7282f8a

feature: file_column anonymizer, inject multi-column samples in database from a csv file
1 parent 3d36dda commit 7282f8a

15 files changed: +364 -14 lines changed


CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@

 ## Next

+* [feature] 🌟 File multi-column anonymizer, inject sample rows in database from a CSV file.
 * [feature] 🌟 File enum anonymizer, inject samples in database from a plain text or CSV file.
 * [feature] 🌟 String pattern anonymizer, build complex strings by fetching values from other anonymizers.

docs/content/anonymization/core-anonymizers.md

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ This page list all *Anonymizers* provided by *DbToolsBundle*.
 <!--@include: ./core-anonymizers/string.md-->
 <!--@include: ./core-anonymizers/pattern.md-->
 <!--@include: ./core-anonymizers/file-enum.md-->
+<!--@include: ./core-anonymizers/file-column.md-->
 <!--@include: ./core-anonymizers/lastname.md-->
 <!--@include: ./core-anonymizers/firstname.md-->
 <!--@include: ./core-anonymizers/lorem-ipsum.md-->

docs/content/anonymization/core-anonymizers/address.md

Lines changed: 7 additions & 0 deletions
@@ -95,6 +95,13 @@ customer:
 #...
 ```
 :::
+
+:::warning
+This anonymizer works at the *table level*, which means that the PHP attribute
+cannot target object properties: you must specify table column names and not
+PHP class property names.
+:::
+
 @@@

 :::tip
Lines changed: 125 additions & 0 deletions
@@ -0,0 +1,125 @@
+## File multiple column
+
+This anonymizer will anonymize multiple columns at once using value rows from an
+input file. As of now, only CSV files are supported.
+
+This anonymizer behaves like any other multiple column anonymizer and allows you
+to arbitrarily map any sample column into any database table column using the
+anonymizer options.
+
+Given the following file:
+
+```txt
+Number,Foo,Animal
+1,foo,cat
+2,bar,dog
+3,baz,giraffe
+```
+
+Then:
+
+@@@ standalone docker
+
+```yaml [YAML]
+# db_tools.config.yaml
+anonymization:
+    default:
+        customer:
+            my_data:
+                anonymizer: file_column
+                options:
+                    source: ./resources/my_data.csv
+                    # Define your CSV file column names.
+                    columns: [number, foo, animal]
+                    # Other allowed options.
+                    file_skip_header: true
+                    # Now your columns: keys are the CSV column names
+                    # you defined above, values are your database column
+                    # names.
+                    number: my_integer_column
+                    foo: my_foo_column
+                    animal: my_animal_column
+        #...
+```
+
+@@@
+@@@ symfony
+
+::: code-group
+```php [Attribute]
+namespace App\Entity;
+
+use Doctrine\ORM\Mapping as ORM;
+use MakinaCorpus\DbToolsBundle\Attribute\Anonymize;
+
+#[ORM\Entity()]
+#[ORM\Table(name: 'customer')]
+#[Anonymize(type: 'file_column', options: [ // [!code ++]
+    'source' => './resources/my_data.csv', // [!code ++]
+    // Define your CSV file column names. // [!code ++]
+    'columns' => ['number', 'foo', 'animal'], // [!code ++]
+    // Other allowed options. // [!code ++]
+    'file_skip_header' => true, // [!code ++]
+    // Now your columns: keys are the CSV column names // [!code ++]
+    // you defined above, values are your database column // [!code ++]
+    // names. // [!code ++]
+    'number' => 'my_integer_column', // [!code ++]
+    'foo' => 'my_foo_column', // [!code ++]
+    'animal' => 'my_animal_column', // [!code ++]
+])] // [!code ++]
+class Customer
+{
+    // ...
+
+    #[ORM\Column(length: 255)]
+    private ?string $myNumber = null;
+
+    #[ORM\Column(length: 255)]
+    private ?string $myFoo = null;
+
+    #[ORM\Column(length: 255)]
+    private ?string $myAnimal = null;
+
+    // ...
+}
+```
+
+```yaml [YAML]
+# config/anonymization.yaml
+customer:
+    my_data:
+        anonymizer: file_column
+        options:
+            source: ./resources/my_data.csv
+            # Define your CSV file column names.
+            columns: [number, foo, animal]
+            # Other allowed options.
+            file_skip_header: true
+            # Now your columns: keys are the CSV column names
+            # you defined above, values are your database column
+            # names.
+            number: my_integer_column
+            foo: my_foo_column
+            animal: my_animal_column
+#...
+```
+:::
+
+:::warning
+This anonymizer works at the *table level*, which means that the PHP attribute
+cannot target object properties: you must specify table column names and not
+PHP class property names.
+:::
+
+@@@
+
+When parsing a file, you can set the following options as well:
+- `file_csv_enclosure`: if file is a CSV, use this as the enclosure character (default is `'"'`).
+- `file_csv_escape`: if file is a CSV, use this as the escape character (default is `'\\'`).
+- `file_csv_separator`: if file is a CSV, use this as the separator character (default is `','`).
+- `file_skip_header`: when reading any file, set this to true to skip the first line (default is `false`).
+
+:::tip
+The filename can be absolute, or relative. For relative file resolution
+please see [*File name resolution*](#file-name-resolution)
+:::
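
To make the options documented above more concrete, here is a minimal sketch of what they mean when reading the sample file. It is not the bundle's implementation (this diff does not show how `FileMultipleColumnAnonymizer` reads its file, and the real anonymizer may well do its work in SQL): `readSampleRows()`, `$mapping` and `$update` are hypothetical names, and only PHP's built-in `str_getcsv()` is assumed.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, for illustration only: parse CSV text into rows keyed
 * by the names given in the `columns` option.
 *
 * @param string[] $columns
 * @return array<int, array<string, string>>
 */
function readSampleRows(
    string $csv,
    array $columns,
    bool $skipHeader = false,   // mirrors `file_skip_header`
    string $separator = ',',    // mirrors `file_csv_separator`
    string $enclosure = '"',    // mirrors `file_csv_enclosure`
    string $escape = '\\',      // mirrors `file_csv_escape`
): array {
    $lines = \preg_split('/\R/', \trim($csv));
    if ($skipHeader) {
        \array_shift($lines);
    }

    $rows = [];
    foreach ($lines as $line) {
        $values = \str_getcsv($line, $separator, $enclosure, $escape);
        // Ignore rows that do not provide exactly one value per declared column.
        if (\count($values) !== \count($columns)) {
            continue;
        }
        $rows[] = \array_combine($columns, $values);
    }

    return $rows;
}

$csv = <<<CSV
Number,Foo,Animal
1,foo,cat
2,bar,dog
3,baz,giraffe
CSV;

$rows = readSampleRows($csv, ['number', 'foo', 'animal'], skipHeader: true);

// Option keys are CSV column names, values are database column names.
$mapping = [
    'number' => 'my_integer_column',
    'foo' => 'my_foo_column',
    'animal' => 'my_animal_column',
];

// Pick one random sample row and rename its keys to database column names.
$row = $rows[\array_rand($rows)];
$update = [];
foreach ($mapping as $csvColumn => $dbColumn) {
    $update[$dbColumn] = $row[$csvColumn];
}
```

The point of the sketch is that a whole sample row is injected at once, so values that belong together in the CSV file stay together in the anonymized table row.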

docs/content/anonymization/core-anonymizers/file-enum.md

Lines changed: 4 additions & 9 deletions
@@ -1,6 +1,6 @@
 ## File enum

-This *Anonymizer* will fill configured column with a random value from a given sample fetched
+This anonymizer will fill configured column with a random value from a given sample fetched
 from a plain text or a CSV file.

 Given the following file:
@@ -73,12 +73,7 @@ When parsing a file file, you can set the following options as well:
 - `file_csv_separator`: if file is a CSV, use this as the separator character (default is `','`).
 - `file_skip_header`: when reading any file, set this to true to skip the first line (default is `false`).

-:::warning
-The filename can be absolute, or relative. When relative, it will be relative
-to the current PHP working directory.
-
-Working with the PHP working directory is experimental.
-This might cause trouble depending upon your execution environment.
-
-Future versions will allow a better directory selection.
+:::tip
+The filename can be absolute, or relative. For relative file resolution
+please see [*File name resolution*](#file-name-resolution)
 :::

docs/content/anonymization/core-anonymizers/iban-bic.md

Lines changed: 6 additions & 0 deletions
@@ -74,4 +74,10 @@ customer:
 ```
 :::

+:::warning
+This anonymizer works at the *table level*, which means that the PHP attribute
+cannot target object properties: you must specify table column names and not
+PHP class property names.
+:::
+
 @@@

src/Anonymization/Anonymizator.php

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -43,12 +43,21 @@ class Anonymizator implements LoggerAwareInterface
4343

4444
private OutputInterface $output;
4545

46+
private string $basePath;
47+
4648
public function __construct(
4749
private DatabaseSession $databaseSession,
4850
private AnonymizerRegistry $anonymizerRegistry,
4951
private AnonymizationConfig $anonymizationConfig,
5052
private ?string $salt = null,
53+
/**
54+
* @todo
55+
* This is not the right place to set this, but any other alternative
56+
* would require a deep refactor of anonymizer options.
57+
*/
58+
?string $basePath = null,
5159
) {
60+
$this->basePath = $basePath ?? \getcwd();
5261
$this->logger = new NullLogger();
5362
$this->output = new NullOutput();
5463
}
@@ -89,7 +98,7 @@ protected function createAnonymizer(AnonymizerConfig $config): AbstractAnonymize
8998
return $this->anonymizerRegistry->createAnonymizer(
9099
$config->anonymizer,
91100
$config,
92-
$config->options->with(['salt' => $this->getSalt()]),
101+
$config->options->with(['salt' => $this->getSalt(), 'base_path' => $this->basePath]),
93102
$this->databaseSession
94103
);
95104
}
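
The hunks above inject a `base_path` option into every anonymizer, falling back to `\getcwd()` when no base path is given. How an anonymizer consumes that option is not part of this diff, so the following is only a sketch of one plausible resolution strategy for a relative `source` value. `resolveFilename()` is a hypothetical name, not a bundle API. Note that `getcwd()` can return `false` on failure, which the sketch guards against.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, for illustration only: resolve a possibly relative
 * filename against a base path.
 */
function resolveFilename(string $filename, ?string $basePath = null): string
{
    // Absolute Unix path or Windows drive path: return it unchanged.
    if (\str_starts_with($filename, '/') || 1 === \preg_match('@^[a-zA-Z]:[\\\\/]@', $filename)) {
        return $filename;
    }

    // getcwd() returns false on failure, so fall back to "." in that case.
    $basePath ??= \getcwd() ?: '.';

    // Drop a leading "./" so the result does not contain "/./".
    if (\str_starts_with($filename, './')) {
        $filename = \substr($filename, 2);
    }

    return \rtrim($basePath, '/\\') . '/' . $filename;
}
```

With a base path of `/var/www/app`, the documented `./resources/my_data.csv` value would resolve to `/var/www/app/resources/my_data.csv`, independently of the directory the command is run from.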

src/Anonymization/AnonymizatorFactory.php

Lines changed: 18 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -21,8 +21,24 @@ public function __construct(
2121
private DatabaseSessionRegistry $registry,
2222
private AnonymizerRegistry $anonymizerRegistry,
2323
private ?LoggerInterface $logger = null,
24+
/**
25+
* @todo
26+
* This is not the right place to set this, but any other alternative
27+
* would require a deep refactor of anonymizer options.
28+
*/
29+
private ?string $basePath = null,
2430
) {}
2531

32+
/**
33+
* @internal
34+
* For Laravel dependency injection only.
35+
* This can change anytime.
36+
*/
37+
public function setBasePath(?string $basePath): void
38+
{
39+
$this->basePath = $basePath;
40+
}
41+
2642
/**
2743
* Add configuration loader.
2844
*/
@@ -49,7 +65,8 @@ public function getOrCreate(string $connectionName): Anonymizator
4965
$anonymizator = new Anonymizator(
5066
$this->registry->getDatabaseSession($connectionName),
5167
$this->anonymizerRegistry,
52-
$config
68+
$config,
69+
$this->basePath,
5370
);
5471

5572
if ($this->logger) {

src/Anonymization/Anonymizer/AbstractMultipleColumnAnonymizer.php

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -58,13 +58,13 @@ protected function validateOptions(): void
5858

5959
// We only validate column options here.
6060
// Other ones will be validated by each implementation.
61-
$options = \array_filter(
61+
$columnOptions = \array_filter(
6262
$this->options->all(),
6363
fn ($key) => \in_array($key, $this->getColumnNames()),
6464
ARRAY_FILTER_USE_KEY
6565
);
6666

67-
if (\count(\array_unique($options)) < \count($options)) {
67+
if (\count(\array_unique($columnOptions)) < \count($columnOptions)) {
6868
throw new \InvalidArgumentException("The same column has been mapped twice.");
6969
}
7070
}
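
The check in this hunk can be tried in isolation. The sketch below lifts the same `array_filter()` and `array_unique()` logic into a standalone function. `assertNoDuplicateMapping()` and its plain-array parameters are hypothetical stand-ins for the class's `$this->options->all()` and `$this->getColumnNames()`. Filtering by key first means non-scalar options such as `columns` never reach `array_unique()`, which compares values as strings by default.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical standalone version of the duplicate-mapping check.
 *
 * @param array<string, mixed> $options     All anonymizer options.
 * @param string[]             $columnNames Names that are column mappings.
 */
function assertNoDuplicateMapping(array $options, array $columnNames): void
{
    // Keep only the options whose key is a known column name.
    $columnOptions = \array_filter(
        $options,
        fn ($key) => \in_array($key, $columnNames, true),
        ARRAY_FILTER_USE_KEY
    );

    // If deduplicating the target columns removes entries, two source
    // columns were mapped to the same database column.
    if (\count(\array_unique($columnOptions)) < \count($columnOptions)) {
        throw new \InvalidArgumentException("The same column has been mapped twice.");
    }
}
```

For example, mapping both `number` and `foo` to `my_foo_column` would raise the exception, while unrelated options such as `source` or `file_skip_header` are ignored by the check.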

src/Anonymization/Anonymizer/AnonymizerRegistry.php

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,7 @@ class AnonymizerRegistry
1818
Core\DateAnonymizer::class,
1919
Core\EmailAnonymizer::class,
2020
Core\FileEnumAnonymizer::class,
21+
Core\FileMultipleColumnAnonymizer::class,
2122
Core\FirstNameAnonymizer::class,
2223
Core\FloatAnonymizer::class,
2324
Core\IbanBicAnonymizer::class,
