Commit 7746ab3

committed: Updated README
1 parent 4f963b3 commit 7746ab3

File tree (1 file changed: 5 additions, 4 deletions)

  • src/midst_toolkit/attacks/black_box_single_table/ensemble_mia/data_processing

@@ -1,12 +1,13 @@
 The data processing pipeline is right now specific to the MIDST competition provided
 resource folders.
 
-Step 1: Collect all the train data from all the attack types (every train folder provided by the challenge). "population_all.csv" includes 867494 data points.
+Step 1: Collect all the train data from all the attack types (every train folder provided by the challenge). "population_all.csv" includes a total of `867494` data points.
 
-Step 2: Collect all the challenge data points from train, dev and final folders of tabddpm_black_box. `challenge_points_all.csv` includes 13896 data points.
+Step 2: Collect all the challenge data points from the `train`, `dev` and `final` folders of `tabddpm_black_box`. `challenge_points_all.csv` includes `13896` data points.
 
 Step 3: Save population data without and with challenge points. `population_all_no_challenge.csv` includes 855644 data points, and `population_all_with_challenge.csv` includes 869540 data points.
 
-`population_all_with_challenge.csv` is used to create real train and test data (referred to as `Population/Subset (Real Data)` in the Figure). Note that a random subset of 40k data points are sampled from `population_all_with_challenge.csv` and used as population (or real data).
+`population_all_with_challenge.csv` is used to create real train and test data. Note that a random subset of `40k` data points is sampled from `population_all_with_challenge.csv` and used as the population (or real data).
 
-To run the whole data processing pipeline run `process_split_data.py`. It reads data from `ensemble_mia/data/midst_data_all_attacks`, and populates `ensemble_mia/data/population_data` and `ensemble_mia/data/attack_data` folders.
+To run the whole data processing pipeline, run `process_split_data.py`. It reads data from `ensemble_mia/data/midst_data_all_attacks` and populates the `ensemble_mia/data/population_data` and `ensemble_mia/data/attack_data` folders.
+Don't forget to edit the data config dictionary in `/midst_toolkit/attacks/black_box_single_table/ensemble_mia/config.py`.
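
The collect-and-sample steps described in the README could be sketched roughly as follows. This is a minimal illustration assuming pandas and a challenge-style folder layout with per-attack `train` subfolders; the helper names (`collect_train_data`, `sample_population`) are hypothetical and not part of `process_split_data.py`.

```python
# Hypothetical sketch of Steps 1 and the 40k population sampling,
# NOT the actual process_split_data.py implementation.
from pathlib import Path

import pandas as pd


def collect_train_data(root: Path, out_csv: Path) -> pd.DataFrame:
    """Step 1 (sketch): concatenate every train CSV under `root`
    into a single population file (e.g. population_all.csv)."""
    frames = [pd.read_csv(p) for p in sorted(root.glob("*/train/*.csv"))]
    population = pd.concat(frames, ignore_index=True)
    population.to_csv(out_csv, index=False)
    return population


def sample_population(population: pd.DataFrame,
                      n: int = 40_000,
                      seed: int = 0) -> pd.DataFrame:
    """Sample a random subset used as the 'real data' population."""
    return population.sample(n=min(n, len(population)), random_state=seed)
```

After sampling, the subset would be split into the real train and test portions; the exact split logic lives in `process_split_data.py` and is configured via the data config dictionary mentioned above.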
