
Commit d3825e8: Update metrics related content
Parent: 97e4e41

7 files changed, +26 -40 lines


README.md (12 additions, 10 deletions)

```diff
@@ -65,13 +65,13 @@ DD-Ranking (DD, *i.e.*, Dataset Distillation) is an integrated and easy-to-use b
 
 <!-- Hard label is tested -->
 <!-- Keep the same compression ratio, comparing with random selection -->
-**Performance benchmark**
+### Benchmark
 
 Revisit the original goal of dataset distillation:
-> The idea is to synthesize a small number of data points that do not need to come from the correct data distribution, but will, when given to the learning algorithm as training data, approximate the model trained on the original data.
+> The idea is to synthesize a small number of data points that do not need to come from the correct data distribution, but will, when given to the learning algorithm as training data, approximate the model trained on the original data. (Wang et al., 2020)
 >
 
-The evaluation method for DD-Ranking is grounded in the essence of dataset distillation, aiming to better reflect the information content of the synthesized data by assessing the following two aspects:
+The evaluation method for DD-Ranking is grounded in the essence of dataset distillation, aiming to better reflect the informativeness of the synthesized data by assessing the following two aspects:
 1. The degree to which the original dataset is recovered under hard labels (hard label recovery): $\text{HLR}=\text{Acc.}_{\text{full-hard}}-\text{Acc.}_{\text{syn-hard}}$.
 
 2. The improvement over random selection when using personalized evaluation methods (improvement over random): $\text{IOR}=\text{Acc.}_{\text{syn-any}}-\text{Acc.}_{\text{rdm-any}}$.
@@ -81,9 +81,13 @@ $\text{Acc.}$ is the accuracy of models trained on different samples. Samples' m
 - $\text{syn-any}$: Synthetic dataset with personalized evaluation methods (hard or soft labels);
 - $\text{rdm-any}$: Randomly selected dataset (under the same compression ratio) with the same personalized evaluation methods.
 
-To rank different methods, we combine the above two metrics as follows:
+<!-- To rank different methods, we combine the above two metrics as follows:
 
-$$\text{IOR}/\text{HLR} = \frac{(\text{Acc.}_{\text{syn-any}}-\text{Acc.}_{\text{rdm-any}})}{(\text{Acc.}_{\text{full-hard}}-\text{Acc.}_{\text{syn-hard}})}$$
+$$\text{IOR}/\text{HLR} = \frac{(\text{Acc.}_{\text{syn-any}}-\text{Acc.}_{\text{rdm-any}})}{(\text{Acc.}_{\text{full-hard}}-\text{Acc.}_{\text{syn-hard}})}$$ -->
+
+</details>
+
+## Overview
 
 DD-Ranking is integrated with:
 <!-- Uniform Fair Labels: loss on soft label -->
@@ -98,18 +102,15 @@ DD-Ranking has the following features:
 - **Extensible**: DD-Ranking supports various datasets and models.
 - **Customizable**: DD-Ranking supports various data augmentations and soft label strategies.
 
-</details>
-
-## Overview
-Included datasets and methods (categorized by hard/soft label).
+DD-Ranking currently includes the following datasets and methods (categorized by hard/soft label). Evaluation results can be found in the [leaderboard](https://huggingface.co/spaces/Soptq/DD-Ranking).
 |Supported Dataset|Evaluated Hard Label Methods|Evaluated Soft Label Methods|
 |:-|:-|:-|
 |CIFAR-10|DC|DATM|
 |CIFAR-100|DSA|SRe2L|
 |TinyImageNet|DM|RDED|
 ||MTT|D4M|
 
-Evaluation results can be found in the [leaderboard](https://huggingface.co/spaces/Soptq/DD-Ranking).
+
 
 ## Tutorial
 
@@ -221,6 +222,7 @@ The following results will be returned to you:
 ## Coming Soon
 - [ ] DD-Ranking scores that decouple the impacts from data augmentation.
 - [ ] Evaluation results on ImageNet subsets.
+- [ ] More baseline methods.
 
 ## Contributing
```
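For context on the two metrics this commit keeps: both are plain accuracy gaps. A minimal sketch in Python, with hypothetical accuracy values rather than measured results:

```python
# Minimal sketch of the two retained metrics; the accuracy values below are
# hypothetical placeholders, not results from the leaderboard.

def hard_label_recovery(acc_full_hard: float, acc_syn_hard: float) -> float:
    """HLR = Acc.(full-hard) - Acc.(syn-hard); a smaller gap means the
    synthetic data recovers more of the original dataset."""
    return acc_full_hard - acc_syn_hard

def improvement_over_random(acc_syn_any: float, acc_rdm_any: float) -> float:
    """IOR = Acc.(syn-any) - Acc.(rdm-any); larger is better."""
    return acc_syn_any - acc_rdm_any

hlr = hard_label_recovery(acc_full_hard=84.0, acc_syn_hard=60.0)   # 24.0
ior = improvement_over_random(acc_syn_any=55.0, acc_rdm_any=40.0)  # 15.0
print(f"HLR = {hlr:.2f}%, IOR = {ior:.2f}%")
```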

dd_ranking/metrics/hard_label.py (2 additions, 10 deletions)

```diff
@@ -186,7 +186,6 @@ def compute_metrics(self, image_tensor: Tensor=None, image_path: str=None, hard_
         if not hard_labels:
             hard_labels = torch.tensor(np.array([np.ones(self.ipc) * i for i in range(self.num_classes)]), dtype=torch.long, requires_grad=False).view(-1)
 
-        dd_ranking_scores = []
         hard_label_recovery = []
         improvement_over_random = []
         for i in range(self.num_eval):
@@ -256,30 +255,23 @@ def compute_metrics(self, image_tensor: Tensor=None, image_path: str=None, hard_
 
             hard_label_recovery.append(hlr)
             improvement_over_random.append(ior)
-            dd_ranking_scores.append(ior / hlr)
 
         results_to_save = {
             "hard_label_recovery": hard_label_recovery,
-            "improvement_over_random": improvement_over_random,
-            "dd_ranking_score": dd_ranking_scores
+            "improvement_over_random": improvement_over_random
         }
         save_results(results_to_save, self.save_path)
 
         hard_label_recovery_mean = np.mean(hard_label_recovery)
         hard_label_recovery_std = np.std(hard_label_recovery)
         improvement_over_random_mean = np.mean(improvement_over_random)
         improvement_over_random_std = np.std(improvement_over_random)
-        dd_ranking_score_mean = np.mean(dd_ranking_scores)
-        dd_ranking_score_std = np.std(dd_ranking_scores)
 
         print(f"Hard Label Recovery Mean: {hard_label_recovery_mean:.2f}% Std: {hard_label_recovery_std:.2f}")
         print(f"Improvement Over Random Mean: {improvement_over_random_mean:.2f}% Std: {improvement_over_random_std:.2f}")
-        print(f"DD-Ranking Score Mean: {dd_ranking_score_mean:.2f} Std: {dd_ranking_score_std:.2f}")
         return {
             "hard_label_recovery_mean": hard_label_recovery_mean,
             "hard_label_recovery_std": hard_label_recovery_std,
             "improvement_over_random_mean": improvement_over_random_mean,
-            "improvement_over_random_std": improvement_over_random_std,
-            "dd_ranking_score_mean": dd_ranking_score_mean,
-            "dd_ranking_score_std": dd_ranking_score_std
+            "improvement_over_random_std": improvement_over_random_std
         }
```
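After this change, `compute_metrics` aggregates only HLR and IOR across evaluation rounds. A simplified sketch of the surviving aggregation logic (not the repository's full method; the per-round values below are hypothetical):

```python
import numpy as np

def aggregate_rounds(hlr_per_round, ior_per_round):
    """Mirror of the surviving aggregation: mean/std over num_eval rounds,
    with the ior/hlr ratio (the old DD-Ranking score) no longer computed."""
    return {
        "hard_label_recovery_mean": float(np.mean(hlr_per_round)),
        "hard_label_recovery_std": float(np.std(hlr_per_round)),
        "improvement_over_random_mean": float(np.mean(ior_per_round)),
        "improvement_over_random_std": float(np.std(ior_per_round)),
    }

# Hypothetical values from num_eval = 3 evaluation rounds:
print(aggregate_rounds([24.1, 23.8, 24.5], [15.2, 14.9, 15.4]))
```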

dd_ranking/metrics/soft_label.py (3 additions, 10 deletions)

```diff
@@ -298,7 +298,7 @@ def compute_metrics(self, image_tensor: Tensor=None, image_path: str=None, soft_
 
         hard_labels = torch.tensor(np.array([np.ones(self.ipc) * i for i in range(self.num_classes)]),
                                    dtype=torch.long, requires_grad=False).view(-1)
-        dd_ranking_scores = []
+
         hard_label_recovery = []
         improvement_over_random = []
         for i in range(self.num_eval):
@@ -378,32 +378,25 @@ def compute_metrics(self, image_tensor: Tensor=None, image_path: str=None, soft_
 
             hard_label_recovery.append(hlr)
             improvement_over_random.append(ior)
-            dd_ranking_scores.append(ior / hlr)
 
         results_to_save = {
             "hard_label_recovery": hard_label_recovery,
-            "improvement_over_random": improvement_over_random,
-            "dd_ranking_score": dd_ranking_scores
+            "improvement_over_random": improvement_over_random
         }
         save_results(results_to_save, self.save_path)
 
         hard_label_recovery_mean = np.mean(hard_label_recovery)
         hard_label_recovery_std = np.std(hard_label_recovery)
         improvement_over_random_mean = np.mean(improvement_over_random)
         improvement_over_random_std = np.std(improvement_over_random)
-        dd_ranking_score_mean = np.mean(dd_ranking_scores)
-        dd_ranking_score_std = np.std(dd_ranking_scores)
 
         print(f"Hard Label Recovery Mean: {hard_label_recovery_mean:.2f}% Std: {hard_label_recovery_std:.2f}")
         print(f"Improvement Over Random Mean: {improvement_over_random_mean:.2f}% Std: {improvement_over_random_std:.2f}")
-        print(f"DD-Ranking Score Mean: {dd_ranking_score_mean:.2f} Std: {dd_ranking_score_std:.2f}")
         return {
             "hard_label_recovery_mean": hard_label_recovery_mean,
             "hard_label_recovery_std": hard_label_recovery_std,
             "improvement_over_random_mean": improvement_over_random_mean,
-            "improvement_over_random_std": improvement_over_random_std,
-            "dd_ranking_score_mean": dd_ranking_score_mean,
-            "dd_ranking_score_std": dd_ranking_score_std
+            "improvement_over_random_std": improvement_over_random_std
         }
 
 
```
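soft_label.py evaluates synthetic data under soft labels with knowledge distillation (see the `temperature` parameter in doc/metrics/soft-label.md). A generic sketch of a temperature-scaled distillation loss, assuming the common KL-divergence formulation; the repository's actual training loop may differ:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor,
            temperature: float = 4.0) -> torch.Tensor:
    """Standard knowledge-distillation loss: KL divergence between
    temperature-softened student predictions and soft labels (here assumed
    to be teacher logits), scaled by T^2 to keep gradient magnitudes
    comparable across temperatures."""
    log_p = F.log_softmax(student_logits / temperature, dim=1)
    q = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(log_p, q, reduction="batchmean") * temperature ** 2

# Hypothetical shapes: batch of 8, 10 classes.
loss = kd_loss(torch.randn(8, 10), torch.randn(8, 10))
print(loss.item())
```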

doc/getting-started/quick-start.md (2 additions, 2 deletions)

```diff
@@ -79,5 +79,5 @@ The following results will be returned to you:
 - `hard_label_recovery std`: The standard deviation of hard label recovery scores.
 - `improvement_over_random mean`: The mean of improvement over random scores.
 - `improvement_over_random std`: The standard deviation of improvement over random scores.
-- `dd_ranking_score mean`: The mean of dd ranking scores.
-- `dd_ranking_score std`: The standard deviation of dd ranking scores.
+<!-- - `dd_ranking_score mean`: The mean of dd ranking scores.
+- `dd_ranking_score std`: The standard deviation of dd ranking scores. -->
```
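Matching the code change, the dictionary returned by `compute_metrics` now carries exactly four statistics; the key names below follow the return dict in dd_ranking/metrics/*.py. A small sketch of consuming it, with hypothetical numbers:

```python
# Sketch: pretty-print the four statistics still returned by compute_metrics.
# The `results` dict below uses hypothetical numbers for illustration.

def report(results: dict) -> None:
    print(f"HLR: {results['hard_label_recovery_mean']:.2f}% "
          f"± {results['hard_label_recovery_std']:.2f}")
    print(f"IOR: {results['improvement_over_random_mean']:.2f}% "
          f"± {results['improvement_over_random_std']:.2f}")

report({
    "hard_label_recovery_mean": 24.13, "hard_label_recovery_std": 0.29,
    "improvement_over_random_mean": 15.17, "improvement_over_random_std": 0.21,
})
```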

doc/introduction.md (2 additions, 2 deletions)

```diff
@@ -56,7 +56,7 @@ The evaluation method for DD-Ranking is grounded in the essence of dataset disti
 - \\(\text{syn-any}\\): Synthetic dataset with personalized evaluation methods (hard or soft labels);
 - \\(\text{rdm-any}\\): Randomly selected dataset (under the same compression ratio) with the same personalized evaluation methods.
 
-To rank different methods, we combine the above two metrics as DD-Ranking Score:
+<!-- To rank different methods, we combine the above two metrics as DD-Ranking Score:
 
-\\[\text{DD-Ranking Score} = \frac{\text{IOR}}{\text{HLR}} = \frac{(\text{Acc.}_{\text{syn-any}}-\text{Acc.}_{\text{rdm-any}})}{(\text{Acc.}_{\text{full-hard}}-\text{Acc.}_{\text{syn-hard}})}\\]
+\\[\text{DD-Ranking Score} = \frac{\text{IOR}}{\text{HLR}} = \frac{(\text{Acc.}_{\text{syn-any}}-\text{Acc.}_{\text{rdm-any}})}{(\text{Acc.}_{\text{full-hard}}-\text{Acc.}_{\text{syn-hard}})}\\] -->
```
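With the combined score commented out, ranking rests on the two base metrics alone. For reference, written in the notation introduction.md already uses:

```latex
\[
\text{HLR} = \text{Acc.}_{\text{full-hard}} - \text{Acc.}_{\text{syn-hard}},
\qquad
\text{IOR} = \text{Acc.}_{\text{syn-any}} - \text{Acc.}_{\text{rdm-any}}
\]
```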

doc/metrics/hard-label.md (1 addition, 3 deletions)

```diff
@@ -71,7 +71,7 @@ This method computes the HLR, IOR, and DD-Ranking scores for the given image and
 1. Compute the test accuracy of the surrogate model on the synthetic dataset under hard labels. We tune the learning rate for the best performance if `syn_lr` is not provided.
 2. Compute the test accuracy of the surrogate model on the real dataset under the same setting as step 1.
 3. Compute the test accuracy of the surrogate model on the randomly selected dataset under the same setting as step 1.
-4. Compute the HLR, IOR, and DD-Ranking scores.
+4. Compute the HLR and IOR scores.
 
 The final scores are the average of the scores from `num_eval` rounds.
 
@@ -90,8 +90,6 @@ A dictionary with the following keys:
 - **hard_label_recovery_std**: Standard deviation of HLR scores from `num_eval` rounds.
 - **improvement_over_random_mean**: Mean of improvement over random scores from `num_eval` rounds.
 - **improvement_over_random_std**: Standard deviation of improvement over random scores from `num_eval` rounds.
-- **dd_ranking_mean**: Mean of DD-Ranking scores from `num_eval` rounds.
-- **dd_ranking_std**: Standard deviation of DD-Ranking scores from `num_eval` rounds.
 
 **Examples:**
 
```
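A generic sketch of the four-step protocol listed above, under the assumption that some `train_and_eval` routine trains a surrogate model and returns its test accuracy; this is an illustration, not the repository's implementation:

```python
import numpy as np

def evaluate_once(train_and_eval, syn_data, real_data, random_subset):
    acc_syn_hard = train_and_eval(syn_data)       # step 1: synthetic data, hard labels
    acc_full_hard = train_and_eval(real_data)     # step 2: full real dataset
    acc_rdm_hard = train_and_eval(random_subset)  # step 3: random subset, same setting
    hlr = acc_full_hard - acc_syn_hard            # step 4: HLR
    ior = acc_syn_hard - acc_rdm_hard             # step 4: IOR (hard-label case)
    return hlr, ior

# num_eval rounds, then mean/std. The stand-in below just adds noise to a
# hypothetical base accuracy; a real run would train a model each time.
rng = np.random.default_rng(0)
dummy = lambda base_acc: float(base_acc + rng.normal(0.0, 0.3))
rounds = [evaluate_once(dummy, 60.0, 84.0, 40.0) for _ in range(5)]
hlrs, iors = zip(*rounds)
print(f"HLR {np.mean(hlrs):.2f} ± {np.std(hlrs):.2f}, "
      f"IOR {np.mean(iors):.2f} ± {np.std(iors):.2f}")
```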

doc/metrics/soft-label.md (4 additions, 3 deletions)

```diff
@@ -50,6 +50,7 @@ A class for evaluating the performance of a dataset distillation method with sof
 - **temperature**(<span style="color:#FF6B00;">float</span>): Temperature for knowledge distillation.
 - **data_aug_func**(<span style="color:#FF6B00;">str</span>): Data augmentation function used during training. Currently supports `dsa`, `cutmix`, `mixup`. See [augmentations](../augmentations/overview.md) for more details.
 - **aug_params**(<span style="color:#FF6B00;">dict</span>): Parameters for the data augmentation function.
+- **use_aug_for_hard**(<span style="color:#FF6B00;">bool</span>): Whether to use the data augmentation specified in `data_aug_func` for hard label evaluation.
 - **optimizer**(<span style="color:#FF6B00;">str</span>): Name of the optimizer. Currently supports torch-based optimizers - `sgd`, `adam`, and `adamw`.
 - **lr_scheduler**(<span style="color:#FF6B00;">str</span>): Name of the learning rate scheduler. Currently supports torch-based schedulers - `step`, `cosine`, `lambda_step`, and `lambda_cos`.
 - **weight_decay**(<span style="color:#FF6B00;">float</span>): Weight decay for the optimizer.
@@ -83,7 +84,7 @@ This method computes the HLR, IOR, and DD-Ranking scores for the given image and
 2. Compute the test accuracy of the surrogate model on the real dataset under the same setting as step 1.
 3. Compute the test accuracy of the surrogate model on the synthetic dataset under soft labels.
 4. Compute the test accuracy of the surrogate model on the randomly selected dataset under the same setting as step 3.
-5. Compute the HLR, IOR, and DD-Ranking scores.
+5. Compute the HLR and IOR scores.
 
 The final scores are the average of the scores from `num_eval` rounds.
 
@@ -102,8 +103,8 @@ A dictionary with the following keys:
 - **hard_label_recovery_std**: Standard deviation of HLR scores from `num_eval` rounds.
 - **improvement_over_random_mean**: Mean of improvement over random scores from `num_eval` rounds.
 - **improvement_over_random_std**: Standard deviation of improvement over random scores from `num_eval` rounds.
-- **dd_ranking_mean**: Mean of DD-Ranking scores from `num_eval` rounds.
-- **dd_ranking_std**: Standard deviation of DD-Ranking scores from `num_eval` rounds.
+<!-- - **dd_ranking_mean**: Mean of DD-Ranking scores from `num_eval` rounds.
+- **dd_ranking_std**: Standard deviation of DD-Ranking scores from `num_eval` rounds. -->
 
 </div>
```
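The documented parameters, including the newly added `use_aug_for_hard`, map naturally onto a configuration like the following sketch; the values and the `aug_params` keys are assumptions for illustration, not the repository's defaults:

```python
# Values are illustrative; parameter names follow the documented list above.
soft_label_eval_config = {
    "temperature": 4.0,        # knowledge-distillation temperature
    "data_aug_func": "dsa",    # one of: dsa, cutmix, mixup
    "aug_params": {"strategy": "color_crop_flip"},  # assumed keys, see augmentations doc
    "use_aug_for_hard": True,  # newly documented: reuse augmentation for hard-label eval
    "optimizer": "sgd",        # sgd, adam, or adamw
    "lr_scheduler": "cosine",  # step, cosine, lambda_step, or lambda_cos
    "weight_decay": 5e-4,
}
print(soft_label_eval_config)
```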
