
Commit 434a861

Merge pull request #10 from NUS-HPC-AI-Lab/dist: Merge v0.2.0
2 parents: 7dadc74 + 56d174d


74 files changed: +4182, -1357 lines

README.md

Lines changed: 58 additions & 48 deletions
@@ -13,7 +13,7 @@
   Fair and benchmark for dataset distillation.
   </h3> -->
 <p align="center">
-| <a href="https://nus-hpc-ai-lab.github.io/DD-Ranking/"><b>Documentation</b></a> | <a href="https://huggingface.co/spaces/logits/DD-Ranking"><b>Leaderboard</b></a> | <a href="https://arxiv.org/abs/2505.13300"><b>Paper</b></a> | <a href="https://x.com/Richard91316073/status/1890296645486801230"><b>Twitter/X</b></a> | <a href="https://join.slack.com/t/dd-ranking/shared_invite/zt-2xlcuq1mf-hmVcfrtqrIB3qXRjwgB03A"><b>Developer Slack</b></a> |
+| <a href="https://nus-hpc-ai-lab.github.io/DD-Ranking/"><b>Documentation</b></a> | <a href="https://huggingface.co/spaces/logits/DD-Ranking"><b>Leaderboard</b></a> | <a href="https://arxiv.org/abs/2505.13300"><b>Paper</b> </a> | <a href="https://x.com/Richard91316073/status/1890296645486801230"><b>Twitter/X</b></a> | <a href="https://join.slack.com/t/dd-ranking/shared_invite/zt-2xlcuq1mf-hmVcfrtqrIB3qXRjwgB03A"><b>Developer Slack</b></a> |
 </p>
 
 
@@ -52,10 +52,10 @@ However, since the essence of soft labels is **knowledge distillation**, we find
 
 This makes us wonder: **Can the test accuracy of the model trained on distilled data reflect the real informativeness of the distilled data?**
 
-Additionally, we have discovered unfairness of using only test accuracy to demonstrate one's performance from the following three aspects:
-1. Results of using hard and soft labels are not directly comparable since soft labels introduce teacher knowledge.
-2. Strategies of using soft labels are diverse. For instance, different objective functions are used during evaluation, such as soft Cross-Entropy and Kullback–Leibler divergence. Also, one image may be mapped to one or multiple soft labels.
-3. Different data augmentations are used during evaluation.
+We summarize the evaluation configurations of existing works in the following table, with different colors highlighting different values for each configuration.
+![configurations](./static/configurations.png)
+As can be easily seen, the evaluation configurations are diverse, leading to unfairness when only test accuracy is used to compare performance.
+Among these inconsistencies, two critical factors significantly undermine the fairness of current evaluation protocols: label representation (including the corresponding loss function) and data augmentation techniques.
 
 Motivated by this, we propose DD-Ranking, a new benchmark for DD evaluation. DD-Ranking provides a fair evaluation scheme for DD methods that can decouple the impacts from knowledge distillation and data augmentation to reflect the real informativeness of the distilled data.
 
@@ -76,7 +76,8 @@ Revisit the original goal of dataset distillation:
 > The idea is to synthesize a small number of data points that do not need to come from the correct data distribution, but will, when given to the learning algorithm as training data, approximate the model trained on the original data. (Wang et al., 2020)
 >
 
-The evaluation method for DD-Ranking is grounded in the essence of dataset distillation, aiming to better reflect the informativeness of the synthesized data by assessing the following two aspects:
+#### Label-Robust Score (LRS)
+For the label representation, we introduce the Label-Robust Score (LRS) to evaluate the informativeness of the synthesized data using the following two aspects:
 1. The degree to which the real dataset is recovered under hard labels (hard label recovery): $\text{HLR}=\text{Acc.}_{\text{real-hard}}-\text{Acc.}_{\text{syn-hard}}$.
 
 2. The improvement over random selection when using personalized evaluation methods (improvement over random): $\text{IOR}=\text{Acc.}_{\text{syn-any}}-\text{Acc.}_{\text{rdm-any}}$.
@@ -86,14 +87,21 @@ $\text{Acc.}$ is the accuracy of models trained on different samples. Samples' m
 - $\text{syn-any}$: Synthetic dataset with personalized evaluation methods (hard or soft labels);
 - $\text{rdm-any}$: Randomly selected dataset (under the same compression ratio) with the same personalized evaluation methods.
 
-DD-Ranking uses a weighted sum of $\text{IOR}$ and $-\text{HLR}$ to rank different methods:
-$\alpha = w\text{IOR}-(1-w)\text{HLR}, \quad w \in [0, 1]$
-
-Formally, the **DD-Ranking Score (DDRS)** is defined as:
-$(e^{\alpha}-e^{-1}) / (e - e^{-1})$
+LRS is defined as a weighted sum of $\text{IOR}$ and $-\text{HLR}$ to rank different methods:
+$\alpha = w\text{IOR}-(1-w)\text{HLR}, \quad w \in [0, 1]$.
+Then, the LRS is normalized to $[0, 1]$ as follows:
+$\text{LRS} = (e^{\alpha}-e^{-1}) / (e - e^{-1})$
 
 By default, we set $w = 0.5$ on the leaderboard, meaning that both $\text{IOR}$ and $\text{HLR}$ are equally important. Users can adjust the weights to emphasize one aspect on the leaderboard.
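
To make the ranking concrete, below is a minimal sketch of the LRS computation implied by the formulas above; the accuracy values are illustrative placeholders, and DD-Ranking computes them internally.

```python
import math

# Illustrative accuracies in [0, 1]; placeholders, not measured results.
acc_real_hard = 0.84   # real data, hard labels
acc_syn_hard  = 0.55   # synthetic data, hard labels
acc_syn_any   = 0.66   # synthetic data, personalized evaluation
acc_rdm_any   = 0.50   # random subset, same personalized evaluation

hlr = acc_real_hard - acc_syn_hard   # hard label recovery (lower is better)
ior = acc_syn_any - acc_rdm_any      # improvement over random (higher is better)

w = 0.5                              # leaderboard default weight
alpha = w * ior - (1 - w) * hlr
lrs = (math.exp(alpha) - math.exp(-1)) / (math.e - math.exp(-1))  # normalized to [0, 1]
print(f"HLR={hlr:.3f}, IOR={ior:.3f}, LRS={lrs:.3f}")
```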

+#### Augmentation-Robust Score (ARS)
+To disentangle the impact of data augmentation, we introduce the Augmentation-Robust Score (ARS), which again leverages the relative improvement over randomly selected data. Specifically, we first evaluate synthetic data and a randomly selected subset under the same setting to obtain $\text{Acc.}_{\text{syn-aug}}$ and $\text{Acc.}_{\text{rdm-aug}}$ (same as IOR). Next, we evaluate both synthetic data and random data again without the data augmentation, and denote the results as $\text{Acc.}_{\text{syn-naug}}$ and $\text{Acc.}_{\text{rdm-naug}}$.
+Both differences, $\text{Acc.}_{\text{syn-aug}} - \text{Acc.}_{\text{rdm-aug}}$ and $\text{Acc.}_{\text{syn-naug}} - \text{Acc.}_{\text{rdm-naug}}$, are positively correlated with the real informativeness of the distilled dataset.
+
+ARS is a weighted sum of the two differences:
+$\beta = \gamma(\text{Acc.}_{\text{syn-aug}} - \text{Acc.}_{\text{rdm-aug}}) + (1 - \gamma)(\text{Acc.}_{\text{syn-naug}} - \text{Acc.}_{\text{rdm-naug}})$,
+and it is normalized to $[0, 1]$ in the same way as LRS.
+
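
A matching sketch for ARS, assuming the same exponential normalization as LRS (the accuracies and $\gamma$ are placeholders):

```python
import math

# Illustrative accuracies; "aug"/"naug" = evaluated with/without augmentation.
acc_syn_aug, acc_rdm_aug = 0.66, 0.50
acc_syn_naug, acc_rdm_naug = 0.60, 0.48

gamma = 0.5   # weight between the two differences
beta = gamma * (acc_syn_aug - acc_rdm_aug) + (1 - gamma) * (acc_syn_naug - acc_rdm_naug)
ars = (math.exp(beta) - math.exp(-1)) / (math.e - math.exp(-1))  # normalized to [0, 1]
print(f"ARS={ars:.3f}")
```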
 </details>
 
 ## Overview
@@ -116,7 +124,12 @@ DD-Ranking currently includes the following datasets and methods (categorized by
 |CIFAR-10|DC|DATM|
 |CIFAR-100|DSA|SRe2L|
 |TinyImageNet|DM|RDED|
-||MTT|D4M|
+|ImageNet1K|MTT|D4M|
+| | DataDAM | EDF |
+| | | CDA |
+| | | DWA |
+| | | EDC |
+| | | G-VBSM |
 
 
 
@@ -139,16 +152,17 @@ python setup.py install
 ```
 ### Quickstart
 
-Below is a step-by-step guide on how to use our `dd_ranking`. This demo is based on soft labels (source code can be found in `demo_soft.py`). You can find the hard label demo in `demo_hard.py`.
+Below is a step-by-step guide on how to use our `ddranking`. This demo is based on LRS with soft labels (source code can be found in `demo_lrs_soft.py`). You can find LRS with hard labels in `demo_lrs_hard.py` and ARS in `demo_aug.py`.
+DD-Ranking supports multi-GPU distributed evaluation. You can simply use `torchrun` to launch the evaluation, as sketched below.
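
For example, a typical single-node launch of the soft-label demo on 4 GPUs (the GPU count is illustrative):

```bash
torchrun --nproc_per_node=4 demo_lrs_soft.py
```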

 **Step 1**: Initialize a soft-label metric evaluator object. Config files are recommended for users to specify hyper-parameters. Sample config files are provided [here](https://github.com/NUS-HPC-AI-Lab/DD-Ranking/tree/main/configs).
 
 ```python
-from ddranking.metrics import SoftLabelEvaluator
+from ddranking.metrics import LabelRobustScoreSoft
 from ddranking.config import Config
 
-config = Config.from_file("./configs/Demo_Soft_Label.yaml")
-soft_label_metric_calc = SoftLabelEvaluator(config)
+config = Config.from_file("./configs/Demo_LRS_Soft_Label.yaml")
+lrs_soft_metric = LabelRobustScoreSoft(config)
 ```
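
If you go the config-file route, a hypothetical excerpt of `Demo_LRS_Soft_Label.yaml` could look like the sketch below; the key names are assumed to mirror the constructor arguments shown in the next block, so consult the sample configs linked above for the authoritative schema.

```yaml
# Hypothetical excerpt; keys assumed to mirror LabelRobustScoreSoft's arguments.
dataset: CIFAR100
ipc: 10
model_name: ConvNet-3
soft_label_criterion: sce
soft_label_mode: S
data_aug_func: dsa
im_size: [32, 32]
num_eval: 5
dist: true
```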
 
 <details>
@@ -158,11 +172,12 @@ soft_label_metric_calc = SoftLabelEvaluator(config)
 device = "cuda"
 method_name = "DATM"                    # Specify your method name
 ipc = 10                                # Specify your IPC
-dataset = "CIFAR10"                     # Specify your dataset name
-syn_data_dir = "./data/CIFAR10/IPC10/"  # Specify your synthetic data path
+dataset = "CIFAR100"                    # Specify your dataset name
+syn_data_dir = "./data/CIFAR100/IPC10/" # Specify your synthetic data path
 real_data_dir = "./datasets"            # Specify your dataset path
 model_name = "ConvNet-3"                # Specify your model name
 teacher_dir = "./teacher_models"        # Specify your path to teacher model checkpoints
+teacher_model_names = ["ConvNet-3"]     # Specify your teacher model names
 im_size = (32, 32)                      # Specify your image size
 dsa_params = {                          # Specify your data augmentation parameters
     "prob_flip": 0.5,
@@ -174,23 +189,31 @@ dsa_params = { # Specify your data augmentation paramet
     "ratio_crop_pad": 0.125,
     "ratio_cutout": 0.5
 }
+random_data_format = "tensor"           # Specify your random data format (tensor or image)
+random_data_path = "./random_data"      # Specify your random data path
 save_path = f"./results/{dataset}/{model_name}/IPC{ipc}/dm_hard_scores.csv"
 
 """ We only list arguments that usually need specifying """
-soft_label_metric_calc = SoftLabelEvaluator(
+lrs_soft_metric = LabelRobustScoreSoft(
     dataset=dataset,
     real_data_path=real_data_dir,
     ipc=ipc,
     model_name=model_name,
     soft_label_criterion='sce',         # Use Soft Cross Entropy Loss
     soft_label_mode='S',                # Use one-to-one image to soft label mapping
+    loss_fn_kwargs={'temperature': 1.0, 'scale_loss': False},
     data_aug_func='dsa',                # Use DSA data augmentation
     aug_params=dsa_params,              # Specify DSA parameters
     im_size=im_size,
+    random_data_format=random_data_format,
+    random_data_path=random_data_path,
     stu_use_torchvision=False,
     tea_use_torchvision=False,
-    teacher_dir='./teacher_models',
+    teacher_dir=teacher_dir,
+    teacher_model_names=teacher_model_names,
+    num_eval=5,
     device=device,
+    dist=True,
     save_path=save_path
 )
 ```
@@ -210,24 +233,21 @@ syn_lr = torch.load('/your/path/to/syn/lr.pt')
 **Step 3:** Compute the metric.
 
 ```python
-metric = soft_label_metric_calc.compute_metrics(image_tensor=syn_images, soft_labels=soft_labels, syn_lr=syn_lr)
+lrs_soft_metric.compute_metrics(image_tensor=syn_images, soft_labels=soft_labels, syn_lr=syn_lr)
 # alternatively, you can specify the image folder path to compute the metric
-metric = soft_label_metric_calc.compute_metrics(image_path='./your/path/to/syn/images', soft_labels=soft_labels, syn_lr=syn_lr)
+lrs_soft_metric.compute_metrics(image_path='./your/path/to/syn/images', soft_labels=soft_labels, syn_lr=syn_lr)
 ```
 
-The following results will be returned to you:
+The following results will be printed and saved to `save_path`:
 - `HLR mean`: The mean of hard label recovery over `num_eval` runs.
 - `HLR std`: The standard deviation of hard label recovery over `num_eval` runs.
 - `IOR mean`: The mean of improvement over random over `num_eval` runs.
 - `IOR std`: The standard deviation of improvement over random over `num_eval` runs.
+- `LRS mean`: The mean of the Label-Robust Score over `num_eval` runs.
+- `LRS std`: The standard deviation of the Label-Robust Score over `num_eval` runs.
 
 Check out our <span style="color: #ff0000;">[documentation](https://nus-hpc-ai-lab.github.io/DD-Ranking/)</span> to learn more.
 
-## Coming Soon
-
-- [ ] Evaluation results on ImageNet subsets.
-- [ ] More baseline methods.
-- [ ] DD-Ranking scores that decouple the impacts from data augmentation.
 
 ## Contributing
 
@@ -236,7 +256,7 @@ Feel free to submit grades to update the DD-Ranking list. We welcome and value a
 Please check out [CONTRIBUTING.md](./CONTRIBUTING.md) for how to get involved.
 
 
-## Technical Members:
+<!-- ## Technical Members:
 - [Zekai Li*](https://lizekai-richard.github.io/) (National University of Singapore)
 - [Xinhao Zhong*](https://ndhg1213.github.io/) (National University of Singapore)
 - [Zhiyuan Liang](https://jerryliang24.github.io/) (University of Science and Technology of China)
@@ -282,35 +302,25 @@ Please check out [CONTRIBUTING.md](./CONTRIBUTING.md) for how to get involved.
 - [Yang You](https://www.comp.nus.edu.sg/~youy/) (National University of Singapore)
 - [Kai Wang](https://kaiwang960112.github.io/) (National University of Singapore)
 
-\* *equal contribution*
+\* *equal contribution* -->
 
 ## License
 
 DD-Ranking is released under the MIT License. See [LICENSE](./LICENSE) for more details.
 
-## Related Works
-
-- [Dataset Distillation](https://arxiv.org/abs/1811.10959), Wang et al., in arXiv 2018.
-- [Dataset Condensation with Gradient Matching](https://arxiv.org/abs/2006.05929), Zhao et al., in ICLR 2020.
-- [Dataset Condensation with Differentiable Siamese Augmentation](https://arxiv.org/abs/2102.08259), Zhao \& Bilen, in ICML 2021.
-- [Dataset Distillation via Matching Training Trajectories](https://arxiv.org/abs/2203.11932), Cazenavette et al., in CVPR 2022.
-- [Dataset Distillation with Distribution Matching](https://arxiv.org/abs/2110.04181), Zhao \& Bilen, in WACV 2023.
-- [Squeeze, Recover and Relabel: Dataset Condensation at ImageNet Scale From A New Perspective](https://arxiv.org/abs/2306.13092), Yin et al., in NeurIPS 2023.
-- [Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching](https://arxiv.org/abs/2310.05773), Guo et al., in ICLR 2024.
-- [On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm](https://arxiv.org/abs/2312.03526), Sun et al., in CVPR 2024.
-- [D4M: Dataset Distillation via Disentangled Diffusion Model](https://arxiv.org/abs/2407.15138), Su et al., in CVPR 2024.
-
 
 ## Reference
 
 If you find DD-Ranking useful in your research, please consider citing the following paper:
 
 ```bibtex
-@misc{li2024ddranking,
-  title = {DD-Ranking: Rethinking the Evaluation of Dataset Distillation},
-  author = {Li, Zekai and Zhong, Xinhao and Liang, Zhiyuan and Zhou, Yuhao and Shi, Mingjia and Wang, Ziqiao and Zhao, Wangbo and Zhao, Xuanlei and Wang, Haonan and Qin, Ziheng and Liu, Dai and Zhang, Kaipeng and Zhou, Tianyi and Zhu, Zheng and Wang, Kun and Li, Guang and Zhang, Junhao and Liu, Jiawei and Huang, Yiran and Lyu, Lingjuan and Lv, Jiancheng and Jin, Yaochu and Akata, Zeynep and Gu, Jindong and Vedantam, Rama and Shou, Mike and Deng, Zhiwei and Yan, Yan and Shang, Yuzhang and Cazenavette, George and Wu, Xindi and Cui, Justin and Chen, Tianlong and Yao, Angela and Kellis, Manolis and Plataniotis, Konstantinos N. and Zhao, Bo and Wang, Zhangyang and You, Yang and Wang, Kai},
-  year = {2024},
-  howpublished = {GitHub repository},
-  url = {https://github.com/NUS-HPC-AI-Lab/DD-Ranking}
+@misc{li2025ddrankingrethinkingevaluationdataset,
+  title={DD-Ranking: Rethinking the Evaluation of Dataset Distillation},
+  author={Zekai Li and Xinhao Zhong and Samir Khaki and Zhiyuan Liang and Yuhao Zhou and Mingjia Shi and Ziqiao Wang and Xuanlei Zhao and Wangbo Zhao and Ziheng Qin and Mengxuan Wu and Pengfei Zhou and Haonan Wang and David Junhao Zhang and Jia-Wei Liu and Shaobo Wang and Dai Liu and Linfeng Zhang and Guang Li and Kun Wang and Zheng Zhu and Zhiheng Ma and Joey Tianyi Zhou and Jiancheng Lv and Yaochu Jin and Peihao Wang and Kaipeng Zhang and Lingjuan Lyu and Yiran Huang and Zeynep Akata and Zhiwei Deng and Xindi Wu and George Cazenavette and Yuzhang Shang and Justin Cui and Jindong Gu and Qian Zheng and Hao Ye and Shuo Wang and Xiaobo Wang and Yan Yan and Angela Yao and Mike Zheng Shou and Tianlong Chen and Hakan Bilen and Baharan Mirzasoleiman and Manolis Kellis and Konstantinos N. Plataniotis and Zhangyang Wang and Bo Zhao and Yang You and Kai Wang},
+  year={2025},
+  eprint={2505.13300},
+  archivePrefix={arXiv},
+  primaryClass={cs.CV},
+  url={https://arxiv.org/abs/2505.13300},
 }
 ```

book/404.html

Lines changed: 16 additions & 0 deletions
@@ -175,6 +175,22 @@ <h1 id="document-not-found-404"><a class="header" href="#document-not-found-404"
 
 </div>
 
+        <!-- Livereload script (if served using the cli tool) -->
+        <script>
+            const wsProtocol = location.protocol === 'https:' ? 'wss:' : 'ws:';
+            const wsAddress = wsProtocol + "//" + location.host + "/" + "__livereload";
+            const socket = new WebSocket(wsAddress);
+            socket.onmessage = function (event) {
+                if (event.data === "reload") {
+                    socket.close();
+                    location.reload();
+                }
+            };
+
+            window.onbeforeunload = function() {
+                socket.close();
+            }
+        </script>
 
 
 
book/augmentations/cutmix.html

Lines changed: 16 additions & 0 deletions
@@ -202,6 +202,22 @@ <h3 id="example"><a class="header" href="#example">Example</a></h3>
Adds the same 16-line livereload script as shown above for book/404.html, after the closing </div>.
book/augmentations/dsa.html

Lines changed: 16 additions & 0 deletions
@@ -215,6 +215,22 @@ <h3 id="example"><a class="header" href="#example">Example</a></h3>
Adds the same 16-line livereload script as shown above for book/404.html, after the closing </div>.
book/augmentations/mixup.html

Lines changed: 16 additions & 0 deletions
@@ -202,6 +202,22 @@ <h3 id="example"><a class="header" href="#example">Example</a></h3>
Adds the same 16-line livereload script as shown above for book/404.html, after the closing </div>.
book/augmentations/overview.html

Lines changed: 16 additions & 0 deletions
@@ -212,6 +212,22 @@ <h1 id="augmentations"><a class="header" href="#augmentations">Augmentations</a>
Adds the same 16-line livereload script as shown above for book/404.html, after the closing </div>.