A fair benchmark for dataset distillation.
</h3> -->
<p align="center">
| <a href="https://nus-hpc-ai-lab.github.io/DD-Ranking/"><b>Documentation</b></a> | <a href="https://nus-hpc-ai-lab.github.io/DD-Ranking/"><b>Leaderboard</b></a> | <b>Paper</b> (Coming Soon) | <a href=""><b>Twitter/X</b></a> | <a href=""><b>Developer Slack</b></a> |
</p>

Dataset Distillation (DD) aims to condense a large dataset into a much smaller one.

Notably, more and more methods are transitioning from "hard labels" to "soft labels" in dataset distillation, especially during evaluation. **Hard labels** are categorical, in the same format as the labels of the real dataset. **Soft labels** are probability distributions, typically generated by a pre-trained teacher model.
Recently, Deng et al. pointed out that "a label is worth a thousand images". They showed analytically that soft labels are extremely useful for improving accuracy.
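
For readers unfamiliar with soft labels, here is a minimal PyTorch sketch (illustrative only, not DD-Ranking's API; the function name, `teacher`, and `temperature` are assumed placeholders) of how a pre-trained teacher turns each distilled image into a label distribution:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate_soft_labels(teacher, synthetic_images, temperature=1.0):
    """Return one probability distribution (soft label) per synthetic image."""
    teacher.eval()
    logits = teacher(synthetic_images)             # shape: [N, num_classes]
    return F.softmax(logits / temperature, dim=1)  # each row sums to 1
```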

However, since the essence of soft labels is **knowledge distillation**, we find that when applying the same evaluation method to randomly selected data, the test accuracy also improves significantly (see the figure above).

This makes us wonder: **Can the test accuracy of the model trained on distilled data reflect the real informativeness of the distilled data?**

Additionally, we have found that using test accuracy alone to demonstrate a method's performance is unfair in the following three aspects:
1. Results obtained with hard labels and with soft labels are not directly comparable, since soft labels introduce teacher knowledge.
2. Strategies for using soft labels are diverse. For instance, different objective functions are used during evaluation, such as soft Cross-Entropy and Kullback–Leibler divergence (see the sketch below). Also, one image may be mapped to one or multiple soft labels.
3. Different data augmentations are used during evaluation.
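
To make point 2 concrete, below is a minimal sketch of the two evaluation objectives mentioned above (plain PyTorch, not DD-Ranking's actual implementation; the function names and the temperature value are illustrative assumptions). Accuracies obtained under these different losses are not directly comparable.

```python
import torch.nn.functional as F

def soft_cross_entropy(student_logits, soft_labels):
    # Soft CE: cross-entropy between the student's predictions and the
    # teacher-provided label distributions.
    log_probs = F.log_softmax(student_logits, dim=1)
    return -(soft_labels * log_probs).sum(dim=1).mean()

def kd_kl_divergence(student_logits, teacher_logits, temperature=4.0):
    # KL divergence between temperature-scaled student and teacher
    # distributions, as in classic knowledge distillation.
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```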