Dataset Distillation (DD) aims to condense a large dataset into a much smaller one.
Notably, more and more methods are transitioning from "hard labels" to "soft labels" in dataset distillation, especially during evaluation. **Hard labels** are categorical, in the same format as the labels of the real dataset. **Soft labels** are probability distributions, typically generated by a pre-trained teacher model.
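As a minimal illustration of the two label formats (a sketch assuming PyTorch; `teacher_logits` below is a hypothetical stand-in for the output of a real pre-trained teacher):

```python
import torch
import torch.nn.functional as F

num_classes = 10

# Hard label: a single categorical class index, the same format as the real dataset.
hard_label = torch.tensor(3)

# Soft label: a probability distribution over all classes, typically obtained
# by running a pre-trained teacher model on the distilled image.
teacher_logits = torch.randn(num_classes)  # hypothetical stand-in for teacher(image)
soft_label = F.softmax(teacher_logits, dim=0)

print(hard_label)        # tensor(3)
print(soft_label.sum())  # tensor(1.) -- a valid distribution over 10 classes
```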
Recently, Deng et al. pointed out that "a label is worth a thousand images". They showed analytically that soft labels are extremely useful for improving accuracy.
However, since the essence of soft labels is **knowledge distillation**, we find that when the same evaluation method is applied to randomly selected data, the test accuracy also improves significantly (see the figure above).
This makes us wonder: **Can the test accuracy of the model trained on distilled data reflect the real informativeness of the distilled data?**
Additionally, we have discovered that using only test accuracy to demonstrate a method's performance is unfair, in the following three respects:
1. Results obtained with hard labels and with soft labels are not directly comparable, since soft labels introduce teacher knowledge.
2. Strategies for using soft labels are diverse. For instance, different objective functions are used during evaluation, such as soft cross-entropy and Kullback–Leibler (KL) divergence; a sketch contrasting the two follows this list. Also, one image may be mapped to one or multiple soft labels.
3. Different data augmentations are used during evaluation.
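To make the second point concrete, here is a minimal sketch of the two common soft-label objectives (assuming PyTorch; `student_logits` and `soft_labels` are hypothetical placeholders, not DD-Ranking's API):

```python
import torch
import torch.nn.functional as F

# Hypothetical placeholders: logits from the student model under evaluation,
# and soft labels produced by a pre-trained teacher.
student_logits = torch.randn(8, 10)                 # batch of 8, 10 classes
soft_labels = F.softmax(torch.randn(8, 10), dim=1)

log_probs = F.log_softmax(student_logits, dim=1)

# Objective 1: soft cross-entropy -- cross-entropy computed against the teacher's
# full probability distribution instead of a one-hot hard label.
soft_ce = -(soft_labels * log_probs).sum(dim=1).mean()

# Objective 2: KL divergence between the teacher and student distributions.
kl = F.kl_div(log_probs, soft_labels, reduction="batchmean")

print(f"soft CE: {soft_ce.item():.4f}  KL: {kl.item():.4f}")
```

The two objectives differ only by the entropy of the teacher distribution, which is constant with respect to the student; in practice, however, methods also vary temperature scaling and loss weighting, so reported accuracies are not directly comparable.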
Motivated by this, we propose DD-Ranking, a new benchmark for DD evaluation. DD-Ranking provides a fair evaluation scheme for DD methods and decouples the impacts of knowledge distillation and data augmentation, so as to reflect the real informativeness of the distilled data.