<p>DD-Ranking (DD, <em>i.e.</em>, Dataset Distillation) is an integrated and easy-to-use evaluation benchmark for dataset distillation. It aims to provide a fair evaluation scheme for DD methods that decouples the impacts of knowledge distillation and data augmentation, so that it reflects the real informativeness of the distilled data.</p>
<h2>Motivation</h2>
<p>Dataset Distillation (DD) aims to condense a large dataset into a much smaller one on which a model can be trained to achieve comparable performance. DD has attracted extensive attention since it was proposed. Building on foundational methods such as DC, DM, and MTT, various works have further advanced this area with their novel designs.</p>
<p>Notably, more and more methods are transitioning from "hard labels" to "soft labels" in dataset distillation, especially during evaluation. <strong>Hard labels</strong> are categorical, in the same format as the labels of the real dataset. <strong>Soft labels</strong> are distributions, typically generated by a pre-trained teacher model. Recently, Deng et al. pointed out that "a label is worth a thousand images" and showed analytically that soft labels are extremely useful for improving accuracy.</p>
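<p>For intuition, here is a minimal sketch of the two label formats. The teacher network, input shape, and temperature below are illustrative assumptions, not part of DD-Ranking or of any specific DD method.</p>

```python
import torch
import torch.nn.functional as F

# Hard label: a single class index (equivalently, a one-hot vector),
# i.e., the same format as the labels of the real dataset.
hard_label = torch.tensor(3)                        # e.g., class 3 of 10
one_hot = F.one_hot(hard_label, num_classes=10).float()

# Soft label: a probability distribution over classes, typically produced by a
# pre-trained teacher model on a distilled image (stand-ins used here).
teacher = torch.nn.Linear(3 * 32 * 32, 10)          # placeholder for a pre-trained teacher
image = torch.randn(1, 3 * 32 * 32)                 # placeholder for a distilled image
temperature = 4.0                                   # a common knowledge-distillation choice
soft_label = F.softmax(teacher(image) / temperature, dim=-1)
```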
<p>However, since the essence of soft labels is <strong>knowledge distillation</strong>, a natural question arises: <strong>Can the test accuracy of a model trained on distilled data reflect the real informativeness of the distilled data?</strong></p>
<p>The evaluation method for DD-Ranking is grounded in the essence of dataset distillation, aiming to better reflect the information content of the synthesized data by assessing the following two aspects:</p>
<ol>
<li>
<p>The degree to which the original dataset is recovered under hard labels (hard label recovery): $\text{HLR} = \text{Acc.}_{\text{full-hard}} - \text{Acc.}_{\text{syn-hard}}$.</p>
</li>
<li>
<p>The improvement over random selection when using personalized evaluation methods (improvement over random): $\text{IOR} = \text{Acc.}_{\text{syn-any}} - \text{Acc.}_{\text{rdm-any}}$.</p>
</li>
</ol>
<p>$\text{Acc.}$ denotes the test accuracy of models trained on different samples, where the subscripts mark the training samples as follows:</p>
<ul>
<li>$\text{full-hard}$: Full dataset with hard labels;</li>
<li>$\text{syn-hard}$: Synthetic dataset with hard labels;</li>
<li>$\text{syn-any}$: Synthetic dataset with personalized evaluation methods (hard or soft labels);</li>
<li>$\text{rdm-any}$: Randomly selected dataset (under the same compression ratio) with the same personalized evaluation methods.</li>
</ul>
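<p>As a minimal sketch of how these metrics are computed from the measured accuracies (the helper names and example numbers below are illustrative assumptions, not the DD-Ranking API):</p>

```python
def hard_label_recovery(acc_full_hard: float, acc_syn_hard: float) -> float:
    """HLR = Acc.(full-hard) - Acc.(syn-hard); a smaller gap means better recovery."""
    return acc_full_hard - acc_syn_hard


def improvement_over_random(acc_syn_any: float, acc_rdm_any: float) -> float:
    """IOR = Acc.(syn-any) - Acc.(rdm-any); a larger improvement is better."""
    return acc_syn_any - acc_rdm_any


# Illustrative accuracies (in %), not real benchmark numbers.
hlr = hard_label_recovery(acc_full_hard=84.8, acc_syn_hard=45.0)    # 39.8
ior = improvement_over_random(acc_syn_any=60.2, acc_rdm_any=52.3)   # 7.9
```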
<p>To rank different methods, we combine the above two metrics into the DD-Ranking Score:</p>