<p>DD-Ranking (DD, <em>i.e.</em>, Dataset Distillation) is an integrated and easy-to-use evaluation benchmark for dataset distillation. It aims to provide a fair evaluation scheme for DD methods that decouples the impacts of knowledge distillation and data augmentation, so that it reflects the real informativeness of the distilled data.</p>
<h2>Motivation</h2>
<p>Dataset Distillation (DD) aims to condense a large dataset into a much smaller one on which a model can be trained to achieve comparable performance. DD has attracted extensive attention since it was proposed. Building on foundational methods such as DC, DM, and MTT, various works have further advanced this area with their novel designs.</p>
<p>Notably, more and more methods are transitioning from "hard labels" to "soft labels" in dataset distillation, especially during evaluation. <strong>Hard labels</strong> are categorical, in the same format as the labels of the real dataset. <strong>Soft labels</strong> are distributions, typically generated by a pre-trained teacher model. Recently, Deng et al. pointed out that "a label is worth a thousand images" and showed analytically that soft labels are extremely useful for improving accuracy.</p>
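<p>For intuition, here is a minimal sketch of the two label formats. The teacher network, input shape, and temperature below are illustrative assumptions, not part of DD-Ranking or of any specific DD method.</p>

```python
import torch
import torch.nn.functional as F

# Hard label: a single class index (equivalently, a one-hot vector),
# i.e., the same format as the labels of the real dataset.
hard_label = torch.tensor(3)                        # e.g., class 3 of 10
one_hot = F.one_hot(hard_label, num_classes=10).float()

# Soft label: a probability distribution over classes, typically produced by a
# pre-trained teacher model on a distilled image (stand-ins used here).
teacher = torch.nn.Linear(3 * 32 * 32, 10)          # placeholder for a pre-trained teacher
image = torch.randn(1, 3 * 32 * 32)                 # placeholder for a distilled image
temperature = 4.0                                   # a common knowledge-distillation choice
soft_label = F.softmax(teacher(image) / temperature, dim=-1)
```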
<p>However, since the essence of soft labels is <strong>knowledge distillation</strong>, a natural question arises: <strong>Can the test accuracy of a model trained on distilled data reflect the real informativeness of the distilled data?</strong></p>
<p>The evaluation method for DD-Ranking is grounded in the essence of dataset distillation, aiming to better reflect the information content of the synthesized data by assessing the following two aspects:</p>
<ol>
<li>
<p>The degree to which the original dataset is recovered under hard labels (hard label recovery): $\text{HLR} = \text{Acc.}_{\text{full-hard}} - \text{Acc.}_{\text{syn-hard}}$.</p>
</li>
<li>
<p>The improvement over random selection when using personalized evaluation methods (improvement over random): $\text{IOR} = \text{Acc.}_{\text{syn-any}} - \text{Acc.}_{\text{rdm-any}}$.</p>
</li>
</ol>
<p>$\text{Acc.}$ denotes the test accuracy of models trained on different samples, where the subscripts mark the training samples as follows:</p>
<ul>
<li>$\text{full-hard}$: Full dataset with hard labels;</li>
<li>$\text{syn-hard}$: Synthetic dataset with hard labels;</li>
<li>$\text{syn-any}$: Synthetic dataset with personalized evaluation methods (hard or soft labels);</li>
<li>$\text{rdm-any}$: Randomly selected dataset (under the same compression ratio) with the same personalized evaluation methods.</li>
</ul>
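<p>As a minimal sketch of how these metrics are computed from the measured accuracies (the helper names and example numbers below are illustrative assumptions, not the DD-Ranking API):</p>

```python
def hard_label_recovery(acc_full_hard: float, acc_syn_hard: float) -> float:
    """HLR = Acc.(full-hard) - Acc.(syn-hard); a smaller gap means better recovery."""
    return acc_full_hard - acc_syn_hard


def improvement_over_random(acc_syn_any: float, acc_rdm_any: float) -> float:
    """IOR = Acc.(syn-any) - Acc.(rdm-any); a larger improvement is better."""
    return acc_syn_any - acc_rdm_any


# Illustrative accuracies (in %), not real benchmark numbers.
hlr = hard_label_recovery(acc_full_hard=84.8, acc_syn_hard=45.0)    # 39.8
ior = improvement_over_random(acc_syn_any=60.2, acc_rdm_any=52.3)   # 7.9
```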
<p>To rank different methods, we combine the above two metrics into the DD-Ranking Score:</p>