# <center>DD-Ranking</center>

<p align="center">
  <picture>
    <!-- Dark theme logo -->
    <source media="(prefers-color-scheme: dark)" srcset="XX.png">
    <!-- Light theme logo -->
    <img alt="DD-Ranking" src="XX.png" width=55%>
  </picture>
</p>

<h3 align="center">
An integrated and easy-to-use benchmark for dataset distillation.
</h3>
<p align="center">
| <a href=""><b>Documentation</b></a> | <a href=""><b>Leaderboard</b></a> | <b>Paper</b> (Coming Soon) | <a href=""><b>Twitter/X</b></a> | <a href=""><b>Developer Slack</b></a> |
</p>


---

*Latest News* 🔥

[Latest] We officially released DD-Ranking! DD-Ranking provides a new benchmark that decouples the impacts of knowledge distillation and data augmentation in dataset distillation evaluation.

<details>
<summary>Unfold to see more details.</summary>
<br>

- [2024/12] We officially released DD-Ranking! DD-Ranking provides a new benchmark that decouples the impacts of knowledge distillation and data augmentation.
</details>

---

## Motivation: DD Lacks an Evaluation Benchmark

<details>
<summary>Unfold to see more details.</summary>

Dataset Distillation (DD) aims to condense a large dataset into a much smaller one, which allows a model to achieve comparable performance after training on it. DD has gained extensive attention since it was proposed. Building on foundational methods such as DC, DM, and MTT, various works have further pushed this area to a new standard with their novel designs.

Notably, more and more methods are transitioning from "hard labels" to "soft labels" in dataset distillation, especially during evaluation. **Hard labels** are categorical, in the same format as the labels of the real dataset. **Soft labels** are distributions, typically generated by a pre-trained teacher model.
Recently, Deng et al. pointed out that "a label is worth a thousand images". They showed analytically that soft labels are extremely useful for accuracy improvement.

However, since the essence of soft labels is **knowledge distillation**, we want to ask a question: **Can the test accuracy of a model trained on distilled data reflect the real informativeness of the distilled data?**

Specifically, we have found that using test accuracy alone to demonstrate a method's performance is unfair in the following three respects:
1. Results obtained with hard and soft labels are not directly comparable, because soft labels introduce teacher knowledge and the resulting test accuracy may not fully reflect the pure informativeness of the distilled data.
2. Strategies for using soft labels are diverse. Different objective functions are used during evaluation, such as soft Cross-Entropy and Kullback–Leibler divergence, and one image may be mapped to one or multiple soft labels. Since the soft-label objective is usually not a contribution of the method itself, comparing methods that use different strategies is not fair.
3. Different data augmentations are used during evaluation, which can improve test accuracy to different extents, so augmentation should also be properly aligned.

Motivated by this, we propose DD-Ranking, a new benchmark for DD evaluation. DD-Ranking provides a fair evaluation scheme for DD methods that decouples the impacts of knowledge distillation and data augmentation to reflect the real informativeness of the distilled data.

</details>

## About

DD-Ranking (DD, *i.e.*, Dataset Distillation) is an integrated and easy-to-use benchmark for dataset distillation. It aims to provide a fair evaluation scheme for DD methods that decouples the impacts of knowledge distillation and data augmentation to reflect the real informativeness of the distilled data.

<!-- Hard label is tested -->
<!-- Keep the same compression ratio, comparing with random selection -->
**Performance benchmark**

<span style="color: #ffff00;">[To Verify]:</span> Revisit the original goal of dataset distillation:
> The idea is to synthesize a small number of data points that do not need to come from the correct data distribution, but will, when given to the learning algorithm as training data, approximate the model trained on the original data.

The evaluation method of DD-Ranking is grounded in the essence of dataset distillation, aiming to better reflect the information content of the synthesized data by assessing the following two aspects:
1. The degree to which the original dataset is recovered under hard labels (hard label recovery): $\text{HLR}=\text{Acc.}_{\text{full-hard}}-\text{Acc.}_{\text{syn-hard}}$.

2. The improvement over random selection when using personalized evaluation methods (improvement over random): $\text{IOR}=\text{Acc.}_{\text{syn-any}}-\text{Acc.}_{\text{rdm-any}}$.

Here, $\text{Acc.}$ is the accuracy of models trained on different samples, and the subscripts denote the samples as follows:
- $\text{full-hard}$: Full dataset with hard labels;
- $\text{syn-hard}$: Synthetic dataset with hard labels;
- $\text{syn-any}$: Synthetic dataset with personalized evaluation methods (hard or soft labels);
- $\text{rdm-any}$: Randomly selected dataset (under the same compression ratio) with the same personalized evaluation methods.

To rank different methods, we combine the above two metrics as follows:

$$\text{IOR}/\text{HLR} = \frac{\text{Acc.}_{\text{syn-any}}-\text{Acc.}_{\text{rdm-any}}}{\text{Acc.}_{\text{full-hard}}-\text{Acc.}_{\text{syn-hard}}}$$

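As a worked illustration, the sketch below plugs hypothetical accuracies into these definitions; the numbers are made up for demonstration and do not correspond to any real method.

```python
# Hypothetical accuracies (fractions in [0, 1]); purely illustrative values.
acc_full_hard = 0.85  # full dataset, hard labels
acc_syn_hard = 0.45   # synthetic dataset, hard labels
acc_syn_any = 0.62    # synthetic dataset, personalized evaluation (e.g., soft labels)
acc_rdm_any = 0.40    # random subset at the same compression ratio, same evaluation

hlr = acc_full_hard - acc_syn_hard  # hard label recovery: a smaller gap is better
ior = acc_syn_any - acc_rdm_any     # improvement over random: larger is better
score = ior / hlr                   # IOR/HLR: larger indicates more informative distilled data

print(f"HLR = {hlr:.2f}, IOR = {ior:.2f}, IOR/HLR = {score:.2f}")
```
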
DD-Ranking is integrated with:
<!-- Uniform Fair Labels: loss on soft label -->
<!-- Data Aug. -->
- <span style="color: #ffff00;">[To Verify]:</span> Multiple [strategies](https://github.com/NUS-HPC-AI-Lab/DD-Ranking/tree/main/dd_ranking/loss) of using soft labels (see the sketch after this list);
- <span style="color: #ffff00;">[To Verify]:</span> Data augmentation, reconsidered as [optional tricks](https://github.com/NUS-HPC-AI-Lab/DD-Ranking/tree/main/dd_ranking/aug) in DD;
- <span style="color: #ffff00;">[To Verify]:</span> Commonly used [model architectures](https://github.com/NUS-HPC-AI-Lab/DD-Ranking/blob/main/dd_ranking/utils/networks.py) in DD;
- <span style="color: #ffff00;">[To Verify]:</span> A new ranking of representative DD methods.

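To make the soft-label point concrete, the snippet below sketches two common soft-label objectives, soft Cross-Entropy and KL divergence, in plain PyTorch. It is a generic illustration of these losses, not the exact implementation shipped in `dd_ranking/loss`.

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(student_logits: torch.Tensor, soft_labels: torch.Tensor) -> torch.Tensor:
    # Cross-entropy against a full teacher distribution instead of a one-hot label.
    log_probs = F.log_softmax(student_logits, dim=1)
    return -(soft_labels * log_probs).sum(dim=1).mean()

def kl_divergence(student_logits: torch.Tensor, soft_labels: torch.Tensor) -> torch.Tensor:
    # KL(teacher || student), averaged over the batch.
    log_probs = F.log_softmax(student_logits, dim=1)
    return F.kl_div(log_probs, soft_labels, reduction="batchmean")

# Toy check: 4 samples, 10 classes.
logits = torch.randn(4, 10)
teacher_probs = F.softmax(torch.randn(4, 10), dim=1)
print(soft_cross_entropy(logits, teacher_probs).item(), kl_divergence(logits, teacher_probs).item())
```
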
DD-Ranking is flexible and easy to use, supported by:
<!-- Default configs: Customized configs -->
<!-- Integrated classes: 1) Optimizer and etc.; 2) random selection tests (additionally, w/ or w/o hard labels)-->
- <span style="color: #ffff00;">[To Verify]:</span> Extensive provided configs;
- <span style="color: #ffff00;">[To Verify]:</span> Customized configs;
- <span style="color: #ffff00;">[To Verify]:</span> A testing and training framework with integrated metrics.

## Coming Soon

- <span style="color: #ffff00;">[To Verify]:</span> Ranking on different data augmentation methods.

## Tutorial

Install DD-Ranking with `pip` or from [source](https://github.com/NUS-HPC-AI-Lab/DD-Ranking/tree/main):

### Installation

From pip:

```bash
pip install dd_ranking
```

From source:

```bash
python setup.py install
```
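
Either way, a quick import check confirms the installation (this assumes the package exposes the `dd_ranking` module used throughout the Quickstart below):

```python
# Minimal sanity check; should print the package location without raising ImportError.
import dd_ranking
print(dd_ranking.__file__)
```
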
### Quickstart

Below is a step-by-step guide on how to use `dd_ranking`. This demo is based on soft labels (the source code can be found in `demo_soft.py`). You can find the hard label demo in `demo_hard.py`.

**Step 1:** Initialize a soft-label metric evaluator object. We recommend using config files to specify the hyper-parameters. Sample config files are provided [here](https://github.com/NUS-HPC-AI-Lab/DD-Ranking/tree/main/configs).

```python
from dd_ranking.metrics import Soft_Label_Objective_Metrics
from dd_ranking.config import Config

config = Config.from_file("./configs/Demo_Soft_Label.yaml")
soft_obj = Soft_Label_Objective_Metrics(config)
```

<details>
<summary>You can also pass keyword arguments.</summary>

```python
device = "cuda"
method_name = "DATM" # Specify your method name
ipc = 10 # Specify your IPC
dataset = "CIFAR10" # Specify your dataset name
syn_data_dir = "./data/CIFAR10/IPC10/" # Specify your synthetic data path
real_data_dir = "./datasets" # Specify your dataset path
model_name = "ConvNet-3" # Specify your model name
teacher_dir = "./teacher_models" # Specify your path to teacher model checkpoints
im_size = (32, 32) # Specify your image size
dsa_params = { # Specify your data augmentation parameters
    "prob_flip": 0.5,
    "ratio_rotate": 15.0,
    "saturation": 2.0,
    "brightness": 1.0,
    "contrast": 0.5,
    "ratio_scale": 1.2,
    "ratio_crop_pad": 0.125,
    "ratio_cutout": 0.5
}
save_path = f"./results/{dataset}/{model_name}/IPC{ipc}/dm_hard_scores.csv"

# We only list the arguments that usually need to be specified.
soft_label_metric_calc = Soft_Label_Objective_Metrics(
    dataset=dataset,
    real_data_path=real_data_dir,
    ipc=ipc,
    model_name=model_name,
    soft_label_criterion='sce', # Use Soft Cross Entropy Loss
    soft_label_mode='S', # Use one-to-one image to soft label mapping
    data_aug_func='dsa', # Use DSA data augmentation
    aug_params=dsa_params, # Specify DSA parameters
    im_size=im_size,
    stu_use_torchvision=False,
    tea_use_torchvision=False,
    teacher_dir='./teacher_models',
    device=device,
    save_path=save_path
)
```
</details>


For a detailed explanation of the hyper-parameters, please refer to our <a href="">documentation</a>.

**Step 2:** Load your synthetic data, labels (if any), and learning rate (if any).

```python
import torch

syn_images = torch.load('/your/path/to/syn/images.pt')
# You must specify your soft labels if your soft label mode is 'S'
soft_labels = torch.load('/your/path/to/syn/labels.pt')
syn_lr = torch.load('/your/path/to/syn/lr.pt')
```
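
As an optional sanity check, the sketch below verifies the tensor shapes one would typically expect for the CIFAR-10, IPC 10 setting used above (100 synthetic images of size 3×32×32 and, in soft label mode 'S', one soft label per image). These expected shapes are assumptions for this example; adjust them to your own dataset and IPC.

```python
# Illustrative shape check for CIFAR-10 with IPC 10; adapt the numbers to your setting.
num_classes, ipc = 10, 10
assert syn_images.shape == (num_classes * ipc, 3, 32, 32), syn_images.shape
assert soft_labels.shape[0] == syn_images.shape[0], "expected one soft label per synthetic image"
print(syn_images.shape, soft_labels.shape, syn_lr)
```
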

**Step 3:** Compute the metrics (HLR, IOR, and IOR/HLR).

```python
metric = soft_label_metric_calc.compute_metrics(syn_images, soft_labels=soft_labels, syn_lr=syn_lr)
```

The following results will be returned to you:
- `HLR mean`: The mean of hard label recovery over `num_eval` runs.
- `HLR std`: The standard deviation of hard label recovery over `num_eval` runs.
- `IOR mean`: The mean of improvement over random over `num_eval` runs.
- `IOR std`: The standard deviation of improvement over random over `num_eval` runs.
- `IOR/HLR mean`: The mean of IOR/HLR over `num_eval` runs.
- `IOR/HLR std`: The standard deviation of IOR/HLR over `num_eval` runs.
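
If you want to log these statistics programmatically, the sketch below assumes `compute_metrics` returns them as a dictionary keyed by the names above; the exact return format may differ, so treat this as illustrative and consult the documentation.

```python
# Illustrative only: assumes `metric` is a dict of the statistics listed above.
for name, value in metric.items():
    print(f"{name}: {value}")
```
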

<!-- Our <span style="color: #ff0000;">[TODO]:</span>[documentation]() to learn more.

- [Installation]()
- [Quickstart]()
- [Supported Models]() -->

## Contributing

<!-- Only PR for the 1st version of DD-Ranking -->
Feel free to submit your scores to update the DD-Ranking leaderboard. We welcome and value any contributions and collaborations.
Please check out [CONTRIBUTING.md](./CONTRIBUTING.md) for how to get involved.

<!-- ## Acknowledgement

DD-Ranking is a community project. The compute resources for development and testing are supported by the following organizations. Thanks for your support! -->

<!-- Note: Please sort them in alphabetical order. -->
<!-- Note: Please keep these consistent with docs/source/community/sponsors.md -->

<!-- - First Org.

We also have an official fundraising venue through <span style="color: #ff0000;">[TODO]:</span>[the collection website](). We plan to use the fund to support the development, maintenance, and adoption of DD-Ranking. -->

<!-- Paper to be added -->
<!-- If a pre-print is wanted, a digital asset could be released first. -->

<!-- ## Citation

If you use DD-Ranking for your research, please cite our [paper]():
```bibtex
@inproceedings{,
  title={DD-Ranking: },
  author={},
  booktitle={},
  year={2024}
}
```
-->

<!-- ## Contact Us

**Community Discussions**: Engage with other users on <span style="color: #ff0000;">[TODO]:</span>[Discord]() for discussions.

**Coordination of Contributions and Development**: Use <span style="color: #ff0000;">[TODO]:</span>[Slack]() for coordinating contributions and managing development efforts.

**Collaborations and Partnerships**: For exploring collaborations or partnerships, reach out via <span style="color: #ff0000;">[TODO]:</span>[email]().

**Technical Queries and Feature Requests**: Use GitHub issues or discussions for technical questions and feature requests.

**Security Disclosures**: Report security vulnerabilities through GitHub's security advisory feature. -->