<imgsrc="assets/scaling_models.png"width=95%>
## 🏘 Infinity Model ZOO
We provide Infinity models for you to play with, which are on <a href='https://huggingface.co/FoundationVision/infinity'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20weights-FoundationVision/Infinity-yellow'></a> or can be downloaded from the following links:
### Visual Tokenizer
${\dagger}$ result is tested with a [prompt rewriter](tools/prompt_rewriter.py).
You can load these models to generate images via the code in [interactive_infer.ipynb](tools/interactive_infer.ipynb). Note: you need to download [infinity_vae_d32reg.pth](https://huggingface.co/FoundationVision/Infinity/blob/main/infinity_vae_d32reg.pth) and [flan-t5-xl](https://huggingface.co/google/flan-t5-xl) first.
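
To fetch those two prerequisites from the command line, here is a minimal sketch using `huggingface-cli`; the `--local-dir` targets are arbitrary choices for this sketch:

```shell
# download the VAE checkpoint and the flan-t5-xl text encoder named above;
# the target directories are arbitrary choices for this sketch
huggingface-cli download FoundationVision/Infinity infinity_vae_d32reg.pth --local-dir weights
huggingface-cli download google/flan-t5-xl --local-dir weights/flan-t5-xl
```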
## ⚽️ Installation
1. We use FlexAttention to speed up training, which requires `torch>=2.5.1`.
2. Install other pip packages via `pip3 install -r requirements.txt` (both steps are combined in the sketch below).
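
For a fresh environment, the two steps above might combine into the following sketch; pick the `torch` build (e.g. the CUDA wheel index) that matches your system:

```shell
# step 1: FlexAttention requires torch>=2.5.1; choose the wheel matching your CUDA setup
pip3 install "torch>=2.5.1"
# step 2: install the remaining dependencies
pip3 install -r requirements.txt
```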
## 🎨 Data Preparation
The training dataset is structured as follows: it contains a list of JSONL files named "[h_div_w_template]_[num_examples].jsonl". Here [h_div_w_template] is a float number: the template ratio of image height to width. [num_examples] is the number of examples whose $h/w$ is around h_div_w_template. [dataset_t2i_iterable.py](infinity/dataset/dataset_t2i_iterable.py) supports training with >100M examples, but you have to specify the number of examples for each h/w template ratio in the filename.
Each "[h_div_w_template]_[num_examples].jsonl" file contains lines of dumped JSON.
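
As an illustration only, one ratio bucket with two square images might be created as below. The JSON field names (`image_path`, `text`, `h_div_w`) are assumptions for this sketch, not the repo's exact schema; check the toy dataset below for the real format.

```shell
# a hypothetical bucket "1.000_2.jsonl": 2 examples whose h/w is around 1.0;
# the field names are illustrative assumptions, not the repo's exact schema
mkdir -p data/my_dataset
cat > data/my_dataset/1.000_2.jsonl <<'EOF'
{"image_path": "images/000001.jpg", "text": "a photo of a cat", "h_div_w": 1.0}
{"image_path": "images/000002.jpg", "text": "a photo of a dog", "h_div_w": 1.0}
EOF
```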
Still have questions about the data preparation? Easy: we have provided a toy dataset with 10 images. You can prepare your own dataset by referring to [this](data/infinity_toy_data).
## 🧁 Training Scripts
We provide [train.sh](scripts/train.sh) for training Infinity-2B with one command:
```shell
bash scripts/train.sh
```

You can monitor the training process by checking the logs in `local_output/log.txt`.
If your experiment is interrupted, just rerun the command, and the training will **automatically resume** from the last checkpoint in `local_output/ckpt*.pth`.
## 🍭 Evaluation
We provide [eval.sh](scripts/eval.sh) for evaluation on various benchmarks with only one command. In particular, [eval.sh](scripts/eval.sh) supports evaluation on commonly used metrics such as [GenEval](https://github.com/djghosh13/geneval), [ImageReward](https://github.com/THUDM/ImageReward), [HPSv2.1](https://github.com/tgxs002/HPSv2), FID and Validation Loss. Please refer to [evaluation/README.md](evaluation/README.md) for more details.
```shell
bash scripts/eval.sh
```
## ✨ Fine-Tuning
Fine-tuning Infinity is quite simple: you only need to append `--rush_resume=[infinity_2b_reg.pth]` to [train.sh](scripts/train.sh). Note that you have to set `--pn` carefully for both the training and inference code, since it decides the image resolution; a combined sketch follows the options below.
```
--pn=0.06M # 256x256 resolution (including other aspect ratios with the same number of pixels)
--pn=0.25M # 512x512 resolution
--pn=1M # 1024x1024 resolution
```
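
Putting the two flags together, a 1024x1024 fine-tuning run might look like the sketch below. It assumes [train.sh](scripts/train.sh) forwards extra flags to the training entrypoint; if it does not, set the same flags inside the script:

```shell
# a sketch, not a verified interface: assumes scripts/train.sh passes
# extra flags through to the trainer
bash scripts/train.sh --rush_resume=infinity_2b_reg.pth --pn=1M
```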
After fine-tuning, you will get a checkpoint like `[model_dir]/ar-ckpt-giter(xxx)K-ep(xxx)-iter(xxx)-last.pth`. Note that this checkpoint contains training states besides model weights. To run inference with it, enable `--enable_model_cache=1` in [eval.sh](scripts/eval.sh) or [interactive_infer.ipynb](tools/interactive_infer.ipynb).
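
For instance, evaluating the fine-tuned checkpoint could look like the following sketch, again assuming [eval.sh](scripts/eval.sh) forwards extra flags (otherwise set them inside the script):

```shell
# a sketch: enable the model cache described above when evaluating a
# fine-tuned checkpoint that still carries training states
bash scripts/eval.sh --enable_model_cache=1
```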
## One More Thing: Infinity-20B is coming soon 📆
Infinity shows strong scaling capabilities, as illustrated above. Thus we are encouraged to continue scaling the model size up to 20B. Here we present side-by-side comparison results between Infinity-2B and Infinity-20B.
Currently, Infinity-20B is still in the training phase. We will release Infinity-20B once training is completed.
## 📖 Citation
If our work assists your research, feel free to give us a star ⭐ or cite us using: