Commit 73b36e8

hanjian.thu123 (author) committed
2 parents d542c51 + 577c590, commit 73b36e8

4 files changed (+164, -2 lines)

DockerFile

Lines changed: 27 additions & 0 deletions

```dockerfile
FROM pytorch/pytorch:2.5.1-cuda11.8-cudnn9-devel

ENV DEBIAN_FRONTEND=noninteractive \
    PYTHONUNBUFFERED=1 \
    CUDA_HOME=/usr/local/cuda \
    PATH="/usr/local/cuda/bin:$PATH"

RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    curl \
    ffmpeg \
    libsm6 \
    libxext6 \
    && apt-get clean && rm -rf /var/lib/apt/lists/*

WORKDIR /workspace/

COPY requirements.txt /workspace/requirements.txt

RUN pip install --upgrade pip \
    && pip install ninja \
    && MAX_JOBS=1 pip install flash-attn --no-build-isolation \
    && pip install -r requirements.txt \
    && pip install opencv-fixer==0.2.5 \
    && python -c "from opencv_fixer import AutoFix; AutoFix()"

CMD ["/bin/bash"]
```
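After building the image from the Dockerfile above, a quick smoke test inside the running container can confirm that the pieces it installs are in place before any weights are loaded. This is a minimal sketch under stated assumptions: `check_container_env` is a hypothetical helper, `flash_attn` is assumed to be the import name of the flash-attn package, and `cv2` (OpenCV) is assumed to come from `requirements.txt`, whose contents are not part of this commit.

```python
import importlib.util
import os


def check_container_env(modules=("torch", "flash_attn", "cv2")):
    """Hypothetical helper: report which modules are importable and the CUDA_HOME value.

    Assumed import names: `flash_attn` for flash-attn, `cv2` for OpenCV.
    """
    # find_spec returns None for a missing top-level module without importing it
    report = {name: importlib.util.find_spec(name) is not None for name in modules}
    # CUDA_HOME is set by the ENV instruction in the Dockerfile
    report["CUDA_HOME"] = os.environ.get("CUDA_HOME")
    if report.get("torch"):
        import torch  # imported only when it is actually installed

        # Requires the container to be started with --gpus all
        report["cuda_available"] = torch.cuda.is_available()
    return report
```

Printing `check_container_env()` inside a container started with `--gpus all` would be expected to show `True` for each module and `/usr/local/cuda` for `CUDA_HOME`; a `False` entry points at the corresponding install step.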

README.md

Lines changed: 35 additions & 1 deletion

````diff
@@ -22,6 +22,7 @@
 ## 🔥 Updates!!
 * Dec 24, 2024: 🔥 Training and Testing Codes && Checkpoints && Demo released!
 * Dec 12, 2024: 💻 Add Project Page
+* Dec 10, 2024: 🏆 Visual AutoRegressive Modeling received NeurIPS 2024 Best Paper Award.
 * Dec 5, 2024: 🤗 Paper release
 
 ## 🕹️ Try and Play with Infinity!
@@ -166,7 +167,28 @@ Fine-tuning Infinity is quite simple where you only need to append ```--rush_res
 
 After fine-tuning, you will get a checkpoint like [model_dir]/ar-ckpt-giter(xxx)K-ep(xxx)-iter(xxx)-last.pth. Note that this checkpoint contains training states in addition to model weights. Inference with this model should enable ```--enable_model_cache=1``` in [eval.sh](scripts/eval.sh) or [interactive_infer.ipynb](tools/interactive_infer.ipynb).
 
+## Use Docker
 
+If you want to reproduce the paper's model locally (inference only), you can use our Docker container. This all-in-one setup is especially suitable for users with no prior experience configuring the environment.
+
+### 1. Download weights
+
+Download the `flan-t5-xl` folder and the `infinity_2b_reg.pth` and `infinity_vae_d32reg.pth` files into the `weights` folder of your local `Infinity` clone. `tools/reproduce.py` loads them from `/workspace/Infinity/weights/` inside the container, so the directory you mount as `{your-local-path}` must contain the `Infinity` folder.
+
+### 2. Build the Docker image and start a container
+
+```
+docker build -f DockerFile -t my-flash-attn-env .
+docker run --gpus all -it --name my-container -v {your-local-path}:/workspace my-flash-attn-env
+```
+
+### 3. Run inference
+
+```
+python Infinity/tools/reproduce.py
+```
+
+Note: To use your own prompts, edit the `prompts` dictionary in `tools/reproduce.py`. Generated images are saved to the `outputs` directory under the current working directory (by default `/workspace`, i.e. the mounted folder).
 
 ## One More Thing: Infinity-20B is coming soon 📆
 Infinity shows strong scaling capabilities as illustrated before. Thus we are encouraged to continue to scale up the model size to 20B. Here we present the side-by-side comparison results between Infinity-2B and Infinity-20B.
@@ -186,7 +208,7 @@ Currently, Infinity-20B is still on the training phrase. We will release Infinit
 If our work assists your research, feel free to give us a star ⭐ or cite us using:
 
 ```
-@misc{han2024infinityscalingbitwiseautoregressive,
+@misc{Infinity,
     title={Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis},
     author={Jian Han and Jinlai Liu and Yi Jiang and Bin Yan and Yuqi Zhang and Zehuan Yuan and Bingyue Peng and Xiaobing Liu},
     year={2024},
@@ -197,5 +219,17 @@ If our work assists your research, feel free to give us a star ⭐ or cite us us
 }
 ```
 
+```
+@misc{VAR,
+    title={Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction},
+    author={Keyu Tian and Yi Jiang and Zehuan Yuan and Bingyue Peng and Liwei Wang},
+    year={2024},
+    eprint={2404.02905},
+    archivePrefix={arXiv},
+    primaryClass={cs.CV},
+    url={https://arxiv.org/abs/2404.02905},
+}
+```
+
 ## License
 This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
````
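Because `tools/reproduce.py` hard-codes its weight paths under `/workspace/Infinity/weights/`, a missing or misplaced download only surfaces when a model fails to load. A minimal pre-flight check, run inside the container before `python Infinity/tools/reproduce.py`, might look like the sketch below. `missing_weights`, `WEIGHTS_DIR` and `EXPECTED` are hypothetical names; the file-versus-folder distinction follows the README's wording (a `flan-t5-xl` folder and two `.pth` files).

```python
from pathlib import Path

# Directory used by the hard-coded paths in tools/reproduce.py (inside the container)
WEIGHTS_DIR = Path("/workspace/Infinity/weights")

# Hypothetical manifest: entry name -> expected kind ("file" or "dir")
EXPECTED = {
    "infinity_2b_reg.pth": "file",
    "infinity_vae_d32reg.pth": "file",
    "flan-t5-xl": "dir",
}


def missing_weights(weights_dir=WEIGHTS_DIR, expected=EXPECTED):
    """Return the names in `expected` that are absent or of the wrong kind, in order."""
    weights_dir = Path(weights_dir)
    missing = []
    for name, kind in expected.items():
        path = weights_dir / name
        present = path.is_file() if kind == "file" else path.is_dir()
        if not present:
            missing.append(name)
    return missing
```

An empty list means all three entries are where the script expects them; otherwise the returned names show what still needs to be downloaded or moved.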

infinity/models/infinity.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -16,7 +16,7 @@
 from torch.utils.checkpoint import checkpoint
 from PIL import Image
 import numpy as np
-from torch.nn.attention.flex_attention import flex_attention
+# from torch.nn.attention.flex_attention import flex_attention
 
 import infinity.utils.dist as dist
 from infinity.utils.dist import for_visualize
```
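This change comments out the `flex_attention` import, which is consistent with `use_flex_attn=0` in `tools/reproduce.py`; the commit does not state the reason. If the goal is to keep `infinity/models/infinity.py` importable on PyTorch builds that lack `torch.nn.attention.flex_attention`, one hedged alternative is a guarded import that keeps the name defined. `HAS_FLEX_ATTN` and `resolve_use_flex_attn` are hypothetical names, not part of the repository.

```python
# Hypothetical alternative to commenting the import out: keep the name defined
# and fall back to None when this PyTorch build has no flex_attention module.
try:
    from torch.nn.attention.flex_attention import flex_attention
except ImportError:  # also covers ModuleNotFoundError (older PyTorch, or no torch at all)
    flex_attention = None

HAS_FLEX_ATTN = flex_attention is not None


def resolve_use_flex_attn(requested: int) -> int:
    """Hypothetical helper: honor use_flex_attn=1 only if flex_attention was imported."""
    return 1 if requested and HAS_FLEX_ATTN else 0
```

With this guard, a setting like `use_flex_attn=1` would quietly degrade to `0` instead of failing at import time; whether that is preferable to an explicit error is a design choice for the model code.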

tools/reproduce.py

Lines changed: 101 additions & 0 deletions

```python
import argparse
import random
import torch
import os
import os.path as osp
import cv2
import numpy as np
from run_infinity import *  # load_* helpers, gen_one_img and the resolution tables

torch.cuda.set_device(0)
model_path = '/workspace/Infinity/weights/infinity_2b_reg.pth'
vae_path = '/workspace/Infinity/weights/infinity_vae_d32reg.pth'
text_encoder_ckpt = '/workspace/Infinity/weights/flan-t5-xl'

# Inference settings for Infinity-2B (flex attention disabled via use_flex_attn=0)
args = argparse.Namespace(
    pn='1M',
    model_path=model_path,
    cfg_insertion_layer=0,
    vae_type=32,
    vae_path=vae_path,
    add_lvl_embeding_only_first_block=1,
    use_bit_label=1,
    model_type='infinity_2b',
    rope2d_each_sa_layer=1,
    rope2d_normalized_by_hw=2,
    use_scale_schedule_embedding=0,
    sampling_per_bits=1,
    text_encoder_ckpt=text_encoder_ckpt,
    text_channels=2048,
    apply_spatial_patchify=0,
    h_div_w_template=1.000,
    use_flex_attn=0,
    cache_dir='/dev/shm',
    checkpoint_type='torch',
    seed=0,
    bf16=1,
    save_file='tmp.jpg',
    enable_model_cache=0
)

# Load the T5 text encoder, the VAE (visual tokenizer) and the Infinity transformer
text_tokenizer, text_encoder = load_tokenizer(t5_path=args.text_encoder_ckpt)
vae = load_visual_tokenizer(args)
infinity = load_transformer(vae, args)

# Prompts to render, keyed by a short name used in the output file name
prompts = {
    "vintage_insect": "Insect made from vintage 1960s electronic components, capacitors, resistors, transistors, wires, diodes, solder, circuitboard.",
    "macro_closeup": "Denis Villeneuve's extreme macro cinematographic close-up in water.",
    "3d_school": "A creative 3D image to be placed at the bottom of a mobile application's homepage, depicting a miniature school and children carrying backpacks.",
    "explore_more": "Create an image with 'Explore More' in an adventurous font over a picturesque hiking trail.",
    "toy_car": "Close-up shot of a diecast toy car, diorama, night, lights from windows, bokeh, snow.",
    "fairy_house": "House: white; pink tinted windows; surrounded by flowers; cute; scenic; garden; fairy-like; epic; photography; photorealistic; insanely detailed and intricate; textures; grain; ultra-realistic.",
    "cat_fashion": "Hyperrealistic black and white photography of cats fashion show in style of Helmut Newton.",
    "spacefrog_astroduck": "Two superheroes called Spacefrog (a dashing green cartoon-like frog with a red cape) and Astroduck (a yellow fuzzy duck, part-robot, with blue/grey armor), near a garden pond, next to their spaceship, a classic flying saucer, called the Tadpole 3000. Photorealistic.",
    "miniature_village": "An enchanted miniature village bustling with activity, featuring tiny houses, markets, and residents.",
    "corgi_dog": "A close-up photograph of a Corgi dog. The dog is wearing a black hat and round, dark sunglasses. The Corgi has a joyful expression, with its mouth open and tongue sticking out, giving an impression of happiness or excitement.",
    "robot_eggplant": "a robot holding a huge eggplant, sunny nature background",
    "perfume_product": "Product photography, a perfume placed on a white marble table with pineapple, coconut, lime next to it as decoration, white curtains, full of intricate details, realistic, minimalist, layered gestures in a bright and concise atmosphere, minimalist style.",
    "mountain_landscape": "The image presents a picturesque mountainous landscape under a cloudy sky. The mountains, blanketed in lush greenery, rise majestically, their slopes dotted with clusters of trees and shrubs. The sky above is a canvas of blue, adorned with fluffy white clouds that add a sense of tranquility to the scene. In the foreground, a valley unfolds, nestled between the towering mountains. It appears to be a rural area, with a few buildings and structures visible, suggesting the presence of a small settlement. The buildings are scattered, blending harmoniously with the natural surroundings. The image is captured from a high vantage point, providing a sweeping view of the valley and the mountains."
}

# Output images go to ./outputs, relative to the current working directory
output_dir = "outputs"
os.makedirs(output_dir, exist_ok=True)

# Generate and save one image per prompt
for category, prompt in prompts.items():
    cfg = 3  # classifier-free guidance scale
    tau = 0.5  # sampling temperature
    h_div_w = 1/1  # aspect ratio (height / width)
    seed = random.randint(0, 10000)
    enable_positive_prompt = 0

    # Snap h_div_w to the nearest template and take that template's scale schedule for args.pn
    h_div_w_template_ = h_div_w_templates[np.argmin(np.abs(h_div_w_templates-h_div_w))]
    scale_schedule = dynamic_resolution_h_w[h_div_w_template_][args.pn]['scales']
    scale_schedule = [(1, h, w) for (_, h, w) in scale_schedule]

    # Run generation
    generated_image = gen_one_img(
        infinity,
        vae,
        text_tokenizer,
        text_encoder,
        prompt,
        g_seed=seed,
        gt_leak=0,
        gt_ls_Bl=None,
        cfg_list=cfg,
        tau_list=tau,
        scale_schedule=scale_schedule,
        cfg_insertion_layer=[args.cfg_insertion_layer],
        vae_type=args.vae_type,
        sampling_per_bits=args.sampling_per_bits,
        enable_positive_prompt=enable_positive_prompt,
    )

    # Save the image; the random seed is printed so a run can be repeated with the same seed
    save_path = osp.join(output_dir, f"re_{category}_test.jpg")
    cv2.imwrite(save_path, generated_image.cpu().numpy())
    print(f"{category} image saved to {save_path} (seed={seed})")
```
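Inside the generation loop, the script snaps the requested aspect ratio `h_div_w` to the nearest entry in `h_div_w_templates`, looks up that template's scale schedule for `args.pn`, and forces the first element of every `(_, h, w)` entry to 1. The dependency-free sketch below reproduces those two steps; `nearest_template`, `single_image_schedule`, `TOY_TEMPLATES` and `TOY_SCALES` are hypothetical, and the template and schedule values are made-up toy data, not the ones defined in the repository.

```python
def nearest_template(templates, h_div_w):
    """Return the template ratio closest to h_div_w; the first one wins on ties.

    Same selection as h_div_w_templates[np.argmin(np.abs(h_div_w_templates - h_div_w))].
    """
    return min(templates, key=lambda t: abs(t - h_div_w))


def single_image_schedule(scales):
    """Force the first element of every (_, h, w) scale entry to 1."""
    return [(1, h, w) for (_, h, w) in scales]


# Toy stand-ins for h_div_w_templates and dynamic_resolution_h_w[template][pn]['scales']
TOY_TEMPLATES = [0.5, 1.0, 2.0]
TOY_SCALES = {1.0: [(1, 1, 1), (1, 2, 2), (1, 4, 4)]}

template = nearest_template(TOY_TEMPLATES, 0.9)
schedule = single_image_schedule(TOY_SCALES[template])
```

Returning the template value itself (rather than a recomputed float) matters because the script uses it as a dictionary key; with `h_div_w = 1/1`, as in the script, the lookup lands on the `1.0` template, assuming that ratio is among the predefined templates.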
