Commit 5345702

[UPDATE] Revise README and example code for AnyTextPipeline integration with DiffusionPipeline

1 parent 0fc4aab
2 files changed: +33 -36 lines

README
Lines changed: 24 additions & 26 deletions

````diff
@@ -1,45 +1,43 @@
 # AnyTextPipeline Pipeline
 
-From the project [page](https://zhendong-wang.github.io/prompt-diffusion.github.io/)
+From the repo [page](https://github.com/tyxsspa/AnyText)
 
-"With a prompt consisting of a task-specific example pair of images and text guidance, and a new query image, Prompt Diffusion can comprehend the desired task and generate the corresponding output image on both seen (trained) and unseen (new) task types."
+"AnyText comprises a diffusion pipeline with two primary elements: an auxiliary latent module and a text embedding module. The former uses inputs like text glyph, position, and masked image to generate latent features for text generation or editing. The latter employs an OCR model for encoding stroke data as embeddings, which blend with image caption embeddings from the tokenizer to generate texts that seamlessly integrate with the background. We employed text-control diffusion loss and text perceptual loss for training to further enhance writing accuracy."
 
-For any usage questions, please refer to the [paper](https://arxiv.org/abs/2305.01115).
-
-Prepare models by converting them from the [checkpoint](https://huggingface.co/zhendongw/prompt-diffusion)
-
-To convert the controlnet, use cldm_v15.yaml from the [repository](https://github.com/Zhendong-Wang/Prompt-Diffusion/tree/main/models/):
-
-```sh
-python convert_original_anytext_to_diffusers.py --checkpoint_path path-to-network-step04999.ckpt --original_config_file path-to-cldm_v15.yaml --dump_path path-to-output-directory
-```
-
-To learn about how to convert the fine-tuned stable diffusion model, see the [Load different Stable Diffusion formats guide](https://huggingface.co/docs/diffusers/main/en/using-diffusers/other-formats).
+For any usage questions, please refer to the [paper](https://arxiv.org/abs/2311.03054).
 
 
 ```py
 import torch
-from pipeline_anytext import AnyTextPipeline
-from text_controlnet import AnyTextControlNetModel
+from diffusers import DiffusionPipeline
+from anytext_controlnet import AnyTextControlNetModel
 from diffusers import DDIMScheduler
 from diffusers.utils import load_image
 
 
-controlnet = AnyTextControlNetModel.from_pretrained("tolgacangoz/anytext-controlnet", torch_dtype=torch.float16,
-                                                    variant="fp16")
-pipe = AnyTextPipeline.from_pretrained("tolgacangoz/anytext", controlnet=controlnet,
-                                       torch_dtype=torch.float16, variant="fp16")
+# I chose a font file shared by an HF staff member:
+!wget https://huggingface.co/spaces/ysharma/TranslateQuotesInImageForwards/resolve/main/arial-unicode-ms.ttf
+
+# load control net and stable diffusion v1-5
+anytext_controlnet = AnyTextControlNetModel.from_pretrained("tolgacangoz/anytext-controlnet", torch_dtype=torch.float16,
+                                                            variant="fp16",)
+pipe = DiffusionPipeline.from_pretrained("tolgacangoz/anytext", font_path="arial-unicode-ms.ttf",
+                                         controlnet=anytext_controlnet, torch_dtype=torch.float16,
+                                         trust_remote_code=True,
+                                         ).to("cuda")
 
-# speed up diffusion process with faster scheduler and memory optimization
 pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
-# uncomment following line if torch<2.0
+# uncomment the following line if PyTorch >= 2.0 is not installed, for memory optimization
 #pipe.enable_xformers_memory_efficient_attention()
-pipe.enable_model_cpu_offload()
+
+# uncomment the following line if you want to offload the model to the CPU for memory optimization;
+# in that case, also remove the `.to("cuda")` call above
+#pipe.enable_model_cpu_offload()
+
 # generate image
-generator = torch.Generator("cpu").manual_seed(66273235)
 prompt = 'photo of caramel macchiato coffee on the table, top-down perspective, with "Any" "Text" written on it using cream'
-draw_pos = load_image("www.huggingface.co/a/AnyText/tree/main/examples/gen9.png")
-image = pipe(prompt, num_inference_steps=20, generator=generator, mode="generate", draw_pos=draw_pos,
-             ).images[0]
+draw_pos = load_image("https://raw.githubusercontent.com/tyxsspa/AnyText/refs/heads/main/example_images/gen9.png")
+image = pipe(prompt, num_inference_steps=20, mode="generate", draw_pos=draw_pos,
+             ).images[0]
 image
 ```
````
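Note that the `!wget` line in the new README only works in IPython/Jupyter shells. In plain Python, the same file can be fetched from the referenced Space with `huggingface_hub` (a minimal sketch; the repo id and filename are read off the URL in the diff above):

```py
from huggingface_hub import hf_hub_download

# Download arial-unicode-ms.ttf from the Space referenced in the README and
# get back a local path that can be passed as `font_path`.
font_path = hf_hub_download(
    repo_id="ysharma/TranslateQuotesInImageForwards",
    filename="arial-unicode-ms.ttf",
    repo_type="space",
)
```

The returned path can then replace the hard-coded `"arial-unicode-ms.ttf"` argument in `DiffusionPipeline.from_pretrained`.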
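The AnyText description quoted at the top of the README names two components: an auxiliary latent module and a text embedding module. As a rough mental model only, here is a toy sketch of that data flow; every shape, layer, and name below is an illustrative assumption, not AnyText's actual code:

```py
import torch
from torch import nn


class ToyAuxiliaryLatentModule(nn.Module):
    """Fuses glyph, position, and masked-image inputs into one latent map."""

    def __init__(self, channels: int = 4):
        super().__init__()
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, glyph, position, masked_image):
        # Concatenate the three conditioning inputs channel-wise, then project.
        return self.fuse(torch.cat([glyph, position, masked_image], dim=1))


class ToyTextEmbeddingModule(nn.Module):
    """Blends OCR-derived stroke embeddings with caption embeddings."""

    def __init__(self, dim: int = 768):
        super().__init__()
        self.ocr_proj = nn.Linear(dim, dim)  # stand-in for an OCR encoder head

    def forward(self, stroke_emb, caption_emb):
        # Append projected stroke embeddings to the caption token sequence, so
        # the denoiser sees both the text to render and the scene description.
        return torch.cat([caption_emb, self.ocr_proj(stroke_emb)], dim=1)
```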

examples/research_projects/anytext/anytext.py

Lines changed: 9 additions & 10 deletions

````diff
@@ -81,18 +81,19 @@
 EXAMPLE_DOC_STRING = """
     Examples:
         ```py
-        >>> from pipeline_anytext import AnyTextPipeline
+        >>> from diffusers import DiffusionPipeline
         >>> from anytext_controlnet import AnyTextControlNetModel
         >>> from diffusers import DDIMScheduler
         >>> from diffusers.utils import load_image
         >>> import torch
 
         >>> # load control net and stable diffusion v1-5
-        >>> text_controlnet = AnyTextControlNetModel.from_pretrained("tolgacangoz/anytext-controlnet", torch_dtype=torch.float16,
-        ...                                                          variant="fp16",)
-        >>> pipe = AnyTextPipeline.from_pretrained("tolgacangoz/anytext", controlnet=text_controlnet,
-        ...                                        torch_dtype=torch.float16, variant="fp16",
-        ...                                        ).to("cuda")
+        >>> anytext_controlnet = AnyTextControlNetModel.from_pretrained("tolgacangoz/anytext-controlnet", torch_dtype=torch.float16,
+        ...                                                             variant="fp16",)
+        >>> pipe = DiffusionPipeline.from_pretrained("tolgacangoz/anytext", font_path="Arial_Unicode2.ttf",
+        ...                                          controlnet=anytext_controlnet, torch_dtype=torch.float16,
+        ...                                          trust_remote_code=True,
+        ...                                          ).to("cuda")
 
         >>> pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
         >>> # uncomment following line if PyTorch>=2.0 is not installed for memory optimization
@@ -103,11 +104,9 @@
         >>> #pipe.enable_model_cpu_offload()
 
         >>> # generate image
-        >>> generator = torch.Generator("cpu").manual_seed(66273235)
         >>> prompt = 'photo of caramel macchiato coffee on the table, top-down perspective, with "Any" "Text" written on it using cream'
-        >>> draw_pos = load_image("www.huggingface.co/a/AnyText/tree/main/examples/gen9.png")
-        >>> image = pipe(prompt, num_inference_steps=20, generator=generator, mode="generate",
-        ...             draw_pos=draw_pos,
+        >>> draw_pos = load_image("https://raw.githubusercontent.com/tyxsspa/AnyText/refs/heads/main/example_images/gen9.png")
+        >>> image = pipe(prompt, num_inference_steps=20, mode="generate", draw_pos=draw_pos,
         ...             ).images[0]
         >>> image
         ```
````
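Both examples now drop the fixed seed that the old code passed via `generator`. Diffusers pipelines conventionally accept a `torch.Generator` for reproducible sampling, and the previous revision of this docstring forwarded one, so re-adding it should still work; a minimal sketch continuing from the example above (the output filename is arbitrary):

```py
import torch

# Seed the sampler for reproducible output; 66273235 is the seed the previous
# revision of this example used. Assumes the custom AnyText pipeline still
# accepts the standard `generator` keyword.
generator = torch.Generator("cpu").manual_seed(66273235)
image = pipe(prompt, num_inference_steps=20, mode="generate",
             draw_pos=draw_pos, generator=generator).images[0]
image.save("anytext_sample.png")  # the pipeline returns PIL images
```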
