Commit b2756ad

Merge branch 'main' into remote-vae-encode
2 parents 998c3c6 + b88fef4 commit b2756ad

21 files changed: +4230 -105 lines changed

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
# AnyTextPipeline Pipeline

Project page: https://aigcdesigngroup.github.io/homepage_anytext

"AnyText comprises a diffusion pipeline with two primary elements: an auxiliary latent module and a text embedding module. The former uses inputs like text glyph, position, and masked image to generate latent features for text generation or editing. The latter employs an OCR model for encoding stroke data as embeddings, which blend with image caption embeddings from the tokenizer to generate texts that seamlessly integrate with the background. We employed text-control diffusion loss and text perceptual loss for training to further enhance writing accuracy."

Enclose each text line to be generated in double quotes inside the prompt. For usage questions, please refer to the [paper](https://arxiv.org/abs/2311.03054).

```py
import torch
from diffusers import DiffusionPipeline
from anytext_controlnet import AnyTextControlNetModel
from diffusers.utils import load_image

# Font file shared by a Hugging Face staff member; download it first:
# !wget https://huggingface.co/spaces/ysharma/TranslateQuotesInImageForwards/resolve/main/arial-unicode-ms.ttf

# Load the AnyText ControlNet weights in half precision.
anytext_controlnet = AnyTextControlNetModel.from_pretrained(
    "tolgacangoz/anytext-controlnet",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe = DiffusionPipeline.from_pretrained(
    "tolgacangoz/anytext",
    font_path="arial-unicode-ms.ttf",
    controlnet=anytext_controlnet,
    torch_dtype=torch.float16,
    trust_remote_code=False,  # Set to True to give permission to run this pipeline's code
).to("cuda")

# Generate an image; each quoted segment in the prompt is a text line to render.
prompt = 'photo of caramel macchiato coffee on the table, top-down perspective, with "Any" "Text" written on it using cream'
draw_pos = load_image("https://raw.githubusercontent.com/tyxsspa/AnyText/refs/heads/main/example_images/gen9.png")
image = pipe(
    prompt,
    num_inference_steps=20,
    mode="generate",
    draw_pos=draw_pos,
).images[0]
image
```
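
The double-quote convention means the text lines can be recovered from a prompt with a simple pattern match. The following is a minimal sketch for checking a prompt before running the pipeline; the helper name `extract_text_lines` is hypothetical and is not part of diffusers or AnyText, and the pipeline's own prompt parsing may differ.

```python
import re
from typing import List


def extract_text_lines(prompt: str) -> List[str]:
    """Return the double-quoted segments of `prompt`, in order of appearance.

    Hypothetical helper for illustration only; it is an assumption about how
    quoted text lines could be read, not the pipeline's actual parser.
    """
    return re.findall(r'"([^"]*)"', prompt)


prompt = 'photo of caramel macchiato coffee on the table, top-down perspective, with "Any" "Text" written on it using cream'
text_lines = extract_text_lines(prompt)
print(text_lines)  # ['Any', 'Text']
```

A check like this makes it easy to confirm that the number of quoted text lines matches the number of regions you intend to mark in `draw_pos`.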
