add Kandinsky 2.0 - the first multilingual text2image model #8990

0-NiK-0 · 2023-03-26T15:31:05Z

0-NiK-0
Mar 26, 2023

Kandinsky 2.0 accepts prompts written in more than 100 languages.

Kandinsky 2.0
https://github.com/ai-forever/Kandinsky-2.0
https://huggingface.co/sberbank-ai/Kandinsky_2.0
https://fusionbrain.ai/diffusion

Model architecture:
It is a latent diffusion model with two multilingual text encoders:

mCLIP-XLMR (560M parameters)
mT5-encoder-small (146M parameters)

Together with the multilingual training data, these encoders make genuinely multilingual text-to-image generation possible.

Kandinsky 2.0 was trained on a large multilingual dataset of 1B samples, including the samples that we used to train Kandinsky.

For the diffusion architecture, Kandinsky 2.0 uses a U-Net with 1.2B parameters.

[Figure: Kandinsky 2.0 architecture overview]

0-NiK-0 · 2023-03-29T13:15:14Z

0-NiK-0
Mar 29, 2023
Author

Kandinsky 2.1 inherits best practices from DALL-E 2 and Latent Diffusion, while introducing some new ideas.

It uses a CLIP model as the text and image encoder, plus a diffusion image prior that maps between the latent spaces of the CLIP modalities. This approach improves the visual quality of the model and opens up new possibilities for blending images and for text-guided image manipulation.
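To make the image-blending idea concrete, here is a minimal sketch of one common way to blend in an embedding space: a weighted average of the vectors. It is an assumption for illustration only, since the post does not say how Kandinsky 2.1 combines embeddings. The function name `blend_embeddings` and the 4-dimensional toy vectors are hypothetical stand-ins, not real CLIP outputs.

```python
def blend_embeddings(embeddings, weights):
    """Return the weighted average of equally sized embedding vectors.

    Hypothetical helper: illustrates blending in a shared latent space,
    not the method used by Kandinsky 2.1.
    """
    if not embeddings or len(embeddings) != len(weights):
        raise ValueError("need exactly one weight per embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same length")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalise so the weights sum to 1, then mix component by component.
    norm = [w / total for w in weights]
    return [sum(w * e[i] for w, e in zip(norm, embeddings)) for i in range(dim)]


# Toy vectors standing in for two image embeddings (not real model output).
emb_a = [1.0, 0.0, 2.0, -1.0]
emb_b = [0.0, 1.0, 0.0, 3.0]

# Weights 3:1 give 75% of emb_a and 25% of emb_b.
mixed = blend_embeddings([emb_a, emb_b], [3, 1])
print(mixed)
```

In a pipeline of this kind, the blended vector would then be passed to the image decoder in place of a single image embedding.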

For the diffusion mapping between latent spaces we use a transformer with num_layers=20, num_heads=32 and hidden_size=2048.
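As a rough sanity check on these hyperparameters, the sketch below estimates the transformer's parameter count. It assumes a standard transformer block (four attention projections plus a feed-forward layer with a 4x expansion) and ignores biases, layer norms and embeddings. None of these details are confirmed in the post, and `transformer_block_params` is a hypothetical helper.

```python
def transformer_block_params(hidden_size, ffn_multiplier=4):
    """Approximate weight count of one standard transformer block.

    Assumption: Q, K, V and output projections (4 * d^2) plus a two-layer
    feed-forward network with expansion `ffn_multiplier` (2 * m * d^2).
    Biases and layer norms are ignored.
    """
    attention = 4 * hidden_size * hidden_size
    feed_forward = 2 * ffn_multiplier * hidden_size * hidden_size
    return attention + feed_forward


# Values quoted in the post.
num_layers, num_heads, hidden_size = 20, 32, 2048

head_dim = hidden_size // num_heads
estimated_params = num_layers * transformer_block_params(hidden_size)

print(f"head_dim = {head_dim}")
# → head_dim = 64
print(f"estimated parameters = {estimated_params:,} (~{estimated_params / 1e9:.2f}B)")
# → estimated parameters = 1,006,632,960 (~1.01B)
```

Under these assumptions the estimate lands close to the 1B quoted for the Diffusion Image Prior in this post.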

Other architecture parts:

Text encoder (XLM-Roberta-Large-Vit-L-14) - 560M
Diffusion Image Prior - 1B
CLIP image encoder (ViT-L/14) - 427M
Latent Diffusion U-Net - 1.22B
MoVQ encoder/decoder - 67M

Kandinsky 2.1 was trained on the large-scale image-text dataset LAION HighRes and fine-tuned on our internal datasets.
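For a quick sense of overall model size, the component counts listed above can be summed. This is a back-of-the-envelope sketch: the figures are the rounded values from the post (in millions of parameters), so the total is approximate, and the dictionary name `components_m` is only illustrative.

```python
# Parameter counts from the post, in millions (rounded as quoted there).
components_m = {
    "Text encoder (XLM-Roberta-Large-Vit-L-14)": 560,
    "Diffusion Image Prior": 1000,
    "CLIP image encoder (ViT-L/14)": 427,
    "Latent Diffusion U-Net": 1220,
    "MoVQ encoder/decoder": 67,
}

total_m = sum(components_m.values())

# Print each component with its share of the total, largest first.
for name, size_m in sorted(components_m.items(), key=lambda kv: -kv[1]):
    print(f"{name:45s} {size_m:5d}M  {100 * size_m / total_m:5.1f}%")

print(f"Total: {total_m}M (~{total_m / 1000:.2f}B parameters)")
# → Total: 3274M (~3.27B parameters)
```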

0 replies

0-NiK-0 · 2023-04-04T18:05:45Z

0-NiK-0
Apr 4, 2023
Author

Kandinsky 2.1
https://github.com/ai-forever/Kandinsky-2
https://huggingface.co/ai-forever/Kandinsky_2.1
https://huggingface.co/spaces/ai-forever/Kandinsky2.1

Demo
https://fusionbrain.ai/
https://t.me/kandinsky21_bot
https://rudalle.ru/kandinsky2

Info
https://habr.com/ru/companies/sberbank/articles/725282/
https://www.reddit.com/r/StableDiffusion/comments/12bf5k2/kandinsky_21_beats_stable_diffusion_and_allows/

0 replies

bbecausereasonss · 2023-04-09T13:45:48Z

bbecausereasonss
Apr 9, 2023

Is there a way to train it, and does anyone know if it would work with LoRAs?

0 replies

user425846 · 2023-04-18T20:32:42Z

user425846
Apr 18, 2023

Is there any update on this?

0 replies


Replies: 4 comments
