Merged
22 commits
8103ca2
Add LongCat-Image
junqiangwu Dec 11, 2025
07a78ce
Merge branch 'main' into main
junqiangwu Dec 12, 2025
792746b
Update src/diffusers/models/transformers/transformer_longcat_image.py
junqiangwu Dec 12, 2025
898902c
Update src/diffusers/models/transformers/transformer_longcat_image.py
junqiangwu Dec 12, 2025
87dfd68
Update src/diffusers/models/transformers/transformer_longcat_image.py
junqiangwu Dec 12, 2025
600cd5c
Update src/diffusers/pipelines/longcat_image/pipeline_longcat_image.py
junqiangwu Dec 12, 2025
a8d35f4
Update src/diffusers/pipelines/longcat_image/pipeline_longcat_image.py
junqiangwu Dec 12, 2025
d95fb05
Update src/diffusers/pipelines/longcat_image/pipeline_longcat_image.py
junqiangwu Dec 12, 2025
40714c9
Update src/diffusers/models/transformers/transformer_longcat_image.py
junqiangwu Dec 12, 2025
0c6ee77
Update src/diffusers/pipelines/longcat_image/pipeline_longcat_image.py
junqiangwu Dec 12, 2025
32032b3
fix code
junqiangwu Dec 12, 2025
dbdbd01
add doc
junqiangwu Dec 12, 2025
110698b
Update src/diffusers/pipelines/longcat_image/pipeline_longcat_image_e…
junqiangwu Dec 12, 2025
833f275
Update src/diffusers/pipelines/longcat_image/pipeline_longcat_image_e…
junqiangwu Dec 12, 2025
b5858c4
Update src/diffusers/pipelines/longcat_image/pipeline_longcat_image.py
junqiangwu Dec 12, 2025
fcfdda9
Update src/diffusers/pipelines/longcat_image/pipeline_longcat_image.py
junqiangwu Dec 12, 2025
5e2f36c
Update src/diffusers/pipelines/longcat_image/pipeline_longcat_image.py
junqiangwu Dec 12, 2025
9f81035
Update src/diffusers/pipelines/longcat_image/pipeline_longcat_image.py
junqiangwu Dec 12, 2025
4e92dc6
fix code & mask style & fix-copies
junqiangwu Dec 12, 2025
b43ba6b
Apply style fixes
github-actions[bot] Dec 15, 2025
475f03d
Merge branch 'main' into main
yiyixuxu Dec 15, 2025
6669415
fix single input rewrite error
Dec 15, 2025
25 changes: 25 additions & 0 deletions docs/source/en/api/models/longcat_image_transformer2d.md
@@ -0,0 +1,25 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# LongCatImageTransformer2DModel

The model can be loaded with the following code snippet.

```python
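import torch
from diffusers import LongCatImageTransformer2DModel

# Load only the transformer weights from the "transformer" subfolder of the checkpoint.
transformer = LongCatImageTransformer2DModel.from_pretrained(
    "meituan-longcat/LongCat-Image", subfolder="transformer", torch_dtype=torch.bfloat16
)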
```
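
A transformer loaded this way is usually handed to the pipeline rather than used on its own. The sketch below shows one way to do that. It assumes `LongCatImagePipeline.from_pretrained` accepts a `transformer` component override, as Diffusers pipelines generally do. `REPO_ID` and `load_pipeline` are hypothetical names used only for illustration.

```python
REPO_ID = "meituan-longcat/LongCat-Image"


def load_pipeline(repo_id: str = REPO_ID):
    """Load the transformer separately, then pass it to the pipeline (illustrative helper)."""
    # Heavy imports live inside the function so that defining the helper stays cheap.
    import torch
    from diffusers import LongCatImagePipeline, LongCatImageTransformer2DModel

    transformer = LongCatImageTransformer2DModel.from_pretrained(
        repo_id, subfolder="transformer", torch_dtype=torch.bfloat16
    )
    # Assumption: the pipeline registers this component under the name "transformer",
    # matching the subfolder used above, so it can be overridden by keyword.
    return LongCatImagePipeline.from_pretrained(
        repo_id, transformer=transformer, torch_dtype=torch.bfloat16
    )
```

Calling `load_pipeline()` downloads the checkpoint, so it is best run once and the result reused.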

## LongCatImageTransformer2DModel

[[autodoc]] LongCatImageTransformer2DModel
114 changes: 114 additions & 0 deletions docs/source/en/api/pipelines/longcat_image.md
@@ -0,0 +1,114 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# LongCat-Image

<div class="flex flex-wrap space-x-1">
<img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
</div>


We introduce LongCat-Image, a pioneering open-source and bilingual (Chinese-English) foundation model for image generation, designed to address core challenges in multilingual text rendering, photorealism, deployment efficiency, and developer accessibility prevalent in current leading models.


### Key Features
- 🌟 **Exceptional Efficiency and Performance**: With only **6B parameters**, LongCat-Image surpasses numerous open-source models that are several times larger across multiple benchmarks, demonstrating the immense potential of efficient model design.
- 🌟 **Superior Editing Performance**: The LongCat-Image-Edit model achieves state-of-the-art performance among open-source models, delivering leading instruction-following and image quality with superior visual consistency.
- 🌟 **Powerful Chinese Text Rendering**: LongCat-Image demonstrates superior accuracy and stability in rendering common Chinese characters compared to existing SOTA open-source models and achieves industry-leading coverage of the Chinese dictionary.
- 🌟 **Remarkable Photorealism**: Through an innovative data strategy and training framework, LongCat-Image achieves remarkable photorealism in generated images.
- 🌟 **Comprehensive Open-Source Ecosystem**: We provide a complete toolchain, from intermediate checkpoints to full training code, significantly lowering the barrier for further research and development.

For more details, see the [***LongCat-Image Technical Report***](https://arxiv.org/abs/2412.11963).


## Usage Example

```py
import torch
from diffusers import LongCatImagePipeline

weight_dtype = torch.bfloat16
pipe = LongCatImagePipeline.from_pretrained("meituan-longcat/LongCat-Image", torch_dtype=weight_dtype)
pipe.to("cuda")
# pipe.enable_model_cpu_offload()  # use instead of pipe.to("cuda") when GPU memory is limited

# Prompt (in Chinese): a young Asian woman in a yellow knit sweater and a white necklace, hands on her
# knees, serene expression, in front of a rough brick wall in warm afternoon sunlight; medium shot,
# soft lighting, simple composition.
prompt = '一个年轻的亚裔女性,身穿黄色针织衫,搭配白色项链。她的双手放在膝盖上,表情恬静。背景是一堵粗糙的砖墙,午后的阳光温暖地洒在她身上,营造出一种宁静而温馨的氛围。镜头采用中距离视角,突出她的神态和服饰的细节。光线柔和地打在她的脸上,强调她的五官和饰品的质感,增加画面的层次感与亲和力。整个画面构图简洁,砖墙的纹理与阳光的光影效果相得益彰,突显出人物的优雅与从容。'
image = pipe(
    prompt,
    height=768,
    width=1344,
    guidance_scale=4.0,
    num_inference_steps=50,
    num_images_per_prompt=1,
    generator=torch.Generator("cpu").manual_seed(43),  # fixed seed for reproducible output
    enable_cfg_renorm=True,
    enable_prompt_rewrite=True,
).images[0]
image.save("./longcat_image_t2i_example.png")
```


This pipeline was contributed by the LongCat-Image Team. The original codebase can be found [here](https://github.com/meituan-longcat/LongCat-Image).

Available models:
<div style="overflow-x: auto; margin-bottom: 16px;">
<table style="border-collapse: collapse; width: 100%;">
<thead>
<tr>
<th style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de; background-color: #f6f8fa;">Models</th>
<th style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de; background-color: #f6f8fa;">Type</th>
<th style="padding: 8px; border: 1px solid #d0d7de; background-color: #f6f8fa;">Description</th>
<th style="padding: 8px; border: 1px solid #d0d7de; background-color: #f6f8fa;">Download Link</th>
</tr>
</thead>
<tbody>
<tr>
<td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">LongCat&#8209;Image</td>
<td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">Text&#8209;to&#8209;Image</td>
<td style="padding: 8px; border: 1px solid #d0d7de;">Final Release. The standard model for out&#8209;of&#8209;the&#8209;box inference.</td>
<td style="padding: 8px; border: 1px solid #d0d7de;">
<span style="white-space: nowrap;">🤗&nbsp;<a href="https://huggingface.co/meituan-longcat/LongCat-Image">Huggingface</a></span>
</td>
</tr>
<tr>
<td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">LongCat&#8209;Image&#8209;Dev</td>
<td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">Text&#8209;to&#8209;Image</td>
<td style="padding: 8px; border: 1px solid #d0d7de;">Development. Mid-training checkpoint, suitable for fine-tuning.</td>
<td style="padding: 8px; border: 1px solid #d0d7de;">
<span style="white-space: nowrap;">🤗&nbsp;<a href="https://huggingface.co/meituan-longcat/LongCat-Image-Dev">Huggingface</a></span>
</td>
</tr>
<tr>
<td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">LongCat&#8209;Image&#8209;Edit</td>
<td style="white-space: nowrap; padding: 8px; border: 1px solid #d0d7de;">Image Editing</td>
<td style="padding: 8px; border: 1px solid #d0d7de;">Specialized model for image editing.</td>
<td style="padding: 8px; border: 1px solid #d0d7de;">
<span style="white-space: nowrap;">🤗&nbsp;<a href="https://huggingface.co/meituan-longcat/LongCat-Image-Edit">Huggingface</a></span>
</td>
</tr>
</tbody>
</table>
</div>
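
For scripts that switch between these checkpoints, the table can be mirrored as a small lookup. This is only a sketch: the repository ids are taken from the link targets in the table, while `Checkpoint`, `CHECKPOINTS`, and `repo_for` are hypothetical names that are not part of Diffusers.

```python
from typing import Dict, NamedTuple


class Checkpoint(NamedTuple):
    repo_id: str  # Hugging Face Hub repository id
    task: str     # task type as listed in the table


# Transcribed from the "Available models" table.
CHECKPOINTS: Dict[str, Checkpoint] = {
    "LongCat-Image": Checkpoint("meituan-longcat/LongCat-Image", "text-to-image"),
    "LongCat-Image-Dev": Checkpoint("meituan-longcat/LongCat-Image-Dev", "text-to-image"),
    "LongCat-Image-Edit": Checkpoint("meituan-longcat/LongCat-Image-Edit", "image-editing"),
}


def repo_for(name: str) -> str:
    """Return the Hub repository id for a model name from the table."""
    try:
        return CHECKPOINTS[name].repo_id
    except KeyError:
        raise ValueError(f"Unknown checkpoint {name!r}; choose from {sorted(CHECKPOINTS)}") from None
```

The returned id can be passed as the first argument to `from_pretrained`, for example `LongCatImagePipeline.from_pretrained(repo_for("LongCat-Image"))`.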

## LongCatImagePipeline

[[autodoc]] LongCatImagePipeline
  - all
  - __call__

## LongCatImagePipelineOutput

[[autodoc]] pipelines.longcat_image.pipeline_output.LongCatImagePipelineOutput



6 changes: 6 additions & 0 deletions src/diffusers/__init__.py
@@ -238,6 +238,7 @@
"LTXVideoTransformer3DModel",
"Lumina2Transformer2DModel",
"LuminaNextDiT2DModel",
"LongCatImageTransformer2DModel",
"MochiTransformer3DModel",
"ModelMixin",
"MotionAdapter",
@@ -541,6 +542,8 @@
"Lumina2Text2ImgPipeline",
"LuminaPipeline",
"LuminaText2ImgPipeline",
"LongCatImagePipeline",
"LongCatImageEditPipeline",
"MarigoldDepthPipeline",
"MarigoldIntrinsicsPipeline",
"MarigoldNormalsPipeline",
@@ -973,6 +976,7 @@
LTXVideoTransformer3DModel,
Lumina2Transformer2DModel,
LuminaNextDiT2DModel,
LongCatImageTransformer2DModel,
MochiTransformer3DModel,
ModelMixin,
MotionAdapter,
@@ -1241,6 +1245,8 @@
LTXImageToVideoPipeline,
LTXLatentUpsamplePipeline,
LTXPipeline,
LongCatImagePipeline,
LongCatImageEditPipeline,
LucyEditPipeline,
Lumina2Pipeline,
Lumina2Text2ImgPipeline,
2 changes: 2 additions & 0 deletions src/diffusers/models/__init__.py
@@ -103,6 +103,7 @@
_import_structure["transformers.transformer_kandinsky"] = ["Kandinsky5Transformer3DModel"]
_import_structure["transformers.transformer_ltx"] = ["LTXVideoTransformer3DModel"]
_import_structure["transformers.transformer_lumina2"] = ["Lumina2Transformer2DModel"]
_import_structure["transformers.transformer_longcat_image"] = ["LongCatImageTransformer2DModel"]
_import_structure["transformers.transformer_mochi"] = ["MochiTransformer3DModel"]
_import_structure["transformers.transformer_omnigen"] = ["OmniGenTransformer2DModel"]
_import_structure["transformers.transformer_ovis_image"] = ["OvisImageTransformer2DModel"]
@@ -211,6 +212,7 @@
LTXVideoTransformer3DModel,
Lumina2Transformer2DModel,
LuminaNextDiT2DModel,
LongCatImageTransformer2DModel,
MochiTransformer3DModel,
OmniGenTransformer2DModel,
OvisImageTransformer2DModel,
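
The `_import_structure` entries in this file map a submodule path to the names it exports, so the package can defer the real import until a name is first accessed. The sketch below illustrates that idea using only the standard library. `LazyNamespace` and the `json`/`math` mapping are hypothetical stand-ins chosen so the example runs anywhere; this is not Diffusers' actual lazy-module implementation.

```python
import importlib
from typing import Dict, List


class LazyNamespace:
    """Resolve exported names to their modules on first access (illustrative only)."""

    def __init__(self, import_structure: Dict[str, List[str]]):
        # Invert {module: [names]} into {name: module} for direct lookup.
        self._name_to_module = {
            name: module for module, names in import_structure.items() for name in names
        }
        self.loaded: List[str] = []  # modules imported so far, in order

    def __getattr__(self, name: str):
        # Python calls this only when normal attribute lookup fails.
        name_to_module = self.__dict__.get("_name_to_module", {})
        if name not in name_to_module:
            raise AttributeError(name)
        module_name = name_to_module[name]
        module = importlib.import_module(module_name)
        self.loaded.append(module_name)
        value = getattr(module, name)
        setattr(self, name, value)  # cache so later lookups skip __getattr__
        return value


# Same shape as the entries in this file: module path -> exported names.
import_structure = {"json": ["dumps"], "math": ["sqrt"]}
ns = LazyNamespace(import_structure)
```

Accessing `ns.sqrt` triggers the import of `math` and caches the function on the instance; `json` stays unrecorded in `ns.loaded` until `ns.dumps` is first used.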
3 changes: 2 additions & 1 deletion src/diffusers/models/transformers/__init__.py
@@ -35,6 +35,7 @@
from .transformer_kandinsky import Kandinsky5Transformer3DModel
from .transformer_ltx import LTXVideoTransformer3DModel
from .transformer_lumina2 import Lumina2Transformer2DModel
from .transformer_longcat_image import LongCatImageTransformer2DModel
from .transformer_mochi import MochiTransformer3DModel
from .transformer_omnigen import OmniGenTransformer2DModel
from .transformer_ovis_image import OvisImageTransformer2DModel
@@ -47,4 +48,4 @@
from .transformer_wan import WanTransformer3DModel
from .transformer_wan_animate import WanAnimateTransformer3DModel
from .transformer_wan_vace import WanVACETransformer3DModel
from .transformer_z_image import ZImageTransformer2DModel
from .transformer_z_image import ZImageTransformer2DModel