Releases: mindspore-lab/mindone
MindOne v0.5.0 - Major Release
We're excited to announce the official release of MindOne v0.5.0, with enhanced community integration and significant performance improvements.
Key Highlights
- mindone.diffusers: Compatible with 🤗 diffusers v0.35.2, with preview support for SOTA v0.36 pipelines
- mindone.transformers: Compatible with 🤗 transformers v4.57.1
- ComfyUI: Added initial ComfyUI integration support
- MindSpore: Compatible with MindSpore 2.6.0 - 2.7.1
mindone.transformers updates
- Major upgrade: Enhanced compatibility with 🤗 transformers v4.54 and v4.57.1.
- 70+ new models added: Check support list here.
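To make the drop-in compatibility concrete, here is a minimal sketch (not taken from the release notes) of loading one of the newly added text models through mindone.transformers, following the same pattern as the v0.2.0 example further down; the AutoModelForCausalLM class is assumed to mirror huggingface/transformers, and the SmolLM3 checkpoint name is only illustrative.

```python
# Minimal sketch: run one of the newly supported models with mindone.transformers.
# Assumptions: AutoModelForCausalLM is exposed by mindone.transformers just like in
# huggingface/transformers, and "HuggingFaceTB/SmolLM3-3B" is an illustrative checkpoint.
from mindspore import Tensor
from transformers import AutoTokenizer                  # tokenizer stays in huggingface/transformers
from mindone.transformers import AutoModelForCausalLM   # model weights run on MindSpore

model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM3-3B")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")

inputs = tokenizer("MindOne runs multimodal models on Ascend", return_tensors="np")
outputs = model(Tensor(inputs.input_ids))  # forward pass, mirroring the v0.2.0 CLIP example
```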
Base Updates
New Models
- Vision Models: AIMv2 (#1456), DINOv3 ViT/ConvNeXt (v4.57.1) (#1439), SAM-HQ (v4.57.1) (#1457), Bria (#1384), Florence2 (#1453), EfficientLoftr (#1456), HGNet_v2 (#1395), Ovis2 (#1454)
- Audio/Speech Models: Granite Speech (#1406), Kyutai Speech-to-Text (#1407), Voxtral (#1456), Parakeet (#1451), XCodec (#1452), Dia (#1404), CSM (#1399)
- Text/Language Models: Llama4 (#1470), Arcee (#1470), Falcon H1 (#1465), Dots1 (#1469), SmolLM3 (v4.54.1) (#1391), ModernBERT Decoder (v4.54.1) (#1397), Hunyuan V1 Dense/MoE (v4.57.1) (#1401), Evolla (v4.54.1) (#1440), EXAONE (#1396), Doge (#1392), ERNIE 4.5 & ERNIE 4.5 MoE (#1393), GLM4 MoE (#1409), Flex OLMo (#1442), T5Gemma (#1420), VaultGemma (#1450), BLT/Apertus/Ministral (#1462), EOMT/TimesFM (#1403), Seed OSS (#1441), xLSTM (#1466), d_fine, GraniteMoeHybrid, EfficientLoFTR Models (#1405)
- Multimodal Models: Qwen3 Omni (#1411), Qwen3 Next (#1476), ColQwen2 (v4.54.1) (#1414), Cohere2 Vision (v4.57.1) (#1473), InternVL (v4.57) (#1463), Janus (v4.57) (#1463), Kosmos-2.5 (#1456), LFM2/LFM2-VL (#1456), MetaCLIP 2 (#1456), Mlcd (#1472), SAM2 (#1426), SAM2 Video Support (#1434), Olmo3 Model (#1467), DeepseekV2/DeepseekVL/DeepseekVLHybrid (#1477), MM Grounding DINO (#1486)
- Model updates: updated Mistral3 to v4.57.1 (#1464), updated Qwen2.5VL to v4.54.1 (#1421)
Multimodal processors for the vllm-mindspore community
- Qwen2.5VL ImageProcessor Fast / VideoProcessor (#1429)
- Qwen3_VL Video Processor & Qwen2_VL Image Processor Fast (#1419)
- Phi4/Whisper/Ultravox/InternVL/Qwen2_audio/MiniCPMV/LLaVA-Next/LLaVA-Next-Video processors (#1471)
mindone.diffusers updates
New Features
New Pipelines
- Kandinsky5 (#1388), Lucy (#1390), etc.
- Enabled multi-card inference for the Flux2 pipeline (ZeRO-3 sharding) (#1446)
ComfyUI Integration
- Added ComfyUI root files and CLI args (#1480)
- Added text encoder files (#1481)
- Updated clip_model.py (#1479)
Examples Updates
- Added Wan2.2 LoRA finetune support (#1418)
- Updated Emu3 performance for MindSpore 2.6.0 and 2.7.0 (#1417)
- Updated HunyuanVideo-I2V to MindSpore 2.6.0 and 2.7.0 (#1385)
- Added accelerated DiT pipelines compatible with MindSpore graph mode (#1433)
- Added FBCache/TaylorSeer graph mode implementation for Flux.1 (#1475)
- Added QwenImage LoRA finetune support (#1394)
Fixed
- Fixed AIMv2/Arcee torch dependency bug (#1485)
- Fixed bugs in mindone.transformers models that relied on torch (#1482)
- Fixed Qwen2.5VLProcessor tokenizer tensor-conversion bug (#1483)
- Fixed Qwen3_VL text attention selection bug (#1455)
- Fixed GLM4.1V generation indexing bug with batch size > 1 (#1437)
- Fixed training issue in TrainOneStepWrapper (#1408)
- Fixed import error if env contains accelerate module (#1431)
- ZeRO: Supported training with MindSpore 2.6.0 and 2.7.0 (#1383)
- Misc bugfixes (#1424)
- Fixed some diffusers bugs (#1448)
- Documentation updates for the MindOne v0.5.0 release, plus unit test fixes (#1484)
Statistics
- Total commits: 82
- Files changed: 798
- Lines added: 157,122
- Lines deleted: 22,303
Acknowledgments
Special thanks to our amazing contributors who helped shape MindOne v0.5.0!
Andy Zhou, Chaoran Wei, Cheung Ka Wai, Cui-yshoho, Didan Deng, Feiran Zhang, Fzilan, GUOGUO, Rustam Khadipash, The-truthh, YMC, Yingshu CHEN, alien-0119, jijiarong, liuchuting, vigo999, zackcxb, zyd-ustc
Together We Build, Together We Grow. Thanks to every open source maintainer, contributor, and user. ✨
Start your AI model development journey with MindOne v0.5.0 today!
Full Changelog: CHANGELOG.md
v0.4.0
MindOne v0.4.0 - Major Release
We're excited to announce the official release of MindOne v0.4.0! This is a milestone release that brings extensive AI model support and significant performance improvements.
Key Highlights
- mindone.diffusers: Compatible with 🤗 diffusers v0.35.0
- mindone.transformers: Compatible with 🤗 transformers v4.50
- MindSpore: Upgraded to require >=2.6.0
mindone.transformers updates
- Major upgrade: Enhanced compatibility with 🤗 transformers v4.50
- 280+ models supported: Comprehensive model library including vision, audio, multimodal, and text models
new models
- Vision Models: FLAVA (#1342), RT-DETR/RT-DETRv2 (#1317), SegGPT (#1318), Table Transformer (#1320), UperNet (#1319), Granite-Vision/MatCha/DePlot (#1334), ViT series/ZoeDepth (#1321), Grounding DINO (#1175), Idefics/Idefics3 (#1159, #1084), Aria (#1089), CLIPSeg (#1242), VideoLlava/VipLlava (#1238), Kosmos-2 (#1295), Pix2Struct (#1295)
- Audio Models: Wav2Vec2-Conformer/BERT (#1312), Seamless-M4T (#1293), Bark (#1313), Speech-Encoder-Decoder (#1281), UniSpeech/UniSpeech-SAT (#1277), Data2Vec (#1273), WavLM (#1323), HuBERT (#1128), CLVP (#1259)
- Text/Multilingual Models: Jamba (#1274), Udop (#1283), Cohere (#1304), GPT-NeoX/Japanese (#1114, #1112), GPT-J/BigCode (#1115, #1113), StableLM (#1070), OLMo/OLMo2 (#1095), ModernBERT/RWKV/Nystromformer/Zamba (#1241), Mamba/Mamba2 (#1162), Phi (#1073), MiniCPM4 (#1053), GLM-4.1V (#1109), Falcon-Mamba (#1176), X-MOD (#1176), Llama3 (#1084)
- Multimodal Models: Emu3 (#1233), BLIP/GLM4V/MPT (#1103), InstructBLIP/Video (#1295), BridgeTower (#1253), Aya Vision (#1253), LiLT (#1272), MGP-STR (#1262), TrOCR/TVP (#1297), GOT-OCR-2 (#1245), Segment Anything (SAM) (#1223), ColPali (#1259)
- Architecture Models: DiffLlama/OLMoE (#1147), LongT5/Longformer (#1234), NLLB-MoE (#1244), mBART (#1195), ELECTRA/Pegasus/X (#1295), SqueezeBERT (#1295), IBert (#1212), Bamba (#1241), FocalNet/RegNet (#1254), MobileNet v1/v2 (#1171), DistilBERT/Funnel/MLLaMA (#1256), Mistral3/Pixtral/ResNet (#1190), BERT Generation/DeiT (#1205), SigLIP2 (#1076)
- Examples & Documentation: BERT Japanese/BERTweet/ByT5/DialogGPT/Falcon3/Flan-T5/PhoBERT/XLM-V (#1328), Depth Anything V2/DiT (#1332), Granite-Vision/MatCha/DePlot (#1334), GLM4V processor (#1349)
mindone.diffusers updates
- Major upgrade: Enhanced compatibility with 🤗 diffusers v0.35.0
- 70+ pipelines supported: Comprehensive pipeline library for text-to-image, image-to-image, text-to-video, and audio generation
- 50+ model components: Transformers, autoencoders, controlnets, and processing modules as building blocks
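As a concrete illustration of the drop-in compatibility described above, here is a minimal sketch adapted from the v0.2.0 example further down (not new API): only the import path and the dtype argument change relative to huggingface/diffusers.

```python
# Minimal sketch: the diffusers API surface is preserved; only the import path and
# the dtype argument (mindspore_dtype instead of torch_dtype) differ.
import mindspore
from mindone.diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    mindspore_dtype=mindspore.float16,
    use_safetensors=True,
)
image = pipe(prompt="An astronaut riding a green horse")[0][0]
```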
new pipelines
- Video Generation: QwenImage (#1288), HiDream (#1360), Wan-VACE (#1148), SkyReels-V2 (#1203), Chroma-Dev (#1157), Sana Sprint Img2Img/VisualCloze (#1145), HunyuanVideo (#1029), Wan (#1021), Lumina2 (#996), LTXCondition (#997), UniDiffuser (#979)
- Image Generation: Amused & Ledits++ (#976), OmniGen & Marigold (#1062), Stable Diffusion Attend & Excite (#1013), SD Unclip/PIA (#958)
- Audio Generation: AudioLDM2 (#981)
- Advanced Sampling: K-diffusion pipelines (#986)
- Testing & Documentation: UniDiffusers test (#1007), 'reuse a pipeline' docs (#989), diffusers mint changes (#992)
model components
- Video Transformers: transformer_qwenimage (#1288), transformer_hidream_image, transformer_wan_vace (#1148), transformer_skyreels_v2 (#1203), transformer_chroma (#1157), transformer_cosmos (#1196), transformer_hunyuan_video_framepack (#1029), consisid_transformer_3d (#1124)
- Autoencoders: autoencoder_kl_qwenimage (#1288), autoencoder_kl_cosmos (#1196)
- ControlNets: controlnet_sana (#1145), multicontrolnet_union (#1158)
- Processing Modules: cache_utils (#1299), auto_model (#1158), lora processing modules (#1158)
mindone.peft updates
- Added mindone.peft and upgraded to v0.15.2 (#1194)
- Added Qwen2.5-Omni LoRA finetuning script with transformers 4.53.0 (#1218)
- Fixed lora and lora_scale handling for each PEFT layer (#1187)
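Since mindone.peft tracks the huggingface/peft v0.15 API, a LoRA adapter can presumably be attached in the familiar way. The following is a minimal sketch under that assumption; the base model and target modules are illustrative, not taken from the release notes.

```python
# Minimal sketch (assumption): mindone.peft mirrors huggingface/peft v0.15.2, so
# LoraConfig/get_peft_model work the same way on a mindone.transformers model.
from mindone.peft import LoraConfig, get_peft_model
from mindone.transformers import CLIPTextModel

base = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")
lora_cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the LoRA weights should be trainable
```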
models under examples (mostly with finetune/training scripts)
- Added Janus model ...
MindONE v0.3.0 release
We are thrilled to announce the release of MindONE 0.3.0, featuring more state-of-the-art multi-modal understanding and generative models and better compatibility with transformers and diffusers. MindONE now supports the latest features in diffusers v0.32.2, including over 160 pipelines, 50 models, and 35 schedulers. It allows users to easily develop new image/video/audio generation models or port existing models from torch to MindSpore. MindONE 0.3.0 is built on MindSpore 2.5 and optimized for Ascend NPUs, ensuring high-performance training for various generative models such as OpenSora, CogVideoX, and JanusPro from DeepSeek.
Key Features
- Support Diffusers v0.32.2
MindONE now supports the following new pipelines for image and video generation, along with new training scripts:
- Video Generation Pipelines: CogVideoX, Latte, Mochi-1, Allegro, LTXVideo, HunyuanVideo, and more.
- Image Generation Pipelines: CogView3/4, Stable Diffusion 3.5, Flux, SANA, Lumina, Kolors, AuraFlow, and more.
- Training Scripts: CogVideoX SFT & LoRA, Flux SFT & LoRA & ControlNet, and SD3/3.5 SFT & LoRA.
For more details, visit the diffusers documentation; a short usage sketch follows the Key Features list below.
- Expanded Multi-Modal Generative Models
MindONE v0.3.0 adds various state-of-the-art generative models as examples, ensuring efficient training performance on Ascend NPUs, including:
| task | model | inference | finetune | pretrain | institute |
|---|---|---|---|---|---|
| Image-to-Video | hunyuanvideo-i2v 🔥🔥 | ✅ | ✖️ | ✖️ | Tencent |
| Text/Image-to-Video | wan2.1 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Text-to-Image | cogview4 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Zhipuai |
| Text-to-Video | step_video_t2v 🔥🔥 | ✅ | ✖️ | ✖️ | StepFun |
| Image-Text-to-Text | qwen2_vl 🔥🔥🔥 | ✅ | ✖️ | ✖️ | Alibaba |
| Any-to-Any | janus 🔥🔥🔥 | ✅ | ✅ | ✅ | DeepSeek |
| Any-to-Any | emu3 🔥🔥 | ✅ | ✅ | ✅ | BAAI |
| Class-to-Image | var 🔥🔥 | ✅ | ✅ | ✅ | ByteDance |
| Text/Image-to-Video | hpcai open 2.0 🔥🔥 | ✅ | ✖️ | ✖️ | HPC-AI Tech |
| Text/Image-to-Video | cogvideox 1.5 5B~30B 🔥🔥 | ✅ | ✅ | ✅ | Zhipu |
| Text-to-Video | open sora plan 1.3 🔥🔥 | ✅ | ✅ | ✅ | PKU |
| Text-to-Video | hunyuanvideo 🔥🔥 | ✅ | ✅ | ✅ | Tencent |
| Text-to-Video | movie gen 30B 🔥🔥 | ✅ | ✅ | ✅ | Meta |
| Video-Encode-Decode | magvit | ✅ | ✅ | ✅ | |
| Text-to-Image | story_diffusion | ✅ | ✖️ | ✖️ | ByteDance |
| Image-to-Video | dynamicrafter | ✅ | ✖️ | ✖️ | Tencent |
| Video-to-Video | venhancer | ✅ | ✖️ | ✖️ | Shanghai AI Lab |
| Text-to-Video | t2v_turbo | ✅ | ✅ | ✅ | |
| Text/Image-to-Video | video composer | ✅ | ✅ | ✅ | Alibaba |
| Text-to-Image | flux 🔥 | ✅ | ✅ | ✖️ | Black Forest Lab |
| Text-to-Image | stable diffusion 3 🔥 | ✅ | ✅ | ✖️ | Stability AI |
| Text-to-Image | kohya_sd_scripts | ✅ | ✅ | ✖️ | kohya |
| Text-to-Image | t2i-adapter | ✅ | ✅ | ✅ | Shanghai AI Lab |
| Text-to-Image | ip adapter | ✅ | ✅ | ✅ | Tencent |
| Text-to-3D | mvdream | ✅ | ✅ | ✅ | ByteDance |
| Image-to-3D | instantmesh | ✅ | ✅ | ✅ | Tencent |
| Image-to-3D | sv3d | ✅ | ✅ | ✅ | Stability AI |
| Text/Image-to-3D | hunyuan3d-1.0 | ✅ | ✅ | ✅ | Tencent |
- Support Text-to-Video Data Curation
MindONE v0.3.0 adds a new pipeline for text-to-video data filtering, which supports scene detection and video splitting, de-duplication, aesthetic/OCR/LPIPS/NSFW scoring, and video captioning.
For more details, visit the t2v curation documentation.
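As referenced above, here is a minimal usage sketch for one of the newly supported video pipelines. It assumes CogVideoXPipeline is exposed under mindone.diffusers with the same interface as huggingface/diffusers and follows the tuple-indexing convention from the v0.2.0 example below; the checkpoint name is illustrative.

```python
# Minimal sketch (assumptions noted above): text-to-video with CogVideoX on MindSpore.
import mindspore
from mindone.diffusers import CogVideoXPipeline

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b", mindspore_dtype=mindspore.float16
)
prompt = "A panda playing guitar in a bamboo forest"
frames = pipe(prompt=prompt, num_frames=49)[0][0]  # first video's frames (assumed tuple output)
```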
MindONE 0.2.0
We are excited to announce the official release of MindONE, a state-of-the-art repository dedicated to multi-modal understanding and content generation. Built on MindSpore 2.3.1 and optimized for Ascend NPUs, MindONE provides a comprehensive suite of algorithms and models designed to facilitate advanced content generation across various modalities, including images, audio, videos, and even 3D objects.
Key Features
- diffusers support on MindSpore
We've tried to provide an interface and usage fully consistent with huggingface/diffusers.
Only the necessary changes are made to huggingface/diffusers so that users coming from torch can switch seamlessly.
```diff
- from diffusers import DiffusionPipeline
+ from mindone.diffusers import DiffusionPipeline

  pipe = DiffusionPipeline.from_pretrained(
      "stabilityai/stable-diffusion-xl-base-1.0",
-     torch_dtype=torch.float16,
+     mindspore_dtype=mindspore.float16,
      use_safetensors=True
  )

  prompt = "An astronaut riding a green horse"
  images = pipe(prompt=prompt)[0][0]
```

Important: Because huggingface/diffusers is still under active development, many features are not yet well supported. Currently, most functions of huggingface/diffusers v0.29.x are supported.
For details, see MindOne Diffusers.
- MindSpore patch for transformers
This MindSpore patch for huggingface/Transformers enables researchers and developers working on text-to-image (t2i) and text-to-video (t2v) generation to use pretrained text and image models from huggingface/Transformers on MindSpore.
Only the Ascend-related modules are modified; all other modules are reused from huggingface/Transformers.
The following example shows how to download and use a pretrained model. Note that the model comes from mindone.transformers, while everything else comes from huggingface/Transformers.
```diff
  from mindspore import Tensor

  # use tokenizer from huggingface/Transformers
  from transformers import AutoTokenizer
  # use model from mindone.transformers
- from transformers import CLIPTextModel
+ from mindone.transformers import CLIPTextModel

  model = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")
  tokenizer = AutoTokenizer.from_pretrained("openai/clip-vit-base-patch32")

  inputs = tokenizer(
      ["a photo of a cat", "a photo of a dog"],
      padding=True,
-     return_tensors="pt",
+     return_tensors="np"
  )

- outputs = model(**inputs)
+ outputs = model(Tensor(inputs.input_ids))
```

For details, see MindOne Transformers.
- State-of-the-Art generative models
MindONE showcases various state-of-the-art generative models as examples, ensuring efficient training performance on Ascend NPUs, including:
| model | features |
|---|---|
| hpcai open sora | support v1.0/1.1/1.2 large scale training with dp/sp/zero |
| open sora plan | support v1.0/1.1/1.2 large scale training with dp/sp/zero |
| stable diffusion | support sd 1.5/2.0/2.1, vanilla fine tune, lora, dreambooth, text inversion |
| stable diffusion xl | support SAI-style (Stability AI) vanilla fine tune, lora, dreambooth |
| dit | support text to image fine tune |
| hunyuan_dit | support text to image fine tune |
| pixart_sigma | support text to image fine tune at different aspect ratios |
| latte | support unconditional text to image fine tune |
| animate diff | support motion module and lora training |
| dynamicrafter | support image to video generation |