
# MindSpore ONE

This repository contains SoTA algorithms, models, and interesting projects in the area of multimodal understanding and content generation.

ONE is short for "ONE for all"

## News

- [2025.12.24] We release v0.5.0, adding compatibility with 🤗 Transformers v4.57.1 (70+ new models) and 🤗 Diffusers v0.35.2, plus previews of v0.36 pipelines such as Flux2, QwenImageEditPlus, Lucy, and Kandinsky5. It also introduces initial ComfyUI integration. Happy exploring!
- [2025.11.02] We release v0.4.0, with 280+ transformers models and 70+ diffusers pipelines supported.
- [2025.04.10] We release v0.3.0. More than 15 SoTA generative models are added, including Flux, CogView4, OpenSora 2.0, Movie Gen 30B, and CogVideoX 5B~30B. Have fun!
- [2025.02.21] We support DeepSeek Janus-Pro, a SoTA multimodal understanding and generation model.
- [2024.11.06] We release v0.2.0.

## Quick tour

To install v0.5.0, first install MindSpore 2.6.0–2.7.1, then run:

```shell
pip install mindone
```

Alternatively, to install the latest version from the master branch, please run:

```shell
git clone https://github.com/mindspore-lab/mindone.git
cd mindone
pip install -e .
```
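To sanity-check the installation, a quick import is enough; this snippet assumes nothing beyond both packages being importable:

```python
# Optional sanity check: both packages should import without errors.
import mindspore
import mindone

print(mindspore.__version__)
```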

We support state-of-the-art diffusion models for generating images, audio, and video. Let's get started using Stable Diffusion 3 as an example.

Hello MindSpore from Stable Diffusion 3!

```python
import mindspore
from mindone.diffusers import StableDiffusion3Pipeline

# Load the SD3 medium checkpoint in half precision.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    mindspore_dtype=mindspore.float16,
)
prompt = "A cat holding a sign that says 'Hello MindSpore'"
# The pipeline returns a tuple; the first element is the list of generated images.
image = pipe(prompt)[0][0]
image.save("sd3.png")
```

### Run HF Diffusers on MindSpore

- mindone diffusers is under active development; most tasks were tested with MindSpore 2.6.0–2.7.1 on Ascend Atlas 800T A2 machines
- compatible with 🤗 Diffusers v0.35.2, with preview support for SoTA v0.36 pipelines; see the support list (a minimal porting sketch follows this list)
- 18+ training examples: ControlNet, DreamBooth, LoRA, and more
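Because mindone.diffusers mirrors the 🤗 Diffusers API, porting an existing pipeline usually only means changing the import and the dtype keyword. Below is a minimal sketch; the generic `DiffusionPipeline` loader and the FLUX.1-dev checkpoint are illustrative choices, not a tested configuration:

```python
import mindspore
from mindone.diffusers import DiffusionPipeline

# Assumption: supported pipelines can be loaded through the generic
# DiffusionPipeline entry point, as in upstream 🤗 Diffusers.
pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",      # illustrative checkpoint
    mindspore_dtype=mindspore.bfloat16,  # mindone uses mindspore_dtype instead of torch_dtype
)
# As in the SD3 example above, the pipeline returns a tuple of image lists.
image = pipe("a watercolor fox in a forest")[0][0]
image.save("flux.png")
```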

### Run HF Transformers on MindSpore

- mindone transformers is under active development; most tasks were tested with MindSpore 2.6.0–2.7.1 on Ascend Atlas 800T A2 machines
- compatible with 🤗 Transformers v4.57.1
- provides 350+ state-of-the-art machine learning models across text, computer vision, audio, video, and multimodal domains for inference; see the support list (a minimal text-generation sketch follows this list)
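As with diffusers, mindone.transformers keeps the upstream interface, so a 🤗 Transformers text-generation snippet ports over by swapping the model import. A minimal sketch, assuming the Qwen2.5 checkpoint, the usual `AutoModelForCausalLM`/`generate` API, and that `generate` returns a MindSpore tensor; the tokenizer still comes from the upstream transformers package:

```python
import mindspore as ms
from transformers import AutoTokenizer            # tokenizer from upstream HF transformers
from mindone.transformers import AutoModelForCausalLM

model_id = "Qwen/Qwen2.5-0.5B-Instruct"           # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, mindspore_dtype=ms.float16)

# Tokenize to NumPy, then wrap as MindSpore tensors for the model.
inputs = tokenizer("What is MindSpore?", return_tensors="np")
outputs = model.generate(ms.Tensor(inputs.input_ids), max_new_tokens=64)

# Assumption: generate returns a MindSpore tensor, converted back for decoding.
print(tokenizer.decode(outputs[0].asnumpy(), skip_special_tokens=True))
```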

## Supported models under `mindone/examples`

| Task | Model | Inference | Finetune | Pretrain | Institute |
|---|---|---|---|---|---|
| Text/Image-to-Video | wan2.1 | 🔥 | ✖️ | ✖️ | Alibaba |
| Text/Image-to-Video | wan2.2 | 🔥🔥 | ✅ | ✖️ | Alibaba |
| Audio/Image-Text-to-Text | qwen2_5_omni | 🔥🔥 | ✅ | ✖️ | Alibaba |
| Image/Video-Text-to-Text | qwen2_5_vl | 🔥🔥 | ✅ | ✖️ | Alibaba |
| Any-to-Any | qwen3_omni_moe | 🔥🔥🔥 | ✖️ | ✖️ | Alibaba |
| Image-Text-to-Text | qwen3_vl/qwen3_vl_moe | 🔥🔥🔥 | ✖️ | ✖️ | Alibaba |
| Text-to-Image | qwen_image | 🔥🔥🔥 | ✅ | ✖️ | Alibaba |
| Text-to-Text | minicpm | 🔥🔥 | ✖️ | ✖️ | OpenBMB |
| Any-to-Any | janus | ✅ | ✅ | ✅ | DeepSeek |
| Any-to-Any | emu3 | ✅ | ✅ | ✅ | BAAI |
| Class-to-Image | var | ✅ | ✅ | ✅ | ByteDance |
| Text-to-Image | omnigen2 | 🔥 | ✅ | ✖️ | VectorSpaceLab |
| Text/Image-to-Video | hpcai open sora 1.2/2.0 | ✅ | ✅ | ✅ | HPC-AI Tech |
| Text/Image-to-Video | cogvideox 1.5 5B~30B | ✅ | ✅ | ✅ | Zhipu |
| Image/Text-to-Text | glm4v | 🔥 | ✖️ | ✖️ | Zhipu |
| Text-to-Video | open sora plan 1.3 | ✅ | ✅ | ✅ | PKU |
| Text-to-Video | hunyuanvideo | ✅ | ✅ | ✅ | Tencent |
| Image-to-Video | hunyuanvideo-i2v | 🔥 | ✖️ | ✖️ | Tencent |
| Text-to-Video | movie gen 30B | ✅ | ✅ | ✅ | Meta |
| Segmentation | lang_sam | 🔥 | ✖️ | ✖️ | Meta |
| Segmentation | sam2 | ✅ | ✖️ | ✖️ | Meta |
| Text-to-Video | step_video_t2v | ✅ | ✖️ | ✖️ | StepFun |
| Text-to-Speech | sparktts | ✅ | ✖️ | ✖️ | Spark Audio |
| Text-to-Image | flux | ✅ | ✅ | ✖️ | Black Forest Lab |
| Text-to-Image | stable diffusion 3 | ✅ | ✅ | ✖️ | Stability AI |

## Supported captioner

| Task | Model | Inference | Finetune | Pretrain | Features |
|---|---|---|---|---|---|
| Image-Text-to-Text | pllava | ✅ | ✖️ | ✖️ | supports video and image captioning |

## Training-free acceleration

Introduces DiT inference acceleration: DiTCache, PromptGate, and FBCache with TaylorSeer, tested on SD3 and FLUX.1.
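These cache-based methods share one idea: across adjacent diffusion steps the transformer's activations change slowly, so cached results can stand in for recomputation. The sketch below illustrates the FBCache-style pattern in isolation; the class name, threshold value, and block callables are illustrative assumptions, not the repository's actual API:

```python
import numpy as np

class FirstBlockCache:
    """Illustrative FBCache-style skip logic (not mindone's actual API).

    Run the first transformer block every step; if its output barely changed
    since the previous step, reuse the cached output of the remaining blocks
    instead of recomputing them.
    """

    def __init__(self, threshold=0.05):
        self.threshold = threshold   # illustrative relative-change cutoff
        self.prev_first = None       # first-block output from the previous step
        self.cached_rest = None      # cached output of the remaining blocks

    def step(self, x, first_block, remaining_blocks):
        h = first_block(x)
        if self.prev_first is not None and self.cached_rest is not None:
            # Relative change of the first block's output between steps.
            change = np.linalg.norm(h - self.prev_first) / (np.linalg.norm(self.prev_first) + 1e-8)
            if change < self.threshold:
                self.prev_first = h
                return self.cached_rest  # skip the expensive remaining blocks
        out = h
        for block in remaining_blocks:
            out = block(out)
        self.prev_first, self.cached_rest = h, out
        return out
```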