Commit 07e4dc5

Songyuanwei and SamitHuang authored

update readme (mindspore-lab#959)

* update readme
* update
* Update README.md
* Update README_CN.md
* fix typo

Co-authored-by: songyuanwei <song.yuanwei@huawei.com>
Co-authored-by: Samit <285365963@qq.com>

1 parent fbed6d8 · commit 07e4dc5
File tree: 109 files changed, +293 −201 lines changed


docs/diffusers/optimization/fp16.md
Lines changed: 2 additions & 2 deletions

@@ -18,7 +18,7 @@ There are several ways to optimize Diffusers for inference speed, such as reduci
  Optimizing for inference speed or reduced memory usage can lead to improved performance in the other category, so you should try to optimize for both whenever you can. This guide focuses on inference speed, but you can learn more about lowering memory usage in the [Reduce memory usage](memory.md) guide.

- The inference times below are obtained from generating a single 512x512 image from the prompt "a photo of an astronaut riding a horse on mars" with 50 DDIM steps on a Ascend 910B in Graph mode.
+ The inference times below are obtained from generating a single 512x512 image from the prompt "a photo of an astronaut riding a horse on mars" with 50 DDIM steps on a Ascend Atlas 800T A2 machine in Graph mode.

  | setup | latency | speed-up |
  |----------|---------|----------|

@@ -48,7 +48,7 @@ You could also use a distilled Stable Diffusion model and autoencoder to speed u
  Read the [Open-sourcing Knowledge Distillation Code and Weights of SD-Small and SD-Tiny](https://huggingface.co/blog/sd_distillation) blog post to learn more about how knowledge distillation training works to produce a faster, smaller, and cheaper generative model.

- The inference times below are obtained from generating 4 images from the prompt "a photo of an astronaut riding a horse on mars" with 25 PNDM steps on a Ascend 910B. Each generation is repeated 3 times with the distilled Stable Diffusion v1.4 model by [Nota AI](https://hf.co/nota-ai).
+ The inference times below are obtained from generating 4 images from the prompt "a photo of an astronaut riding a horse on mars" with 25 PNDM steps on a Ascend Atlas 800T A2 machine. Each generation is repeated 3 times with the distilled Stable Diffusion v1.4 model by [Nota AI](https://hf.co/nota-ai).

  | setup | latency | speed-up |
  |------------------------------|---------|----------|
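The changed lines above describe benchmark tables whose "speed-up" column is each setup's latency measured against a baseline. As a minimal sketch of that derivation (the latency values below are hypothetical placeholders, not measurements from the docs):

```python
# Sketch of how a "speed-up" column is derived from a latency column:
# every setup is compared against the full-precision baseline.

def speed_up(baseline_latency_s: float, latency_s: float) -> float:
    """Return the speed-up factor of a setup relative to the baseline."""
    return baseline_latency_s / latency_s

# Hypothetical latencies in seconds (assumption, for illustration only).
setups = {"fp32": 10.0, "fp16": 4.0}

table = {name: round(speed_up(setups["fp32"], lat), 2) for name, lat in setups.items()}
print(table)  # the baseline is 1.0x by definition; lower latency -> larger speed-up
```

By construction the baseline row always reads 1.0x, which is why the docs' tables anchor every comparison to the fp32 setup.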

docs/diffusers/stable_diffusion.md
Lines changed: 2 additions & 2 deletions

@@ -61,7 +61,7 @@ image
  <img src="https://github.com/user-attachments/assets/67b06273-9081-4b4f-a31f-585b23f70f27">
  </div>

- This process took ~5.6 seconds on a Ascend 910B in Graph mode. By default, the [`DiffusionPipeline`](https://mindspore-lab.github.io/mindone/latest/diffusers/api/pipelines/overview/#mindone.diffusers.DiffusionPipeline) runs inference with full `float32` precision for 50 inference steps. You can speed this up by switching to a lower precision like `float16` or running fewer inference steps.
+ This process took ~5.6 seconds on a Ascend Atlas 800T A2 machine in Graph mode. By default, the [`DiffusionPipeline`](https://mindspore-lab.github.io/mindone/latest/diffusers/api/pipelines/overview/#mindone.diffusers.DiffusionPipeline) runs inference with full `float32` precision for 50 inference steps. You can speed this up by switching to a lower precision like `float16` or running fewer inference steps.

  Let's start by loading the model in `float16` and generate an image:

@@ -163,7 +163,7 @@ make_image_grid(images, rows=2, cols=4)
  <img src="https://github.com/user-attachments/assets/5028a23d-7acd-4bb0-8633-38f8371eb393">
  </div>

- Whereas before you couldn't even generate a batch of 4 images, now you can generate a batch of 8 images at ~1.6 seconds per image! This is probably the fastest you can go on a Ascend 910B without sacrificing quality.
+ Whereas before you couldn't even generate a batch of 4 images, now you can generate a batch of 8 images at ~1.6 seconds per image! This is probably the fastest you can go on a Ascend Atlas 800T A2 machine without sacrificing quality.

  ## Quality
docs/diffusers/using-diffusers/marigold_usage.md
Lines changed: 1 addition & 1 deletion

@@ -131,7 +131,7 @@ Points on the shoulders pointing up with a large `Y` promote green color.
  ### Speeding up inference

  The above quick start snippets are already optimized for speed: they load the LCM checkpoint, use the `fp16` variant of weights and computation, and perform just one denoising diffusion step.
- The `pipe(image)` call completes in 180ms on Ascend 910B in Graph mode.
+ The `pipe(image)` call completes in 180ms on Ascend Atlas 800T A2 machines in Graph mode.
  Internally, the input image is encoded with the Stable Diffusion VAE encoder, then the U-Net performs one denoising step, and finally, the prediction latent is decoded with the VAE decoder into pixel space.
  In this case, two out of three module calls are dedicated to converting between pixel and latent space of LDM.
  Because Marigold's latent space is compatible with the base Stable Diffusion, it is possible to speed up the pipeline call by more than 3x (85ms on RTX 3090) by using a [lightweight replacement of the SD VAE](../api/models/autoencoder_tiny.md):
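The quoted passage explains that two of the three module calls in Marigold's one-step pipeline are VAE pixel/latent conversions, which is why swapping in a tiny VAE yields more than a 3x speed-up. A back-of-envelope sketch, with hypothetical per-module timings (only the 180 ms total comes from the docs; the split and the tiny-VAE cost factor are assumptions):

```python
# Why replacing the SD VAE speeds up a one-step pipeline by >3x:
# if the two VAE calls dominate the budget, shrinking them collapses the total.

# Hypothetical split of the 180 ms pipeline call (assumption, for illustration).
timings_ms = {"vae_encode": 70, "unet_step": 40, "vae_decode": 70}
total_ms = sum(timings_ms.values())  # 180 ms, matching the docs' figure

tiny_vae_factor = 10  # assume the tiny VAE is ~10x cheaper than the SD VAE
optimized_ms = (
    timings_ms["unet_step"]
    + (timings_ms["vae_encode"] + timings_ms["vae_decode"]) / tiny_vae_factor
)
print(total_ms / optimized_ms)  # exceeds 3x whenever the VAE calls dominate
```

The exact factor depends on the real per-module split, but the structural point holds: with one denoising step, the VAE, not the U-Net, is the bottleneck.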

examples/animatediff/README.md
Lines changed: 2 additions & 2 deletions

@@ -4,7 +4,7 @@ This repository is the MindSpore implementation of [AnimateDiff](https://arxiv.o
  ## Features

- - [x] Text-to-video generation with AnimdateDiff v2, supporting 16 frames @512x512 resolution on Ascend 910*
+ - [x] Text-to-video generation with AnimdateDiff v2, supporting 16 frames @512x512 resolution on Ascend Atlas 800T A2 machines
  - [x] MotionLoRA inference
  - [x] Motion Module Training
  - [X] Motion LoRA Training

@@ -253,7 +253,7 @@ Here are some generation results after lora fine-tuning on 512x512 resolution an
  ## Performance (AnimateDiff v2)

- Experiments are tested on ascend 910* graph mode.
+ Experiments are tested on Ascend Atlas 800T A2 machines with graph mode.

  ### Inference
examples/animatediff/ad/modules/attention.py
Lines changed: 1 addition & 1 deletion

@@ -121,7 +121,7 @@ def __init__(self, dim, heads=4, dim_head=32):
  class CrossAttention(nn.Cell):
      """
-     Flash attention doesnot work well (leading to noisy images) for SD1.5-based models on 910B up to MS2.2.1-20231122 version,
+     Flash attention doesnot work well (leading to noisy images) for SD1.5-based models on Ascend Atlas 800T A2 machines up to MS2.2.1-20231122 version,
      due to the attention head dimension is 40, num heads=5. Require test on future versions
      """

examples/animatediff/args_train.py
Lines changed: 6 additions & 1 deletion

@@ -40,7 +40,12 @@ def parse_args():
  )
  # ms
  parser.add_argument("--device_target", type=str, default="Ascend", help="Ascend or GPU")
- parser.add_argument("--max_device_memory", type=str, default=None, help="e.g. `30GB` for 910a, `59GB` for 910b")
+ parser.add_argument(
+     "--max_device_memory",
+     type=str,
+     default=None,
+     help="e.g. `30GB` for Ascend 910, `59GB` for Ascend Atlas 800T A2 machines",
+ )
  parser.add_argument("--mode", default=0, type=int, help="Specify the mode: 0 for graph mode, 1 for pynative mode")
  parser.add_argument("--use_parallel", default=False, type=str2bool, help="use parallel")
  parser.add_argument(
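The hunk above only reflows the `--max_device_memory` definition and updates its help text. A self-contained sketch of the argument as it behaves after the change (the `build_parser` wrapper is added here for illustration; in the repo the argument lives inside `parse_args`, and the parsed string is typically forwarded to MindSpore's device-memory context setting):

```python
# Standalone sketch of the --max_device_memory argument from the diff above.
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical wrapper around the repo's parse_args() argument setup."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--max_device_memory",
        type=str,
        default=None,
        help="e.g. `30GB` for Ascend 910, `59GB` for Ascend Atlas 800T A2 machines",
    )
    return parser


# When omitted, the value stays None and the framework default applies.
default_args = build_parser().parse_args([])
# When given, the raw string (including the unit suffix) is kept as-is.
args = build_parser().parse_args(["--max_device_memory", "59GB"])
print(args.max_device_memory)  # -> 59GB
```

Keeping the value a plain string (rather than parsing the `GB` suffix) matches how the downstream context setter consumes it.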

examples/animatediff/train.py
Lines changed: 1 addition & 1 deletion

@@ -459,7 +459,7 @@ def main(args):
      use_lora=args.motion_lora_finetune,
      lora_rank=args.motion_lora_rank,
      param_save_filter=[".temporal_transformer."] if args.save_mm_only else None,
-     record_lr=False,  # TODO: check LR retrival for new MS on 910b
+     record_lr=False,  # TODO: check LR retrival for new MS on Ascend Atlas 800T A2 machines
  )
  callback.append(save_cb)
  if args.profile:

examples/autoencoders/README.md
Lines changed: 1 addition & 1 deletion

@@ -66,7 +66,7 @@ For detailed arguments, please run `python infer.py -h`.
  ### Performance

- We split the CelebA-HQ dataset into 24,000 images for training and 6,000 images for testing. Experiments are tested on ascend 910* with graph mode.
+ We split the CelebA-HQ dataset into 24,000 images for training and 6,000 images for testing. Experiments are tested on Ascend Atlas 800T A2 machines with graph mode.

  - mindspore 2.5.0
examples/diffusers/cogvideox_factory/README.md
Lines changed: 3 additions & 3 deletions

@@ -2,7 +2,7 @@
  Fine-tune the Cog family of video models on Ascend hardware for custom video generation ⚡️📼

- > Our development and validation are based on Ascend 910* hardware, with the following environment:
+ > Our development and validation are based on Ascend Atlas 800T A2 hardware, with the following environment:
  > | mindspore | ascend driver | firmware | cann toolkit/kernel |
  > |:----------:|:--------------:|:-----------:|:------------------:|
  > | 2.5 | 24.1.RC2 | 7.5.0.1.129 | 8.0.0.beta1 |

@@ -379,15 +379,15 @@ NODE_RANK="0"
  | CogvideoX 1.5 T2V 20B | 8 | 2 | 4 | zero3 | ON | 1x77x768x1360 | bf16 | O1 | 20.1 | 35.7 GB |
  | CogvideoX 1.5 T2V 30B | 8 | 2 | 4 | zero3 | ON | 1x77x768x1360 | bf16 | O1 | 26.5 | 47.3 GB |

- The data above were obtained on the Disney dataset, on 910* hardware.
+ The data above were obtained on the Disney dataset, on Ascend Atlas 800T A2 training servers.

  ### Inference

  | model | cards | DP | SP | zero | video shape | precision | jit level | s/step | total cost |
  |:-----------------:|:-----:|:--:|:--:|:-----:|:-------------:|:---------:|:---------:|:------:|:----------:|
  | CogvideoX 1.5 T2V 5B | 8 | 1 | 8 | zero3 | 1x77x768x1360 | bf16 | O1 | 3.21 | ~ 5min |

- The data above were obtained on 910* hardware.
+ The data above were obtained on Ascend Atlas 800T A2 training servers.

  ## Differences from the original repo & feature limitations
examples/diffusers/controlnet/README_flux.md
Lines changed: 1 addition & 1 deletion

@@ -44,7 +44,7 @@ We also support importing data from jsonl(xxx.jsonl),using `--jsonl_for_train` t
  ## Training

- Our experiments were conducted on a single 64GB 910* NPU.
+ Our experiments were conducted on a single 64GB Ascend Atlas 800T A2 NPU.

  We can define the num_layers, num_single_layers, which determines the size of the control.