From 8ad2f2447129a19c490d4260106a3a51cd22b963 Mon Sep 17 00:00:00 2001
From: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com>
Date: Tue, 22 Oct 2024 16:12:50 +0000
Subject: [PATCH 01/11] start draft

---
 docs/source/en/_toctree.yml           |  2 ++
 docs/source/en/optimization/neuron.md | 14 ++++++++++++++
 2 files changed, 16 insertions(+)
 create mode 100644 docs/source/en/optimization/neuron.md
diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
index 58218c0272bd..87ff9b1fb81a 100644
--- a/docs/source/en/_toctree.yml
+++ b/docs/source/en/_toctree.yml
@@ -188,6 +188,8 @@
       title: Metal Performance Shaders (MPS)
     - local: optimization/habana
       title: Habana Gaudi
+    - local: optimization/neuron
+      title: AWS Neuron
     title: Optimized hardware
   title: Accelerate inference and reduce memory
 - sections:
diff --git a/docs/source/en/optimization/neuron.md b/docs/source/en/optimization/neuron.md
new file mode 100644
index 000000000000..b903f52269a9
--- /dev/null
+++ b/docs/source/en/optimization/neuron.md
@@ -0,0 +1,14 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# AWS Neuron
+

From ca99541460b11df4a24c45e9382b22a6017db6e6 Mon Sep 17 00:00:00 2001
From: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com>
Date: Thu, 24 Oct 2024 15:19:31 +0000
Subject: [PATCH 02/11] add doc

---
 docs/source/en/optimization/neuron.md | 47 +++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/docs/source/en/optimization/neuron.md b/docs/source/en/optimization/neuron.md
index b903f52269a9..c9e58979e711 100644
--- a/docs/source/en/optimization/neuron.md
+++ b/docs/source/en/optimization/neuron.md
@@ -12,3 +12,50 @@ specific language governing permissions and limitations under the License.
 
 # AWS Neuron
 
+🤗 Diffusers functionalities are available on [AWS Inf2 instances](https://aws.amazon.com/ec2/instance-types/inf2/), which are EC2 instances powered by [Neuron machine learning accelerators](https://aws.amazon.com/machine-learning/inferentia/). These instances aim at providing better compute performance(higher throughput, lower latency) with good cost-efficiency, which makes them good candidates for AWS users to deploy diffusion models for production.
+
+A wide range of features in 🤗 Diffusers are supported by [🤗 Optimum Neuron](https://huggingface.co/docs/optimum-neuron/en/index) via similar APIs. Once you have created an AWS Inf2 instance, you can install 🤗 Optimum Neuron:
+
+```bash
+python -m pip install --upgrade-strategy eager optimum[neuronx]
+```
+
+<Tip>
+
+We provide pre-built [Hugging Face Neuron Deep Learning AMI](https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2) (DLAMI) and Optimum Neuron containers for Amazon SageMaker. It's recommended to correctly set up your environment 
+
+</Tip>
+
+Here below is an example of generating images with Stable Diffusion XL model on an inf2.8xlarge instance (you can switch to cheaper inf2.xlarge instances once the model is compiled). We will use the `NeuronStableDiffusionXLPipeline` class, a class similar to the `StableDiffusionXLPipeline` class in diffusers to generate some images for fun.
+
+Unlike in 🤗 Diffusers, we need to compile models in the pipeline to Neuron compatible format `.neuron`. To do this, you will need to launch the following command:
+
+```bash
+optimum-cli export neuron --model stabilityai/stable-diffusion-xl-base-1.0 \
+  --batch_size 1 \
+  --height 1024 `# height in pixels of generated image, e.g. 768, 1024` \
+  --width 1024 `# width in pixels of generated image, e.g. 768, 1024` \
+  --num_images_per_prompt 1 `# number of images to generate per prompt, defaults to 1` \
+  --auto_cast matmul `# cast only matrix multiplication operations` \
+  --auto_cast_type bf16 `# cast operations from FP32 to BF16` \
+  sd_neuron_xl/
+```
+
+Now let's generate some images with pre-compiled SDXL models:
+
+```python
+>>> from optimum.neuron import NeuronStableDiffusionXLPipeline
+
+>>> stable_diffusion_xl = NeuronStableDiffusionXLPipeline.from_pretrained("sd_neuron_xl/")
+>>> prompt = "a pig with wings flying in floating US dollar banknotes in the air, skyscrapers behind, warm color palette, muted colors, detailed, 8k"
+>>> image = stable_diffusion_xl(prompt).images[0]
+```
+
+<img
+  src="https://huggingface.co/datasets/Jingya/document_images/resolve/main/optimum/neuron/sdxl_pig.png"
+  width="256"
+  height="256"
+  alt="peggy generated by sdxl on inf2"
+/>
+
+Feel free to check out more guides and examples on different use cases from the [🤗 Optimum Neuron's documentation](https://huggingface.co/docs/optimum-neuron/en/inference_tutorials/stable_diffusion#generate-images-with-stable-diffusion-models-on-aws-inferentia)!
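Reviewer note: the `optimum-cli export neuron` command in the patch above takes fixed `--height` and `--width` values, so compiling for more than one resolution means running it once per size. The sketch below shows one way to assemble that command from Python. It is hypothetical: `build_export_command` is not an Optimum Neuron API. It assumes Python 3.9+ and reuses only the CLI flags that appear in the patch. It returns an argument list you could pass to `subprocess.run` on an Inf2 instance with Optimum Neuron installed.

```python
import shlex


def build_export_command(
    model_id: str,
    output_dir: str,
    batch_size: int = 1,
    height: int = 1024,
    width: int = 1024,
    num_images_per_prompt: int = 1,
    auto_cast: str = "matmul",
    auto_cast_type: str = "bf16",
) -> list[str]:
    """Hypothetical helper: build argv for `optimum-cli export neuron` using only the flags from the doc."""
    return [
        "optimum-cli", "export", "neuron",
        "--model", model_id,
        "--batch_size", str(batch_size),
        "--height", str(height),                # output image height in pixels
        "--width", str(width),                  # output image width in pixels
        "--num_images_per_prompt", str(num_images_per_prompt),
        "--auto_cast", auto_cast,               # which operations to cast (here: matmul only)
        "--auto_cast_type", auto_cast_type,     # data type to cast those operations to
        output_dir,                             # directory that receives the compiled .neuron models
    ]


if __name__ == "__main__":
    # One command per resolution, each written to its own output directory.
    for size in (768, 1024):
        cmd = build_export_command(
            "stabilityai/stable-diffusion-xl-base-1.0",
            f"sd_neuron_xl_{size}/",
            height=size,
            width=size,
        )
        print(shlex.join(cmd))
        # On an Inf2 instance: import subprocess; subprocess.run(cmd, check=True)
```

Returning a list instead of a single shell string means no shell quoting is involved when the command is eventually executed.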

From 1ac29830e6033fd7f40c8fe3f1166c8f12e11f99 Mon Sep 17 00:00:00 2001
From: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com>
Date: Fri, 25 Oct 2024 13:00:33 +0200
Subject: [PATCH 03/11] Update docs/source/en/optimization/neuron.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/optimization/neuron.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/en/optimization/neuron.md b/docs/source/en/optimization/neuron.md
index c9e58979e711..ec4fd00f9e67 100644
--- a/docs/source/en/optimization/neuron.md
+++ b/docs/source/en/optimization/neuron.md
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
 
 # AWS Neuron
 
-🤗 Diffusers functionalities are available on [AWS Inf2 instances](https://aws.amazon.com/ec2/instance-types/inf2/), which are EC2 instances powered by [Neuron machine learning accelerators](https://aws.amazon.com/machine-learning/inferentia/). These instances aim at providing better compute performance(higher throughput, lower latency) with good cost-efficiency, which makes them good candidates for AWS users to deploy diffusion models for production.
+Diffusers functionalities are available on [AWS Inf2 instances](https://aws.amazon.com/ec2/instance-types/inf2/), which are EC2 instances powered by [Neuron machine learning accelerators](https://aws.amazon.com/machine-learning/inferentia/). These instances aim to provide better compute performance (higher throughput, lower latency) with good cost-efficiency, making them good candidates for AWS users to deploy diffusion models to production.
 
 A wide range of features in 🤗 Diffusers are supported by [🤗 Optimum Neuron](https://huggingface.co/docs/optimum-neuron/en/index) via similar APIs. Once you have created an AWS Inf2 instance, you can install 🤗 Optimum Neuron:
 

From 7bf5c8dfae4d2b312677b9261d6fd65e09f4ad37 Mon Sep 17 00:00:00 2001
From: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com>
Date: Fri, 25 Oct 2024 13:00:54 +0200
Subject: [PATCH 04/11] Update docs/source/en/optimization/neuron.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/optimization/neuron.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/en/optimization/neuron.md b/docs/source/en/optimization/neuron.md
index ec4fd00f9e67..b4a59c55bd8b 100644
--- a/docs/source/en/optimization/neuron.md
+++ b/docs/source/en/optimization/neuron.md
@@ -22,7 +22,7 @@ python -m pip install --upgrade-strategy eager optimum[neuronx]
 
 <Tip>
 
-We provide pre-built [Hugging Face Neuron Deep Learning AMI](https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2) (DLAMI) and Optimum Neuron containers for Amazon SageMaker. It's recommended to correctly set up your environment 
+We provide pre-built [Hugging Face Neuron Deep Learning AMI](https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2) (DLAMI) and Optimum Neuron containers for Amazon SageMaker. We recommend using them to set up your environment correctly.
 
 </Tip>
 

From 76e422a7d831098732bbaa4eafa2e184ef755eb5 Mon Sep 17 00:00:00 2001
From: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com>
Date: Fri, 25 Oct 2024 13:01:16 +0200
Subject: [PATCH 05/11] Update docs/source/en/optimization/neuron.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/optimization/neuron.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/en/optimization/neuron.md b/docs/source/en/optimization/neuron.md
index b4a59c55bd8b..e6eae3f2f6b5 100644
--- a/docs/source/en/optimization/neuron.md
+++ b/docs/source/en/optimization/neuron.md
@@ -26,7 +26,7 @@ We provide pre-built [Hugging Face Neuron Deep Learning AMI](https://aws.amazon.
 
 </Tip>
 
-Here below is an example of generating images with Stable Diffusion XL model on an inf2.8xlarge instance (you can switch to cheaper inf2.xlarge instances once the model is compiled). We will use the `NeuronStableDiffusionXLPipeline` class, a class similar to the `StableDiffusionXLPipeline` class in diffusers to generate some images for fun.
+The example below demonstrates how to generate images with the Stable Diffusion XL model on an inf2.8xlarge instance (you can switch to cheaper inf2.xlarge instances once the model is compiled). To generate some images, use the [`~optimum.neuron.NeuronStableDiffusionXLPipeline`] class, which is similar to the [`StableDiffusionXLPipeline`] class in Diffusers.
 
 Unlike in 🤗 Diffusers, we need to compile models in the pipeline to Neuron compatible format `.neuron`. To do this, you will need to launch the following command:
 

From eab62e7472e55039f05d9b868dcf39943f4cc280 Mon Sep 17 00:00:00 2001
From: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com>
Date: Fri, 25 Oct 2024 13:01:33 +0200
Subject: [PATCH 06/11] Update docs/source/en/optimization/neuron.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/optimization/neuron.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/en/optimization/neuron.md b/docs/source/en/optimization/neuron.md
index e6eae3f2f6b5..89bd79eababd 100644
--- a/docs/source/en/optimization/neuron.md
+++ b/docs/source/en/optimization/neuron.md
@@ -28,7 +28,7 @@ We provide pre-built [Hugging Face Neuron Deep Learning AMI](https://aws.amazon.
 
 The example below demonstrates how to generate images with the Stable Diffusion XL model on an inf2.8xlarge instance (you can switch to cheaper inf2.xlarge instances once the model is compiled). To generate some images, use the [`~optimum.neuron.NeuronStableDiffusionXLPipeline`] class, which is similar to the [`StableDiffusionXLPipeline`] class in Diffusers.
 
-Unlike in 🤗 Diffusers, we need to compile models in the pipeline to Neuron compatible format `.neuron`. To do this, you will need to launch the following command:
+Unlike Diffusers, Optimum Neuron requires you to compile the models in the pipeline ahead of time into the Neuron format, `.neuron`. Run the following command to export the model.
 
 ```bash
 optimum-cli export neuron --model stabilityai/stable-diffusion-xl-base-1.0 \

From b589e1dfe119330664f8c85a1675dfaecc35aa50 Mon Sep 17 00:00:00 2001
From: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com>
Date: Fri, 25 Oct 2024 13:01:45 +0200
Subject: [PATCH 07/11] Update docs/source/en/optimization/neuron.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/optimization/neuron.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/en/optimization/neuron.md b/docs/source/en/optimization/neuron.md
index 89bd79eababd..4896efe3fd0d 100644
--- a/docs/source/en/optimization/neuron.md
+++ b/docs/source/en/optimization/neuron.md
@@ -41,7 +41,7 @@ optimum-cli export neuron --model stabilityai/stable-diffusion-xl-base-1.0 \
   sd_neuron_xl/
 ```
 
-Now let's generate some images with pre-compiled SDXL models:
+Now generate some images with the pre-compiled SDXL model.
 
 ```python
 >>> from optimum.neuron import NeuronStableDiffusionXLPipeline

From b507be6927dedd4e19f89367131e567f321ecc87 Mon Sep 17 00:00:00 2001
From: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com>
Date: Fri, 25 Oct 2024 13:01:56 +0200
Subject: [PATCH 08/11] Update docs/source/en/optimization/neuron.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/optimization/neuron.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/en/optimization/neuron.md b/docs/source/en/optimization/neuron.md
index 4896efe3fd0d..3e0bac864973 100644
--- a/docs/source/en/optimization/neuron.md
+++ b/docs/source/en/optimization/neuron.md
@@ -58,4 +58,4 @@ Now generate some images with the pre-compiled SDXL model.
   alt="peggy generated by sdxl on inf2"
 />
 
-Feel free to check out more guides and examples on different use cases from the [🤗 Optimum Neuron's documentation](https://huggingface.co/docs/optimum-neuron/en/inference_tutorials/stable_diffusion#generate-images-with-stable-diffusion-models-on-aws-inferentia)!
+Check out the Optimum Neuron [documentation](https://huggingface.co/docs/optimum-neuron/en/inference_tutorials/stable_diffusion#generate-images-with-stable-diffusion-models-on-aws-inferentia) for more guides and examples covering other use cases.

From e3db656f064040ced6810e9cd8352d8e36d36c1d Mon Sep 17 00:00:00 2001
From: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com>
Date: Fri, 25 Oct 2024 13:02:18 +0200
Subject: [PATCH 09/11] Update docs/source/en/optimization/neuron.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/optimization/neuron.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/en/optimization/neuron.md b/docs/source/en/optimization/neuron.md
index 3e0bac864973..cbc1e787b7e6 100644
--- a/docs/source/en/optimization/neuron.md
+++ b/docs/source/en/optimization/neuron.md
@@ -14,7 +14,7 @@ specific language governing permissions and limitations under the License.
 
 Diffusers functionalities are available on [AWS Inf2 instances](https://aws.amazon.com/ec2/instance-types/inf2/), which are EC2 instances powered by [Neuron machine learning accelerators](https://aws.amazon.com/machine-learning/inferentia/). These instances aim to provide better compute performance (higher throughput, lower latency) with good cost-efficiency, making them good candidates for AWS users to deploy diffusion models to production.
 
-A wide range of features in 🤗 Diffusers are supported by [🤗 Optimum Neuron](https://huggingface.co/docs/optimum-neuron/en/index) via similar APIs. Once you have created an AWS Inf2 instance, you can install 🤗 Optimum Neuron:
+[Optimum Neuron](https://huggingface.co/docs/optimum-neuron/en/index) supports many of the features in Diffusers with similar APIs, so it is easier to learn if you're already familiar with Diffusers. Once you have created an AWS Inf2 instance, install Optimum Neuron.
 
 ```bash
 python -m pip install --upgrade-strategy eager optimum[neuronx]

From cb0bdd84d8631472caaa3a8dddd49bab23bb94f2 Mon Sep 17 00:00:00 2001
From: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com>
Date: Fri, 25 Oct 2024 11:48:00 +0000
Subject: [PATCH 10/11] brief intro of ON

---
 docs/source/en/optimization/neuron.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/en/optimization/neuron.md b/docs/source/en/optimization/neuron.md
index cbc1e787b7e6..576318da0d44 100644
--- a/docs/source/en/optimization/neuron.md
+++ b/docs/source/en/optimization/neuron.md
@@ -14,7 +14,7 @@ specific language governing permissions and limitations under the License.
 
 Diffusers functionalities are available on [AWS Inf2 instances](https://aws.amazon.com/ec2/instance-types/inf2/), which are EC2 instances powered by [Neuron machine learning accelerators](https://aws.amazon.com/machine-learning/inferentia/). These instances aim to provide better compute performance (higher throughput, lower latency) with good cost-efficiency, making them good candidates for AWS users to deploy diffusion models to production.
 
-[Optimum Neuron](https://huggingface.co/docs/optimum-neuron/en/index) supports many of the features in Diffusers with similar APIs, so it is easier to learn if you're already familiar with Diffusers. Once you have created an AWS Inf2 instance, install Optimum Neuron.
+[Optimum Neuron](https://huggingface.co/docs/optimum-neuron/en/index) is the interface between 🤗 HugiingFace libraries and AWS Accelerators including AWS [Trainium](https://aws.amazon.com/machine-learning/trainium/) and AWS [Inferentia](https://aws.amazon.com/machine-learning/inferentia/). It supports many of the features in Diffusers with similar APIs, so it is easier to learn if you're already familiar with Diffusers. Once you have created an AWS Inf2 instance, install Optimum Neuron.
 
 ```bash
 python -m pip install --upgrade-strategy eager optimum[neuronx]

From 5a6153b8fee5fe864c6e9dc4250bce1a2eb57991 Mon Sep 17 00:00:00 2001
From: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com>
Date: Fri, 25 Oct 2024 17:04:34 +0200
Subject: [PATCH 11/11] Update docs/source/en/optimization/neuron.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/optimization/neuron.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/en/optimization/neuron.md b/docs/source/en/optimization/neuron.md
index 576318da0d44..b10050e64d7f 100644
--- a/docs/source/en/optimization/neuron.md
+++ b/docs/source/en/optimization/neuron.md
@@ -14,7 +14,7 @@ specific language governing permissions and limitations under the License.
 
 Diffusers functionalities are available on [AWS Inf2 instances](https://aws.amazon.com/ec2/instance-types/inf2/), which are EC2 instances powered by [Neuron machine learning accelerators](https://aws.amazon.com/machine-learning/inferentia/). These instances aim to provide better compute performance (higher throughput, lower latency) with good cost-efficiency, making them good candidates for AWS users to deploy diffusion models to production.
 
-[Optimum Neuron](https://huggingface.co/docs/optimum-neuron/en/index) is the interface between 🤗 HugiingFace libraries and AWS Accelerators including AWS [Trainium](https://aws.amazon.com/machine-learning/trainium/) and AWS [Inferentia](https://aws.amazon.com/machine-learning/inferentia/). It supports many of the features in Diffusers with similar APIs, so it is easier to learn if you're already familiar with Diffusers. Once you have created an AWS Inf2 instance, install Optimum Neuron.
+[Optimum Neuron](https://huggingface.co/docs/optimum-neuron/en/index) is the interface between Hugging Face libraries and AWS accelerators, including AWS [Trainium](https://aws.amazon.com/machine-learning/trainium/) and AWS [Inferentia](https://aws.amazon.com/machine-learning/inferentia/). It supports many of the features in Diffusers with similar APIs, so it is easier to learn if you're already familiar with Diffusers. Once you have created an AWS Inf2 instance, install Optimum Neuron.
 
 ```bash
 python -m pip install --upgrade-strategy eager optimum[neuronx]