diff --git a/prototype_source/inductor_windows_cpu.rst b/prototype_source/inductor_windows_cpu.rst new file mode 100644 index 00000000000..eb10be5a3d1 --- /dev/null +++ b/prototype_source/inductor_windows_cpu.rst @@ -0,0 +1,128 @@ +How to use TorchInductor on Windows CPU +======================================= + +**Author**: `Zhaoqiong Zheng `_, `Xu, Han `_ + + + +TorchInductor is a compiler backend that transforms FX Graphs generated by TorchDynamo into highly optimized C++/Triton kernels. +This tutorial will guide you through the process of using TorchInductor on a Windows CPU. + +.. grid:: 2 + + .. grid-item-card:: :octicon:`mortar-board;1em;` What you will learn + :class-card: card-prerequisites + + * How to compile and execute a Python function with PyTorch, optimized for Windows CPU + * Basics of TorchInductor's optimization using C++/Triton kernels + + .. grid-item-card:: :octicon:`list-unordered;1em;` Prerequisites + :class-card: card-prerequisites + + * PyTorch v2.5 or later + * Microsoft Visual C++ (MSVC) + * Miniforge for Windows + +Install the Required Software +----------------------------- + +First, let's install the required software. A C++ compiler is required for TorchInductor optimization. +We will use Microsoft Visual C++ (MSVC) for this example. + +1. Download and install `MSVC `_. + +2. During the installation, select **Desktop Development with C++** in the **Desktop & Mobile** section of the **Workloads** table, then install the software. + +.. note:: + + We recommend the C++ compilers `Clang `_ and `Intel Compiler `_. + See `Using an Alternative Compiler for Better Performance <#using-an-alternative-compiler-for-better-performance>`_. + +3. Download and install `Miniforge3-Windows-x86_64.exe `__. + +Set Up the Environment +---------------------- + +#. Open the command line environment via ``cmd.exe``. +#. Activate ``MSVC`` with the following command: + + .. 
code-block:: sh + + "C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Auxiliary/Build/vcvars64.bat" +#. Activate ``conda`` with the following command: + + .. code-block:: sh + + "C:/ProgramData/miniforge3/Scripts/activate.bat" +#. Create and activate a custom conda environment: + + .. code-block:: sh + + conda create -n inductor_cpu_windows python=3.10 -y + conda activate inductor_cpu_windows + +#. Install `PyTorch 2.5 `_ or later. + +Using TorchInductor on Windows CPU +---------------------------------- + +Here’s a simple example to demonstrate how to use TorchInductor: + +.. code-block:: python + + + import torch + def foo(x, y): + a = torch.sin(x) + b = torch.cos(x) + return a + b + opt_foo1 = torch.compile(foo) + print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10))) + +The code above returns the following output: + +.. code-block:: sh + + tensor([[-3.9074e-02, 1.3994e+00, 1.3894e+00, 3.2630e-01, 8.3060e-01, + 1.1833e+00, 1.4016e+00, 7.1905e-01, 9.0637e-01, -1.3648e+00], + [ 1.3728e+00, 7.2863e-01, 8.6888e-01, -6.5442e-01, 5.6790e-01, + 5.2025e-01, -1.2647e+00, 1.2684e+00, -1.2483e+00, -7.2845e-01], + [-6.7747e-01, 1.2028e+00, 1.1431e+00, 2.7196e-02, 5.5304e-01, + 6.1945e-01, 4.6654e-01, -3.7376e-01, 9.3644e-01, 1.3600e+00], + [-1.0157e-01, 7.7200e-02, 1.0146e+00, 8.8175e-02, -1.4057e+00, + 8.8119e-01, 6.2853e-01, 3.2773e-01, 8.5082e-01, 8.4615e-01], + [ 1.4140e+00, 1.2130e+00, -2.0762e-01, 3.3914e-01, 4.1122e-01, + 8.6895e-01, 5.8852e-01, 9.3310e-01, 1.4101e+00, 9.8318e-01], + [ 1.2355e+00, 7.9290e-02, 1.3707e+00, 1.3754e+00, 1.3768e+00, + 9.8970e-01, 1.1171e+00, -5.9944e-01, 1.2553e+00, 1.3394e+00], + [-1.3428e+00, 1.8400e-01, 1.1756e+00, -3.0654e-01, 9.7973e-01, + 1.4019e+00, 1.1886e+00, -1.9194e-01, 1.3632e+00, 1.1811e+00], + [-7.1615e-01, 4.6622e-01, 1.2089e+00, 9.2011e-01, 1.0659e+00, + 9.0892e-01, 1.1932e+00, 1.3888e+00, 1.3898e+00, 1.3218e+00], + [ 1.4139e+00, -1.4000e-01, 9.1192e-01, 3.0175e-01, -9.6432e-01, + -1.0498e+00, 1.4115e+00, 
-9.3212e-01, -9.0964e-01, 1.0127e+00], + [ 5.7244e-04, 1.2799e+00, 1.3595e+00, 1.0907e+00, 3.7191e-01, + 1.4062e+00, 1.3672e+00, 6.8502e-02, 8.5216e-01, 8.6046e-01]]) + +Using an Alternative Compiler for Better Performance +---------------------------------------------------- + +To improve Inductor performance on Windows, you can use the Intel Compiler or the LLVM Compiler. However, both rely on the runtime libraries from Microsoft Visual C++ (MSVC), so install MSVC first. + +Intel Compiler +^^^^^^^^^^^^^^ + +#. Download and install the Windows version of the `Intel Compiler `_. +#. Select it as the compiler Inductor uses on Windows through the ``CXX`` environment variable: ``set CXX=icx-cl``. + +LLVM Compiler +^^^^^^^^^^^^^ + +#. Download and install the win64 version of the `LLVM Compiler `_. +#. Select it as the compiler Inductor uses on Windows through the ``CXX`` environment variable: ``set CXX=clang-cl``. + +Conclusion +---------- + +In this tutorial, we have learned how to use Inductor on Windows CPU with PyTorch. In addition, we discussed +further performance improvements with Intel Compiler and LLVM Compiler. diff --git a/prototype_source/prototype_index.rst b/prototype_source/prototype_index.rst index 1eaedb6a1d9..c86ae857333 100644 --- a/prototype_source/prototype_index.rst +++ b/prototype_source/prototype_index.rst @@ -217,6 +217,13 @@ Prototype features are not available as part of binary distributions like PyPI o :link: ../prototype/inductor_cpp_wrapper_tutorial.html :tags: Model-Optimization +.. customcarditem:: + :header: Inductor Windows CPU Tutorial + :card_description: Speed up your models with Inductor on Windows CPU + :image: ../_static/img/thumbnails/cropped/generic-pytorch-logo.png + :link: ../prototype/inductor_windows_cpu.html + :tags: Model-Optimization + .. Distributed .. 
customcarditem:: :header: Flight Recorder Tutorial @@ -249,6 +256,7 @@ Prototype features are not available as part of binary distributions like PyPI o prototype/flight_recorder_tutorial.html prototype/graph_mode_dynamic_bert_tutorial.html prototype/inductor_cpp_wrapper_tutorial.html + prototype/inductor_windows_cpu.html prototype/pt2e_quantizer.html prototype/pt2e_quant_ptq.html prototype/pt2e_quant_qat.html