The value of the variable is a bit-mask, with the following supported values:

     - Enables tracing of PI calls
   * - ``-1``
     - Enables all levels of tracing

.. _env_var_ze_flat_device_hierarchy:

Variable ``ZE_FLAT_DEVICE_HIERARCHY``
---------------------------------------

Allows users to define the device hierarchy model exposed by the Level Zero driver implementation.
Keep in mind that :py:func:`dpctl.get_composite_devices` will only work while this is set to ``COMBINED``.

.. list-table::
   :header-rows: 1

   * - Value
     - Description
   * - ``COMBINED``
     - Level Zero devices with multiple tiles will be exposed as a set of root devices, each corresponding to an individual tile. These root devices are component devices, which can be queried for their corresponding composite device, and the composite device can in turn be queried for its components. Dedicated composite device APIs will return non-trivial results.
   * - ``COMPOSITE``
     - Level Zero devices with multiple tiles will be exposed as a single root device, with tiles accessible as sub-devices.
   * - ``FLAT``
     - Level Zero devices with multiple tiles will be exposed as a set of root devices, each corresponding to an individual tile. Enabled by default.

Read more about device hierarchy in the `Level Zero Specification <https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/PROG.html#device-hierarchy>`_ and in this `Intel GPU article <https://www.intel.com/content/www/us/en/developer/articles/technical/flattening-gpu-tile-hierarchy.html>`_.
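
For illustration, here is a shell sketch of how the hierarchy model affects enumeration; it assumes a machine with at least one multi-tile Level Zero GPU, and the comments describe expected rather than guaranteed output:

.. code-block:: bash

    # Sketch: compare Level Zero device enumeration under the three hierarchy models.
    # A multi-tile GPU is assumed; actual counts depend on the hardware present.
    export ZE_FLAT_DEVICE_HIERARCHY=FLAT
    python -c "import dpctl; print(len(dpctl.get_devices(backend='level_zero')))"   # one root device per tile

    export ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE
    python -c "import dpctl; print(len(dpctl.get_devices(backend='level_zero')))"   # one root device per card

    export ZE_FLAT_DEVICE_HIERARCHY=COMBINED
    python -c "import dpctl; print(dpctl.get_composite_devices())"                  # composite devices exposed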

Variable ``ZE_AFFINITY_MASK``
-------------------------------

Allows users to mask specific devices from being used by SYCL applications.
If ``ZE_FLAT_DEVICE_HIERARCHY`` is set to ``COMPOSITE``, setting ``ZE_AFFINITY_MASK=1`` makes our application see only device #1, leaving system devices #0 and #2+ invisible.

If ``ZE_FLAT_DEVICE_HIERARCHY`` is set to ``FLAT``, setting ``ZE_AFFINITY_MASK=1`` makes our application see only the second tile in the system, exposed as logical device #0.
If the system has four dual-tile GPUs installed, this would be the second tile of the first GPU. In ``FLAT`` mode, the mask values are system-wide sub-device numbers, assigned from a flat numbering perspective.
Therefore, we could use the second tile of each of the four dual-tile GPUs with ``ZE_AFFINITY_MASK=1,3,5,7``.
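
The scenario above can be expressed as the following sketch; it assumes that ``dpctl`` is installed and that the mask is set before the Level Zero runtime is first initialized:

.. code-block:: bash

    # Sketch: expose only the second tile of each of four dual-tile GPUs (FLAT mode,
    # where tiles are numbered 0..7 system-wide on such a system).
    export ZE_FLAT_DEVICE_HIERARCHY=FLAT
    export ZE_AFFINITY_MASK=1,3,5,7
    # Expect four Level Zero GPU devices, one per masked-in tile.
    python -c "import dpctl; print(dpctl.get_devices(backend='level_zero', device_type='gpu'))"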

If ``ZE_FLAT_DEVICE_HIERARCHY`` is set to ``COMBINED``, the way tiles and composite devices are exposed depends on the physical devices present and the value of ``ZE_AFFINITY_MASK``:

- If all exposed tiles (as determined by ``ZE_AFFINITY_MASK``) belong to the same physical device, that composite device is available to the application, and each tile is accessible as a component device of that composite device.
- If the exposed tiles belong to different physical devices, a composite device is available for each physical device, and the tiles are accessible as component devices of their respective composite device.

Additional examples illustrating this behavior can be found in the detailed documentation for ``ZE_AFFINITY_MASK``; read more about it in the `Level Zero Specification <https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/PROG.html#affinity-mask>`_.
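
A sketch of how this can be observed from ``dpctl``; the mask values below assume dual-tile GPUs with flat, system-wide tile numbering:

.. code-block:: bash

    # Sketch: tiles 0 and 1 belong to the same GPU -> expect a single composite device.
    export ZE_FLAT_DEVICE_HIERARCHY=COMBINED
    export ZE_AFFINITY_MASK=0,1
    python -c "import dpctl; print(dpctl.get_composite_devices())"

    # Sketch: tiles 1 and 3 belong to different GPUs -> expect one composite device per GPU.
    export ZE_AFFINITY_MASK=1,3
    python -c "import dpctl; print(dpctl.get_composite_devices())"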

Variable ``ZE_ENABLE_PCI_ID_DEVICE_ORDER``
--------------------------------------------

Forces the driver to report devices in order from lowest to highest PCI bus ID.
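
For example (a sketch; the reported order naturally depends on the PCI topology of the system):

.. code-block:: bash

    # Sketch: request PCI-bus-ID ordering of Level Zero devices, then list platforms and devices.
    export ZE_ENABLE_PCI_ID_DEVICE_ORDER=1
    python -c "import dpctl; dpctl.lsplatform(verbosity=2)"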

The tuple describes the non-partitioned device where the array has been allocated, or the non-partitioned parent device of the allocation device.
See :class:`dpctl.tensor.DLDeviceType` for a list of devices supported by the DLPack protocol.

Raises ``DLPackCreationError`` when the ``device_id`` could not be determined.

Installation via Intel(R) Distribution for Python
---------------------------------------------------

`Intel(R) Distribution for Python* <https://www.intel.com/content/www/us/en/developer/tools/oneapi/distribution-for-python.html>`_ is distributed as a conda-based installer and includes ``dpctl`` along with its dependencies and sister projects such as ``dpnp``.
Once the installed environment is activated, ``dpctl`` should be ready to use.
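
A simple smoke test after activating the environment (a sketch; it only lists the SYCL platforms and devices that happen to be present):

.. code-block:: bash

    # Sketch: verify that dpctl imports and can enumerate platforms and devices.
    python -c "import dpctl; dpctl.lsplatform()"
    python -c "import dpctl; print(dpctl.get_devices())"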
<aclass="reference external" href="https://intel.github.io/llvm/UsersManual.html">DPC++ Compiler User Manual</a>.</p>
939
939
<sectionid="cuda-build">
940
940
<h4>CUDA build<aclass="headerlink" href="#cuda-build" title="Permalink to this heading">¶</a></h4>
941
-
<p><codeclass="docutils literal notranslate"><spanclass="pre">dpctl</span></code> can be built for CUDA devices using the <codeclass="docutils literal notranslate"><spanclass="pre">DPCTL_TARGET_CUDA</span></code> CMake option,
942
-
which accepts a specific compute architecture string:</p>
941
+
<p><codeclass="docutils literal notranslate"><spanclass="pre">dpctl</span></code> can be built for CUDA devices using the <codeclass="docutils literal notranslate"><spanclass="pre">--target-cuda</span></code> argument.</p>
942
+
<p>To target a specific architecture (e.g., <codeclass="docutils literal notranslate"><spanclass="pre">sm_80</span></code>):</p>
<p>To use the default architecture (<codeclass="docutils literal notranslate"><spanclass="pre">sm_50</span></code>),
954
+
<p>To use the default architecture (<codeclass="docutils literal notranslate"><spanclass="pre">sm_50</span></code>) with CMake options,
947
955
set <codeclass="docutils literal notranslate"><spanclass="pre">DPCTL_TARGET_CUDA</span></code> to a value such as <codeclass="docutils literal notranslate"><spanclass="pre">ON</span></code>, <codeclass="docutils literal notranslate"><spanclass="pre">TRUE</span></code>, <codeclass="docutils literal notranslate"><spanclass="pre">YES</span></code>, <codeclass="docutils literal notranslate"><spanclass="pre">Y</span></code>, or <codeclass="docutils literal notranslate"><spanclass="pre">1</span></code>:</p>
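
A sketch of what such invocations might look like; the ``scripts/build_locally.py`` helper, the ``--target-cuda=<arch>`` spelling, and the direct CMake call are assumptions based on the surrounding text, not commands verified against the build system:

.. code-block:: bash

    # Sketch: build for a specific CUDA architecture (assumed helper script and flag form).
    python scripts/build_locally.py --verbose --target-cuda=sm_80

    # Sketch: use the default architecture (sm_50) via the CMake option when configuring directly.
    cmake -DDPCTL_TARGET_CUDA=ON -S . -B build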

AMD build
---------

``dpctl`` can be built for AMD devices using the ``--target-hip`` argument.
Note that the `oneAPI for AMD GPUs` plugin requires the architecture to be specified, and only one architecture can be specified at a time.

To determine the architecture code (``<arch>``) for your AMD GPU, query the GPU as sketched below; this will print names like ``gfx90a``, ``gfx1030``, etc.
You can then use one of them as the argument to ``--target-hip``.
Alternatively, you can use the ``DPCTL_TARGET_HIP`` CMake option.
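
A sketch of the corresponding steps; ``rocminfo`` as the architecture query and the helper invocation are assumptions rather than commands taken from this page:

.. code-block:: bash

    # Sketch: discover the GPU architecture code (assumes ROCm's rocminfo is installed).
    rocminfo | grep -o "gfx[0-9a-f]*" | sort -u

    # Sketch: build for that architecture via the assumed helper script...
    python scripts/build_locally.py --verbose --target-hip=gfx90a

    # ...or via the DPCTL_TARGET_HIP CMake option when configuring directly.
    cmake -DDPCTL_TARGET_HIP=gfx90a -S . -B build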

Multi-target build
------------------

The default ``dpctl`` build from source enables support of Intel devices only.
Extending the build with a custom SYCL target additionally enables support of CUDA or AMD devices in ``dpctl``.
The support can also be extended to enable both CUDA and AMD devices.
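
A sketch of such a combined build, assuming the two target flags can be passed together in one invocation of the same assumed helper script:

.. code-block:: bash

    # Sketch: enable both CUDA and AMD support (flag combination is an assumption).
    python scripts/build_locally.py --verbose --target-cuda=sm_80 --target-hip=gfx90a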