The value of the variable is a bit-mask, with the following supported values:

     - Enables tracing of PI calls
   * - ``-1``
     - Enables all levels of tracing

.. _env_var_ze_flat_device_hierarchy:

Variable ``ZE_FLAT_DEVICE_HIERARCHY``
---------------------------------------

Allows users to define the device hierarchy model exposed by the Level Zero driver implementation.
Keep in mind that :py:func:`dpctl.get_composite_devices` will only work while this is set to ``COMBINED``.

.. list-table::
   :header-rows: 1

   * - Value
     - Description
   * - ``COMBINED``
     - Level Zero devices with multiple tiles will be exposed as a set of root devices, each corresponding to an individual tile. These root devices are component devices, which can be queried for their corresponding composite device, and the composite device can in turn be queried for its components. Dedicated composite device APIs will return non-trivial results.
   * - ``COMPOSITE``
     - Level Zero devices with multiple tiles will be exposed as a single root device, with tiles accessible as sub-devices.
   * - ``FLAT``
     - Level Zero devices with multiple tiles will be exposed as a set of root devices, each corresponding to an individual tile. Enabled by default.

Read more about device hierarchy in the `Level Zero Specification <https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/PROG.html#device-hierarchy>`_ and in this `Intel GPU article <https://www.intel.com/content/www/us/en/developer/articles/technical/flattening-gpu-tile-hierarchy.html>`_.
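
For illustration, here is a shell sketch of how the hierarchy model affects enumeration; it assumes a machine with at least one multi-tile Level Zero GPU, and the comments describe expected rather than guaranteed output:

.. code-block:: bash

    # Sketch: compare Level Zero device enumeration under the three hierarchy models.
    # A multi-tile GPU is assumed; actual counts depend on the hardware present.
    export ZE_FLAT_DEVICE_HIERARCHY=FLAT
    python -c "import dpctl; print(len(dpctl.get_devices(backend='level_zero')))"   # one root device per tile

    export ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE
    python -c "import dpctl; print(len(dpctl.get_devices(backend='level_zero')))"   # one root device per card

    export ZE_FLAT_DEVICE_HIERARCHY=COMBINED
    python -c "import dpctl; print(dpctl.get_composite_devices())"                  # composite devices exposed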

Variable ``ZE_AFFINITY_MASK``
-------------------------------

Allows users to mask specific devices from being used by SYCL applications.
If ``ZE_FLAT_DEVICE_HIERARCHY`` is set to ``COMPOSITE``, setting ``ZE_AFFINITY_MASK=1`` makes our application see only device #1, leaving system devices #0 and #2+ invisible.

If ``ZE_FLAT_DEVICE_HIERARCHY`` is set to ``FLAT``, setting ``ZE_AFFINITY_MASK=1`` makes our application see only the second tile in the system, exposed as logical device #0.
If the system has four dual-tile GPUs installed, this would be the second tile of the first GPU. In ``FLAT`` mode, the mask values are system-wide sub-device numbers, assigned from a flat numbering perspective.
Therefore, we could use the second tile of each of the four dual-tile GPUs with ``ZE_AFFINITY_MASK=1,3,5,7``.
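
The scenario above can be expressed as the following sketch; it assumes that ``dpctl`` is installed and that the mask is set before the Level Zero runtime is first initialized:

.. code-block:: bash

    # Sketch: expose only the second tile of each of four dual-tile GPUs (FLAT mode,
    # where tiles are numbered 0..7 system-wide on such a system).
    export ZE_FLAT_DEVICE_HIERARCHY=FLAT
    export ZE_AFFINITY_MASK=1,3,5,7
    # Expect four Level Zero GPU devices, one per masked-in tile.
    python -c "import dpctl; print(dpctl.get_devices(backend='level_zero', device_type='gpu'))"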

If ``ZE_FLAT_DEVICE_HIERARCHY`` is set to ``COMBINED``, the way tiles and composite devices are exposed depends on the physical devices present and the value of ``ZE_AFFINITY_MASK``:

- If all exposed tiles (as determined by ``ZE_AFFINITY_MASK``) belong to the same physical device, that composite device is available to the application, and each tile is accessible as a component device of that composite device.
- If the exposed tiles belong to different physical devices, a composite device is available for each physical device, and the tiles are accessible as component devices of their respective composite device.

Additional examples illustrating this behavior can be found in the detailed documentation for ``ZE_AFFINITY_MASK``; read more about it in the `Level Zero Specification <https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/PROG.html#affinity-mask>`_.
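
A sketch of how this can be observed from ``dpctl``; the mask values below assume dual-tile GPUs with flat, system-wide tile numbering:

.. code-block:: bash

    # Sketch: tiles 0 and 1 belong to the same GPU -> expect a single composite device.
    export ZE_FLAT_DEVICE_HIERARCHY=COMBINED
    export ZE_AFFINITY_MASK=0,1
    python -c "import dpctl; print(dpctl.get_composite_devices())"

    # Sketch: tiles 1 and 3 belong to different GPUs -> expect one composite device per GPU.
    export ZE_AFFINITY_MASK=1,3
    python -c "import dpctl; print(dpctl.get_composite_devices())"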

Variable ``ZE_ENABLE_PCI_ID_DEVICE_ORDER``
--------------------------------------------

Forces the driver to report devices in order from lowest to highest PCI bus ID.
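
For example (a sketch; the reported order naturally depends on the PCI topology of the system):

.. code-block:: bash

    # Sketch: request PCI-bus-ID ordering of Level Zero devices, then list platforms and devices.
    export ZE_ENABLE_PCI_ID_DEVICE_ORDER=1
    python -c "import dpctl; dpctl.lsplatform(verbosity=2)"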

The tuple describes the non-partitioned device where the array has been allocated, or the non-partitioned parent device of the allocation device.
See :class:`dpctl.tensor.DLDeviceType` for a list of devices supported by the DLPack protocol.

Raises ``DLPackCreationError`` when the ``device_id`` could not be determined.

Installation via Intel(R) Distribution for Python
---------------------------------------------------

`Intel(R) Distribution for Python* <https://www.intel.com/content/www/us/en/developer/tools/oneapi/distribution-for-python.html>`_ is distributed as a conda-based installer and includes ``dpctl`` along with its dependencies and sister projects such as ``dpnp``.
Once the installed environment is activated, ``dpctl`` should be ready to use.
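
A simple smoke test after activating the environment (a sketch; it only lists the SYCL platforms and devices that happen to be present):

.. code-block:: bash

    # Sketch: verify that dpctl imports and can enumerate platforms and devices.
    python -c "import dpctl; dpctl.lsplatform()"
    python -c "import dpctl; print(dpctl.get_devices())"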
<aclass="reference external" href="https://intel.github.io/llvm/UsersManual.html">DPC++ Compiler User Manual</a>.</p>
939
939
<sectionid="cuda-build">
940
940
<h4>CUDA build<aclass="headerlink" href="#cuda-build" title="Permalink to this heading">¶</a></h4>
941
-
<p><codeclass="docutils literal notranslate"><spanclass="pre">dpctl</span></code> can be built for CUDA devices using the <codeclass="docutils literal notranslate"><spanclass="pre">DPCTL_TARGET_CUDA</span></code> CMake option,
942
-
which accepts a specific compute architecture string:</p>
941
+
<p><codeclass="docutils literal notranslate"><spanclass="pre">dpctl</span></code> can be built for CUDA devices using the <codeclass="docutils literal notranslate"><spanclass="pre">--target-cuda</span></code> argument.</p>
942
+
<p>To target a specific architecture (e.g., <codeclass="docutils literal notranslate"><spanclass="pre">sm_80</span></code>):</p>
<p>To use the default architecture (<codeclass="docutils literal notranslate"><spanclass="pre">sm_50</span></code>),
954
+
<p>To use the default architecture (<codeclass="docutils literal notranslate"><spanclass="pre">sm_50</span></code>) with CMake options,
947
955
set <codeclass="docutils literal notranslate"><spanclass="pre">DPCTL_TARGET_CUDA</span></code> to a value such as <codeclass="docutils literal notranslate"><spanclass="pre">ON</span></code>, <codeclass="docutils literal notranslate"><spanclass="pre">TRUE</span></code>, <codeclass="docutils literal notranslate"><spanclass="pre">YES</span></code>, <codeclass="docutils literal notranslate"><spanclass="pre">Y</span></code>, or <codeclass="docutils literal notranslate"><spanclass="pre">1</span></code>:</p>
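
A sketch of what such invocations might look like; the ``scripts/build_locally.py`` helper, the ``--target-cuda=<arch>`` spelling, and the direct CMake call are assumptions based on the surrounding text, not commands verified against the build system:

.. code-block:: bash

    # Sketch: build for a specific CUDA architecture (assumed helper script and flag form).
    python scripts/build_locally.py --verbose --target-cuda=sm_80

    # Sketch: use the default architecture (sm_50) via the CMake option when configuring directly.
    cmake -DDPCTL_TARGET_CUDA=ON -S . -B build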

AMD build
---------

``dpctl`` can be built for AMD devices using the ``--target-hip`` argument.
Note that the `oneAPI for AMD GPUs` plugin requires the architecture to be specified, and only one architecture can be specified at a time.

To determine the architecture code (``<arch>``) for your AMD GPU, query the GPU as sketched below; this will print names like ``gfx90a``, ``gfx1030``, etc.
You can then use one of them as the argument to ``--target-hip``.
Alternatively, you can use the ``DPCTL_TARGET_HIP`` CMake option.
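
A sketch of the corresponding steps; ``rocminfo`` as the architecture query and the helper invocation are assumptions rather than commands taken from this page:

.. code-block:: bash

    # Sketch: discover the GPU architecture code (assumes ROCm's rocminfo is installed).
    rocminfo | grep -o "gfx[0-9a-f]*" | sort -u

    # Sketch: build for that architecture via the assumed helper script...
    python scripts/build_locally.py --verbose --target-hip=gfx90a

    # ...or via the DPCTL_TARGET_HIP CMake option when configuring directly.
    cmake -DDPCTL_TARGET_HIP=gfx90a -S . -B build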

Multi-target build
------------------

The default ``dpctl`` build from source enables support of Intel devices only.
Extending the build with a custom SYCL target additionally enables support of CUDA or AMD devices in ``dpctl``.
The support can also be extended to enable both CUDA and AMD devices.
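
A sketch of such a combined build, assuming the two target flags can be passed together in one invocation of the same assumed helper script:

.. code-block:: bash

    # Sketch: enable both CUDA and AMD support (flag combination is an assumption).
    python scripts/build_locally.py --verbose --target-cuda=sm_80 --target-hip=gfx90a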