recipes_source/xeon_run_cpu.rst (7 additions, 8 deletions)
@@ -214,12 +214,12 @@ The generic option settings (knobs) include the following:
        - default value
        - help
      * - ``-h``, ``--help``
-       -
-       -
+       -
+       -
        - To show the help message and exit.
      * - ``-m``, ``--module``
-       -
-       -
+       -
+       -
        - To change each process to interpret the launch script as a python module, executing with the same behavior as "python -m".
      * - ``--no-python``
        - bool
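For reference, the generic knobs touched in this hunk map onto the launcher's command line roughly as follows. This is a hedged sketch: the flags (``--help``, ``--module``, ``--no-python``) come from the table above, while the target names ``workload.py``, ``workload``, and ``./my_binary`` are placeholders, not part of the diff.

```shell
# List every knob the launcher accepts:
python -m torch.backends.xeon.run_cpu --help

# Launch a plain Python script (placeholder name):
python -m torch.backends.xeon.run_cpu workload.py

# Interpret the target as a module, as "python -m" would:
python -m torch.backends.xeon.run_cpu --module workload

# Skip the "python" prefix, e.g. to launch a non-Python binary:
python -m torch.backends.xeon.run_cpu --no-python ./my_binary
```

These invocations require a PyTorch installation and are shown for orientation only.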
@@ -323,7 +323,7 @@ Knobs for controlling instance number and compute resource allocation are:
        - bool
        - False
        - To disable the usage of ``taskset`` command.
-
+
 .. note::
 
    Environment variables that will be set by this script include the following:
@@ -344,13 +344,13 @@ Knobs for controlling instance number and compute resource allocation are:
        - Value of ``ncores_per_instance``
      * - MALLOC_CONF
        - If libjemalloc.so is preloaded, MALLOC_CONF will be set to ``"oversize_threshold:1,background_thread:true,metadata_thp:auto"``.
-
+
 Please note that the script respects environment variables set preliminarily. For example, if you have set the environment variables mentioned above before running the script, the values of the variables will not be overwritten by the script.
 
 Conclusion
 ----------
 
-In this tutorial, we explored a variety of advanced configurations and tools designed to optimize PyTorch inference performance on Intel® Xeon® Scalable Processors.
+In this tutorial, we explored a variety of advanced configurations and tools designed to optimize PyTorch inference performance on Intel® Xeon® Scalable Processors.
 By leveraging the ``torch.backends.xeon.run_cpu`` script, we demonstrated how to fine-tune thread and memory management to achieve peak performance.
 We covered essential concepts such as NUMA access control, optimized memory allocators like ``TCMalloc`` and ``JeMalloc``, and the use of Intel® OpenMP for efficient multithreading.
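The "script respects environment variables set preliminarily" behavior described in this hunk can be sketched as follows. The ``MALLOC_CONF`` variable name and its default value come from the diff above; the shortened value and the ``workload.py`` target are illustrative assumptions, not recommendations.

```shell
# Pre-set MALLOC_CONF; per the tutorial, the launcher keeps a value that is
# already present instead of overwriting it with its jemalloc default:
export MALLOC_CONF="oversize_threshold:1,background_thread:true"

# Launch the workload (placeholder script name); the pre-set value survives:
python -m torch.backends.xeon.run_cpu workload.py
```

This mirrors the note in the hunk: environment variables listed in the table are only set by the script when they are not already defined in the calling shell.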