If you use Torch-TensorRT as a converter to a TensorRT engine and your engine uses plugins provided by Torch-TensorRT, Torch-TensorRT
ships the library ``libtorchtrt_plugins.so``, which contains the implementation of the TensorRT plugins used by Torch-TensorRT during
compilation. This library can be loaded with ``dlopen`` or ``LD_PRELOAD``, like other TensorRT plugin libraries.
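As an illustration, the library can be ``dlopen``-ed from Python via ``ctypes`` so that the plugin creators are registered before an engine is deserialized. This is a minimal sketch, not part of the Torch-TensorRT API; the library path is an assumption and should be adjusted to your install location.

```python
import ctypes

# Assumed path -- point this at where libtorchtrt_plugins.so is installed.
PLUGIN_LIB = "libtorchtrt_plugins.so"

def load_trt_plugins(path=PLUGIN_LIB):
    """dlopen the plugin library with RTLD_GLOBAL so its TensorRT plugin
    creator symbols are visible to the TensorRT runtime."""
    try:
        return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
    except OSError:
        # Library not found at this path; in a real deployment this
        # should be treated as a hard error.
        return None

handle = load_trt_plugins()
```

Alternatively, the same effect can be had without code changes by launching the process with ``LD_PRELOAD`` pointing at the library.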

Multi Device Safe Mode
----------------------

Multi-device safe mode is a setting in Torch-TensorRT which allows the user to determine whether
the runtime checks for device consistency prior to every inference call.

There is a non-negligible, fixed cost per inference call when multi-device safe mode is enabled, which is why
it is now disabled by default. It can be controlled via the following convenience function, which
doubles as a context manager.

.. code-block:: python

    # Enables Multi Device Safe Mode
    torch_tensorrt.runtime.set_multi_device_safe_mode(True)

    # Disables Multi Device Safe Mode [Default Behavior]
    torch_tensorrt.runtime.set_multi_device_safe_mode(False)

    # Enables Multi Device Safe Mode, then resets the safe mode to its prior setting
    with torch_tensorrt.runtime.set_multi_device_safe_mode(True):
        ...

TensorRT requires that each engine be associated with the CUDA context in the active thread from which it is invoked.
Therefore, if the device were to change in the active thread, which may be the case when invoking
engines on multiple GPUs from the same Python process, safe mode will cause Torch-TensorRT to display
an alert and switch GPUs accordingly. If safe mode were not enabled, there could be a mismatch between the engine's
device and the CUDA context device, which could cause the program to crash.

One technique for managing multiple TRT engines on different GPUs without paying the performance cost of
multi-device safe mode is to use Python threads. Each thread is responsible for all of the TRT engines
on a single GPU, and the default CUDA device of each thread corresponds to the GPU for which it is
responsible (it can be set via ``torch.cuda.set_device(...)``). In this way, multiple threads can be used in the same
Python script without needing to switch CUDA contexts and incur the performance overhead.
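The thread-per-GPU pattern can be sketched as follows. This is a hypothetical illustration: ``infer`` stands in for calling a compiled Torch-TensorRT module, and the ``torch.cuda.set_device`` call each worker would make is shown as a comment so the sketch stays self-contained.

```python
import threading

def infer(device_id, x):
    # Hypothetical stand-in for real inference, e.g. trt_module(x)
    # running on GPU `device_id`.
    return x * 2

def gpu_worker(device_id, requests, results):
    # In a real script, pin this thread to its GPU once:
    #     torch.cuda.set_device(device_id)
    # Every engine this thread owns then runs on that device, so no
    # per-inference device switch (and no safe-mode overhead) occurs.
    for req_id, x in requests:
        results[req_id] = infer(device_id, x)

# Two GPUs, each with its own batch of (request id, input) pairs.
requests_per_gpu = {0: [(0, 1), (1, 2)], 1: [(2, 3)]}
results = {}
threads = [
    threading.Thread(target=gpu_worker, args=(dev, reqs, results))
    for dev, reqs in requests_per_gpu.items()
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results.items()))  # [(0, 2), (1, 4), (2, 6)]
```

Because each worker writes only its own request ids, the shared ``results`` dictionary needs no extra locking here; a real deployment would replace ``infer`` with calls into the engines owned by that thread.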