Hardware: H800
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Oct_29_23:50:19_PDT_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0
model-analyzer --version
1.47.0
# base image
nvcr.io/nvidia/tritonserver:24.11-py3
Profiling works without problems when the container is given the entire GPU. After enabling MIG mode and attaching the container to MIG instances, model-analyzer fails at startup with a DCGM initialization error.
docker run -ti --rm --gpus='"device=0:0,0:1"' --network=host -v $PWD:/mnt --name triton-server tritonserver-modelanalyzer:latest
model-analyzer profile \
--model-repository=/mnt/models \
--profile-models=densenet_onnx \
--output-model-repository-path=results
[Model Analyzer] Initializing GPUDevice handles
CacheManager Init Failed. Error: -17
Traceback (most recent call last):
  File "/usr/local/bin/model-analyzer", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/model_analyzer/entrypoint.py", line 263, in main
    gpus = GPUDeviceFactory().verify_requested_gpus(config.gpus)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/model_analyzer/device/gpu_device_factory.py", line 39, in __init__
    self.init_all_devices()
  File "/usr/local/lib/python3.12/dist-packages/model_analyzer/device/gpu_device_factory.py", line 58, in init_all_devices
    dcgm_handle = dcgm_agent.dcgmStartEmbedded(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/model_analyzer/monitor/dcgm/dcgm_agent.py", line 56, in wrapper
    return fn(*newargs, **newkwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/model_analyzer/monitor/dcgm/dcgm_agent.py", line 91, in dcgmStartEmbedded
    dcgm_structs._dcgmCheckReturn(ret)
  File "/usr/local/lib/python3.12/dist-packages/model_analyzer/monitor/dcgm/dcgm_structs.py", line 691, in _dcgmCheckReturn
    raise DCGMError(ret)
model_analyzer.monitor.dcgm.dcgm_structs.DCGMError_InitError: DCGM initialization error
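To confirm which devices the container actually sees, a small check like the one below could be run inside it. This is only a minimal sketch: it assumes `nvidia-smi -L` prints one line per device containing `(UUID: GPU-...)` for full GPUs and `(UUID: MIG-...)` for MIG instances, which may vary by driver version. The helper names `parse_device_list` and `list_visible_devices` are hypothetical and not part of Model Analyzer or DCGM.

```python
import re
import subprocess

# Assumed line shapes from `nvidia-smi -L` (may differ across driver versions):
#   GPU 0: NVIDIA H800 (UUID: GPU-...)
#     MIG 1g.10gb     Device  0: (UUID: MIG-...)
_UUID_RE = re.compile(r"\(UUID:\s*((?:GPU|MIG)-[^)\s]+)\)")


def parse_device_list(text):
    """Split UUIDs found in `nvidia-smi -L` style output into (gpus, migs)."""
    gpus, migs = [], []
    for line in text.splitlines():
        match = _UUID_RE.search(line)
        if not match:
            continue  # skip lines that carry no UUID field
        uuid = match.group(1)
        if uuid.startswith("MIG-"):
            migs.append(uuid)
        else:
            gpus.append(uuid)
    return gpus, migs


def list_visible_devices():
    """Run `nvidia-smi -L` and return (gpu_uuids, mig_uuids) visible here."""
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    )
    return parse_device_list(result.stdout)
```

Calling `list_visible_devices()` inside the container started with `--gpus='"device=0:0,0:1"'` should show whether the two MIG instances are visible there, which helps separate a device-visibility problem from a failure inside the embedded DCGM startup itself.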