Closed
Labels
triaged: Issue has been triaged by maintainers
Description
Hi all, a strange illegal memory access error occurs when I run
trtexec --onnx=model.onnx --int8 --calib=model_calibration.cache
on a multi-GPU platform. The log shows:
[04/18/2022-14:24:28] [I] [TRT] Starting Calibration.
[04/18/2022-14:24:28] [E] Error[1]: [calibrator.cpp::add::779] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[04/18/2022-14:24:28] [E] Error[1]: [executionContext.cpp::commonEmitDebugTensor::1258] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[04/18/2022-14:24:28] [E] Error[1]: [convolutionRunner.cpp::executeConv::458] Error Code 1: Cudnn (CUDNN_STATUS_BAD_PARAM)
[04/18/2022-14:24:28] [F] [TRT] [defaultAllocator.cpp::free::85] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[04/18/2022-14:24:28] [F] [TRT] [defaultAllocator.cpp::free::85] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[04/18/2022-14:24:28] [F] [TRT] [defaultAllocator.cpp::free::85] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[04/18/2022-14:24:28] [F] [TRT] [defaultAllocator.cpp::free::85] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[04/18/2022-14:24:28] [F] [TRT] [resources.h::operator()::445] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[04/18/2022-14:24:28] [F] [TRT] [resources.h::operator()::445] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[04/18/2022-14:24:28] [F] [TRT] [resources.h::operator()::445] Error Code 1: Cuda Driver (an illegal memory access was encountered)
[04/18/2022-14:24:28] [F] [TRT] [resources.h::operator()::445] Error Code 1: Cuda Driver (an illegal memory access was encountered)
The same command runs correctly on a single-GPU platform.
My guess is that the calibration cache and the model weights are allocated on different devices, but I don't know how to force them onto the same one. Neither "--device" nor "CUDA_VISIBLE_DEVICES=0" works.
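For reference, here is a minimal sketch of how the two settings mentioned above can be combined for a single process. The helper name `build_pinned_trtexec` and the `gpu_index` parameter are hypothetical, introduced only for illustration. It relies on the fact that when `CUDA_VISIBLE_DEVICES` exposes one GPU, that GPU is renumbered as device 0 inside the process, so `--device=0` is the matching index. Whether this avoids the illegal memory access is not established here; it only makes the device selection explicit and reproducible.

```python
import os
import shlex


def build_pinned_trtexec(onnx_path, calib_cache, gpu_index=0):
    """Return (argv, env) for a trtexec INT8 build restricted to one physical GPU.

    Hypothetical helper: it only assembles the command and environment,
    it does not launch anything.
    """
    env = dict(os.environ)
    # Expose only the chosen physical GPU to the child process.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    argv = [
        "trtexec",
        f"--onnx={onnx_path}",
        "--int8",
        f"--calib={calib_cache}",
        # With a single visible GPU, it is always device 0 inside the process.
        "--device=0",
    ]
    return argv, env


argv, env = build_pinned_trtexec("model.onnx", "model_calibration.cache", gpu_index=0)
# Print the equivalent shell command for inspection.
print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], shlex.join(argv))
# To actually run it (requires trtexec on PATH):
#   import subprocess; subprocess.run(argv, env=env, check=True)
```

If the error persists with the process restricted this way, the cross-device allocation guess becomes less likely, which is useful information to include in the report.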