Description
I want to use CUDA instead of the CPU to speed up tag inference.
My machine: Ubuntu 22.04.3 LTS (GNU/Linux 6.5.0-35-generic x86_64), CUDA 12.2.
I learned from https://onnxruntime.ai/docs/install/ that, as of this writing, if you have CUDA 12 you must install with `pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/`, rather than a plain `pip install onnxruntime-gpu`, which targets CUDA 11. This took me a while to figure out; I kept getting errors that didn't make sense:
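For reference, the install sequence that eventually worked for me looked roughly like this (the index URL is the one from the ONNX Runtime install page; uninstalling first is my own precaution to make sure no CUDA 11 wheel is left behind):

```shell
# Remove any previously installed CPU or CUDA 11 wheels so they don't shadow the new one
pip uninstall -y onnxruntime onnxruntime-gpu

# Install the CUDA 12 build from the dedicated package index
pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/
```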
[E:onnxruntime, provider_bridge_ort.cc:1744 TryGetProviderInfo_CUDA] /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1426 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory
[W:onnxruntime, onnxruntime_pybind_state.cc:870 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
I had those shared objects, but after reading carefully and reinstalling per the CUDA 12 instructions above, it worked. Using CUDAExecutionProvider instead of CPUExecutionProvider did, however, produce a new warning:
[W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 12 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message.
Basically it is bottlenecked by CPU/GPU data transfer. I have been trying to figure out how to eliminate it, but have not yet succeeded.