
Compatibility with RTX 3080 #13

@PlatinumYao

Description


OS: Win 10
DeepLabCut Version: DeepLabCut-core tf2.2alpha
Anaconda env used: DLC-GPU (cloned the DLC-GPU env and uninstalled the bundled CUDA and cuDNN)
TensorFlow Version: TF 2.3, TF 2.4, or tf-nightly, installed with pip (see below)
CUDA version: 11.0 and 11.1 (see below)

Hi everyone,
First of all, I want to say thank you to the DeepLabCut team! I have been using DLC for whisker tracking on an RTX 2060 for a while, and it has significantly facilitated my project.
Recently, I got an RTX 3080 in the lab. However, I had a hard time setting it up for DLC due to compatibility issues. First, I noticed that the RTX 3000 series does not support CUDA 10.x or earlier, so I installed CUDA 11.0 or CUDA 11.1 with the corresponding cuDNN on my Windows machine. I also cloned the DLC-GPU conda environment and uninstalled the original CUDA and cuDNN from the environment to prevent conflicts.
TensorFlow supports CUDA 11.0 starting with TensorFlow 2.4, so I installed TensorFlow 2.4 or tf-nightly (2.5) in the conda environment via pip. I also tried TF 2.3 to check whether it is indeed incompatible with CUDA 11.x. I followed
https://github.com/DeepLabCut/DeepLabCut-core/blob/tf2.2alpha/Colab_TrainNetwork_VideoAnalysis_TF2.ipynb
to install DeepLabCut-core tf2.2alpha and tf-slim and to run deeplabcut-core. However, I could not get training to start in any of these configurations.
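For reference, after installation I start training roughly like this (a minimal sketch following the notebook; the project path is only a placeholder for my own config.yaml):

```python
# Minimal sketch of how I launch training with deeplabcut-core,
# following the linked Colab notebook (the path is a placeholder).
import deeplabcutcore as deeplabcut

config_path = r"C:\DLC\whisker-project\config.yaml"  # placeholder project config
deeplabcut.train_network(config_path, shuffle=1)
```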
Here is a summary of what I tried:

| CUDA | TensorFlow | Result |
| --- | --- | --- |
| 11.0 | TF 2.3 | TF cannot recognize the GPU; it looks for .dll files that only exist in CUDA 10.x |
| 11.0 | TF 2.4 | TF recognizes the GPU without issue, but training fails with an error message (see Note 1) |
| 11.0 | tf-nightly | TF recognizes the GPU without issue, but training fails with an error message (see Note 1) |
| 11.1 | TF 2.4 | TF recognizes the GPU only with a workaround (see Note 2); training fails with no error message |
| 11.1 | tf-nightly | TF recognizes the GPU only with a workaround (see Note 2); training fails with no error message |
I also tested a simple TensorFlow script (https://www.tensorflow.org/tutorials/quickstart/advanced), and it seemed to run fine on the GPU in the last four configurations listed above.
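For example, a standalone check along these lines works and clearly uses the GPU under TF 2.4 / tf-nightly (a minimal sketch, not DLC-specific):

```python
# Quick standalone GPU check (sketch): list visible GPUs and run a matmul on GPU:0.
import tensorflow as tf

print("TF version:", tf.__version__)
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))

with tf.device("/GPU:0"):
    a = tf.random.normal((2048, 2048))
    b = tf.random.normal((2048, 2048))
    c = tf.matmul(a, b)
print("matmul ran on:", c.device)
```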

Note 1: Error message: `failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED`. I also saw VRAM usage explode in the Windows Task Manager after I started training. I tried to restrict memory usage with `config.gpu_options.per_process_gpu_memory_fraction = 0.6`, but unfortunately it did not help.
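For completeness, this is roughly the memory cap I tried (a sketch of the TF1-compat GPU option only; exactly where the config gets passed to the training session depends on the deeplabcut-core code). The commented-out lines show the TF2-native memory-growth alternative, which I have not verified in this setup:

```python
# Sketch of the memory cap I tried via the TF1-compat GPU options.
import tensorflow as tf

config = tf.compat.v1.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.6  # use at most ~60% of VRAM

# TF2-native alternative (untested in this setup): grow allocations on demand.
# for gpu in tf.config.list_physical_devices("GPU"):
#     tf.config.experimental.set_memory_growth(gpu, True)
```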

Note 2: TF could not recognize the GPU because it could not find `cusolver64_10.dll`, which exists in CUDA 11.0 but was replaced by `cusolver64_11.dll` in CUDA 11.1. So I copied `cusolver64_11.dll` and renamed the copy to `cusolver64_10.dll`. Although TF recognizes the GPU after that, training still does not start: VRAM usage increases (but does not explode) in Task Manager after training starts, and after ~30 seconds IPython or Python simply closes itself without any error message.
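The workaround in Note 2 was just a file copy, e.g. (the CUDA path below is the default install location on my machine and may differ elsewhere):

```python
# Create a copy of cusolver64_11.dll under the name TF looks for (Note 2 workaround).
import os
import shutil

cuda_bin = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin"  # default path; adjust if needed
shutil.copy(os.path.join(cuda_bin, "cusolver64_11.dll"),
            os.path.join(cuda_bin, "cusolver64_10.dll"))
```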

I also carefully followed the suggestions in DeepLabCut/DeepLabCut#944, which are very useful. However, I still cannot get my RTX 3080 to work.

Do you have any more suggestions that I could try?
Does anyone have a guide to setting up DeepLabCut-core on the RTX 3000 series?

Thank you in advance
