
Compatibility with RTX 3080 #13

@PlatinumYao

Description


OS: Win 10
DeepLabCut Version: DeepLabCut-core tf2.2alpha
Anaconda env used: DLC-GPU (cloned the DLC-GPU env and uninstalled the bundled CUDA and cuDNN)
TensorFlow Version: TF 2.3, TF 2.4, or tf-nightly, installed with pip (see below)
CUDA version: 11.0 and 11.1 (see below)

Hi everyone,
First of all, I want to say thank you to the DeepLabCut team! I have been using DLC for whisker tracking on an RTX 2060 for a while, and it has significantly facilitated my project.
Recently, I got an RTX 3080 in the lab. However, I had a hard time setting it up for DLC due to compatibility issues. First, I noticed that the RTX 3000 series does not support CUDA 10.x or earlier, so I installed CUDA 11.0 or CUDA 11.1 with the corresponding cuDNN on my Windows machine. I also cloned the DLC-GPU conda environment and uninstalled the original CUDA and cuDNN from the environment to prevent conflicts.
TensorFlow supports CUDA 11.0 starting with TensorFlow 2.4, so I installed TensorFlow 2.4 or tf-nightly (2.5) in the conda environment via pip. I also tried TF 2.3 to check whether it is indeed incompatible with CUDA 11.x. I followed
https://github.com/DeepLabCut/DeepLabCut-core/blob/tf2.2alpha/Colab_TrainNetwork_VideoAnalysis_TF2.ipynb
to install DeepLabCut-core tf2.2alpha and tf-slim and to run deeplabcut-core. However, I could not get training to start in any of these configurations.
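For reference, after installation I start training roughly like this (a minimal sketch following the notebook; the project path is only a placeholder for my own config.yaml):

```python
# Minimal sketch of how I launch training with deeplabcut-core,
# following the linked Colab notebook (the path is a placeholder).
import deeplabcutcore as deeplabcut

config_path = r"C:\DLC\whisker-project\config.yaml"  # placeholder project config
deeplabcut.train_network(config_path, shuffle=1)
```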
Here is a summary of what I tried:

| CUDA | TensorFlow | Result |
| --- | --- | --- |
| 11.0 | TF 2.3 | TF cannot recognize the GPU; it looks for .dll files that only exist in CUDA 10.x |
| 11.0 | TF 2.4 | TF recognizes the GPU without issue, but training fails with an error message (see Note 1) |
| 11.0 | tf-nightly | TF recognizes the GPU without issue, but training fails with an error message (see Note 1) |
| 11.1 | TF 2.4 | TF recognizes the GPU only with a workaround (see Note 2); training fails with no error message |
| 11.1 | tf-nightly | TF recognizes the GPU only with a workaround (see Note 2); training fails with no error message |
I also tested a simple TensorFlow script (https://www.tensorflow.org/tutorials/quickstart/advanced), and it seemed to run fine on the GPU in the last four configurations listed above.
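For example, a standalone check along these lines works and clearly uses the GPU under TF 2.4 / tf-nightly (a minimal sketch, not DLC-specific):

```python
# Quick standalone GPU check (sketch): list visible GPUs and run a matmul on GPU:0.
import tensorflow as tf

print("TF version:", tf.__version__)
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))

with tf.device("/GPU:0"):
    a = tf.random.normal((2048, 2048))
    b = tf.random.normal((2048, 2048))
    c = tf.matmul(a, b)
print("matmul ran on:", c.device)
```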

Note 1: Error message: `failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED`. I also saw VRAM usage explode in the Windows Task Manager after I started training. I tried to restrict memory usage with `config.gpu_options.per_process_gpu_memory_fraction = 0.6`, but unfortunately it did not help.
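For completeness, this is roughly the memory cap I tried (a sketch of the TF1-compat GPU option only; exactly where the config gets passed to the training session depends on the deeplabcut-core code). The commented-out lines show the TF2-native memory-growth alternative, which I have not verified in this setup:

```python
# Sketch of the memory cap I tried via the TF1-compat GPU options.
import tensorflow as tf

config = tf.compat.v1.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.6  # use at most ~60% of VRAM

# TF2-native alternative (untested in this setup): grow allocations on demand.
# for gpu in tf.config.list_physical_devices("GPU"):
#     tf.config.experimental.set_memory_growth(gpu, True)
```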

Note 2: TF could not recognize the GPU because it could not find `cusolver64_10.dll`, which exists in CUDA 11.0 but was replaced by `cusolver64_11.dll` in CUDA 11.1. So I copied `cusolver64_11.dll` and renamed the copy to `cusolver64_10.dll`. Although TF recognizes the GPU after that, training still does not start: VRAM usage increases (but does not explode) in Task Manager after training starts, and after ~30 seconds IPython or Python simply closes itself without any error message.
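The workaround in Note 2 was just a file copy, e.g. (the CUDA path below is the default install location on my machine and may differ elsewhere):

```python
# Create a copy of cusolver64_11.dll under the name TF looks for (Note 2 workaround).
import os
import shutil

cuda_bin = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin"  # default path; adjust if needed
shutil.copy(os.path.join(cuda_bin, "cusolver64_11.dll"),
            os.path.join(cuda_bin, "cusolver64_10.dll"))
```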

I also carefully followed the suggestions in DeepLabCut/DeepLabCut#944, which are very useful. However, I still cannot get my RTX 3080 to work.

Do you have any more suggestions that I could try?
Does anyone have a guide to setting up DeepLabCut-core on the RTX 3000 series?

Thank you in advance
