
Issue with CUDA 12.x Inference, Performance completely obliterated #71

@gilljon

Description


Steps to reproduce:

pip install span-marker
>>> from span_marker import SpanMarkerModel
>>> m_cuda = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-roberta-large-fewnerd-fine-super").cuda()
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6.75k/6.75k [00:00<00:00, 50.7MB/s]
model.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.42G/1.42G [00:21<00:00, 65.9MB/s]
>>> m_cuda.device
device(type='cuda', index=0)
>>> m_cuda.predict("John Smith works at Amazon.")
[]
>>> m_cpu = m_cuda.to("cpu")
>>> m_cpu.predict("John Smith works at Amazon.")
SpanMarker model predictions are being computed on the CPU while CUDA is available. Moving the model to CUDA using `model.cuda()` before performing predictions is heavily recommended to significantly boost prediction speeds.
[{'span': 'John Smith', 'label': 'person-other', 'score': 0.9197737574577332, 'char_start_index': 0, 'char_end_index': 10}, {'span': 'Amazon', 'label': 'organization-company', 'score': 0.9607704877853394, 'char_start_index': 20, 'char_end_index': 26}]

CPU inference yields the expected results, but CUDA returns an empty list for short texts. Longer texts are not affected. Why is this?

Output from nvcc --version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

This is running on an A10, and we observe the same results on a T4. Is SpanMarker not compatible with CUDA 12.x?
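One thing worth confirming for triage: `nvcc --version` reports the system toolkit (12.1 here), which is not necessarily the CUDA runtime PyTorch was built against, and a mismatch between the two is a common source of silently wrong GPU results. A minimal sketch to collect both (the helper name is mine, not part of span_marker or torch):

```python
def cuda_environment_report():
    """Gather PyTorch/CUDA build info as a dict (None values if torch is absent)."""
    report = {"torch": None, "torch_cuda": None, "cuda_available": None, "device": None}
    try:
        import torch
    except ImportError:
        # torch not installed in this environment; return the empty report
        return report
    report["torch"] = torch.__version__
    # CUDA runtime version torch was compiled against (may differ from nvcc's)
    report["torch_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device"] = torch.cuda.get_device_name(0)
    return report

if __name__ == "__main__":
    for key, value in cuda_environment_report().items():
        print(f"{key}: {value}")
```

Including this output alongside the nvcc version would show whether the torch wheel is a cu121 build or an older cu11x one running on a CUDA 12 system.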
