Spacy training on GPU - CPU Utilization is at 100% and GPU Utilization is very low #11550
-
I'm trying to run a training job in Amazon SageMaker with the p3 instance type. Whenever I start the training job, the CPU utilization goes to 100% while the GPU utilization barely reaches 30%. Whatever I tried, I couldn't get the GPU anywhere near full utilization. I created the Docker image with the following configuration:

HW configuration:
Docker configuration:
Packages installed via pip:
Training is done using the `spacy train` CLI, called via the `train` function:

```python
overrides = {...}
use_gpu: int = 0 if env.num_gpus > 0 else -1  # returns 0 if it has a GPU, else -1
train(
    config_path=Path(config_path),
    ...
)
```

The config file I used is attached. spaCy is using the GPU (screenshot attached below), and the instance metrics screenshot taken during training is also attached below. Even though spaCy is using the GPU, the CPU usage is at 100% while the GPU usage barely reaches 20%. Please help me figure out what I'm doing wrong.
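For context, here is a minimal sketch (not the poster's actual code) of how a GPU is usually requested for spaCy v3 training, both programmatically and from the CLI; `config.cfg` and `output` are placeholder paths:

```python
from pathlib import Path

import spacy
from spacy.cli.train import train

# Check that a CUDA device is visible to spaCy/Thinc; prefer_gpu() returns
# True if it could activate the GPU and falls back to CPU otherwise.
gpu_available = spacy.prefer_gpu()
print("GPU available:", gpu_available)

# Programmatic equivalent of:
#   python -m spacy train config.cfg --output output --gpu-id 0
# use_gpu is the CUDA device id, or -1 to train on CPU.
train(
    config_path=Path("config.cfg"),   # placeholder path
    output_path=Path("output"),       # placeholder path
    use_gpu=0 if gpu_available else -1,
)
```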
Replies: 1 comment
-
Typically the first thing to try is to adjust the batch sizes. It's basically the opposite of the advice for OOM errors: you can raise the batch sizes until you run into issues with your data. See the links with more details under "I'm getting Out of Memory errors" in this FAQ: #8226
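To make that concrete, here is a sketch of passing larger batch sizes as config overrides to the same `train` call used in the question. The assumptions: the config was generated by `spacy init config` and the batcher `size` is a plain integer (as in the default transformer/GPU config); the numbers are only placeholders to raise from.

```python
from pathlib import Path

from spacy.cli.train import train

# Config overrides use dotted section paths. Raise these values step by step
# until GPU memory becomes the limiting factor.
overrides = {
    "nlp.batch_size": 256,          # batch size for nlp.pipe / evaluation
    "training.batcher.size": 4000,  # training batch size; if your config uses a
                                    # compounding schedule here, raise its
                                    # start/stop values in the config instead
}

train(
    config_path=Path("config.cfg"),  # placeholder path
    output_path=Path("output"),      # placeholder path
    use_gpu=0,
    overrides=overrides,
)
```

The same overrides can also be passed on the command line, e.g. `python -m spacy train config.cfg --gpu-id 0 --nlp.batch_size 256 --training.batcher.size 4000`.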