Preventing CUDA Out of Memory #10664
Hi all, like a lot of the discussions/issues here, I've been dealing with CUDA OOM errors when fine-tuning my NER model. For example:
Some details:
In the config file, if I set a max_epochs in [training], then I'm not able to get through a single eval step before running out of memory. If I instead stream the data in by setting max_epochs to -1, I can get through ~4 steps (with an eval_frequency of 200) before running OOM. I've tried adjusting a wide variety of settings in the config file, including:

I thought about setting

and running train with

Finally, I created my train set .spacy files with the following, which takes a Pandas DF's 'spacy' col:
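It's essentially a standard DocBin export, roughly along these lines (a simplified sketch, assuming the DataFrame is called df and each value in its 'spacy' column is already a Doc object):

```python
from spacy.tokens import DocBin

# Collect the Docs from the DataFrame's 'spacy' column into a DocBin
# and write them out as a .spacy training file.
doc_bin = DocBin(store_user_data=True)
for doc in df["spacy"]:
    doc_bin.add(doc)
doc_bin.to_disk("./train.spacy")
```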
Here's my full config file:
Replies: 1 comment 9 replies
You are correct, you need to edit your config to add the component, and that pipeline description looks fine. Your custom code file is not correct, though: you shouldn't be creating a pipeline there, so remove the lines with "nlp" on them. You just need to declare the component, which in this case is basically just a function that takes in a Doc, modifies it, and returns the modified Doc. After you fix that you should be able to add the component to your pipeline. If you get an error doing so, please share it.
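For reference, a declared component is just a registered function of that shape. A minimal sketch (the name "doc_cleanup" and the body are placeholders, not from this thread):

```python
from spacy.language import Language
from spacy.tokens import Doc

@Language.component("doc_cleanup")  # placeholder name for illustration
def doc_cleanup(doc: Doc) -> Doc:
    # Modify the Doc here (set attributes, extensions, etc.)
    # and return the same Doc object.
    return doc
```

Once it's registered like this, the config can refer to it by name: list it in the pipeline under [nlp], add a [components.doc_cleanup] block with factory = "doc_cleanup", and pass the code file to training with --code (e.g. --code functions.py, where the filename is just an example).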