How to improve GPU memory usage of SpanCat #12521

delucca · 2023-04-11T22:31:31Z

delucca
Apr 11, 2023

Hey everyone.

I'm coding a SpanCat model using a large corpus (the train Spacy data has ~100mb) and, for some reason, I'm not being able to launch the training on my GPU.

I'm using a VM hosted on GCP. The VM has an A100 with 40GB of RAM.

As soon as the train starts (after the header of the report appears, but before any line is displayed) I get a memory error like the following:

2023-04-11 22:09:05.928418: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-11 22:09:06.087963: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-04-11 22:09:06.981754: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/nccl2/lib:/usr/local/cuda/extras/CUPTI/lib64
2023-04-11 22:09:06.981860: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/nccl2/lib:/usr/local/cuda/extras/CUPTI/lib64
2023-04-11 22:09:06.981886: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
ℹ Saving to output directory: model
ℹ Using GPU: 0

=========================== Initializing pipeline ===========================
[2023-04-11 22:09:12,553] [INFO] Set up nlp object from config
[2023-04-11 22:09:12,565] [INFO] Pipeline: ['transformer', 'spancat']
[2023-04-11 22:09:12,570] [INFO] Created vocabulary
[2023-04-11 22:09:12,571] [INFO] Finished initializing nlp object
Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.dense.bias', 'lm_head.decoder.weight', 'lm_head.layer_norm.bias', 'lm_head.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[2023-04-11 22:12:17,053] [INFO] Initialized pipeline components: ['transformer', 'spancat']
✔ Initialized pipeline

============================= Training pipeline =============================
ℹ Pipeline: ['transformer', 'spancat']
ℹ Initial learn rate: 0.0
E    #       LOSS TRANS...  LOSS SPANCAT  SPANS_SC_F  SPANS_SC_P  SPANS_SC_R  SCORE 
---  ------  -------------  ------------  ----------  ----------  ----------  ------
⚠ Aborting and saving the final best model. Encountered exception:
OutOfMemoryError('CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 39.41
GiB total capacity; 36.36 GiB already allocated; 2.50 MiB free; 37.94 GiB
reserved in total by PyTorch) If reserved memory is >> allocated memory try
setting max_split_size_mb to avoid fragmentation.  See documentation for Memory
Management and PYTORCH_CUDA_ALLOC_CONF')
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/jupyter/.local/lib/python3.7/site-packages/spacy/__main__.py", line 4, in <module>
    setup_cli()
  File "/home/jupyter/.local/lib/python3.7/site-packages/spacy/cli/_util.py", line 74, in setup_cli
    command(prog_name=COMMAND)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/typer/core.py", line 785, in main
    **extra,
  File "/opt/conda/lib/python3.7/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/home/jupyter/.local/lib/python3.7/site-packages/spacy/cli/train.py", line 45, in train_cli
    train(config_path, output_path, use_gpu=use_gpu, overrides=overrides)
  File "/home/jupyter/.local/lib/python3.7/site-packages/spacy/cli/train.py", line 75, in train
    train_nlp(nlp, output_path, use_gpu=use_gpu, stdout=sys.stdout, stderr=sys.stderr)
  File "/home/jupyter/.local/lib/python3.7/site-packages/spacy/training/loop.py", line 124, in train
    raise e
  File "/home/jupyter/.local/lib/python3.7/site-packages/spacy/training/loop.py", line 107, in train
    for batch, info, is_best_checkpoint in training_step_iterator:
  File "/home/jupyter/.local/lib/python3.7/site-packages/spacy/training/loop.py", line 232, in train_while_improving
    score, other_scores = evaluate()
  File "/home/jupyter/.local/lib/python3.7/site-packages/spacy/training/loop.py", line 287, in evaluate
    scores = nlp.evaluate(dev_corpus(nlp))
  File "/home/jupyter/.local/lib/python3.7/site-packages/spacy/language.py", line 1415, in evaluate
    for eg, doc in zip(examples, docs):
  File "/home/jupyter/.local/lib/python3.7/site-packages/spacy/language.py", line 1574, in pipe
    for doc in docs:
  File "/home/jupyter/.local/lib/python3.7/site-packages/spacy/util.py", line 1670, in _pipe
    yield from proc.pipe(docs, **kwargs)
  File "spacy/pipeline/trainable_pipe.pyx", line 73, in pipe
  File "/home/jupyter/.local/lib/python3.7/site-packages/spacy/util.py", line 1617, in minibatch
    batch = list(itertools.islice(items, int(batch_size)))
  File "/home/jupyter/.local/lib/python3.7/site-packages/spacy/util.py", line 1670, in _pipe
    yield from proc.pipe(docs, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/spacy_transformers/pipeline_component.py", line 212, in pipe
    self.set_annotations(subbatch, self.predict(subbatch))
  File "/opt/conda/lib/python3.7/site-packages/spacy_transformers/pipeline_component.py", line 229, in predict
    activations = self.model.predict(docs)
  File "/home/jupyter/.local/lib/python3.7/site-packages/thinc/model.py", line 315, in predict
    return self._func(self, X, is_train=False)[0]
  File "/opt/conda/lib/python3.7/site-packages/spacy_transformers/layers/transformer_model.py", line 199, in forward
    model_output, bp_tensors = transformer(wordpieces, is_train)
  File "/home/jupyter/.local/lib/python3.7/site-packages/thinc/model.py", line 291, in __call__
    return self._func(self, X, is_train=is_train)
  File "/home/jupyter/.local/lib/python3.7/site-packages/thinc/layers/pytorchwrapper.py", line 219, in forward
    Ytorch, torch_backprop = model.shims[0](Xtorch, is_train)
  File "/home/jupyter/.local/lib/python3.7/site-packages/thinc/shims/pytorch.py", line 92, in __call__
    return self.predict(inputs), lambda a: ...
  File "/home/jupyter/.local/lib/python3.7/site-packages/thinc/shims/pytorch.py", line 110, in predict
    outputs = self._model(*inputs.args, **inputs.kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/transformers/models/roberta/modeling_roberta.py", line 861, in forward
    return_dict=return_dict,
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/transformers/models/roberta/modeling_roberta.py", line 533, in forward
    output_attentions,
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/transformers/models/roberta/modeling_roberta.py", line 416, in forward
    past_key_value=self_attn_past_key_value,
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/transformers/models/roberta/modeling_roberta.py", line 345, in forward
    output_attentions,
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/transformers/models/roberta/modeling_roberta.py", line 234, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 39.41 GiB total capacity; 36.36 GiB already allocated; 2.50 MiB free; 37.94 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I've already tried to change some hyperparameters, trying to fit the data into the RAM memory, but I wasn't able to do so. And it doesn't really much sense, because I'm using essentially the same data to train a NER model (I've just changed the Spacy file creation, of course) and for the NER model I was able to train the whole model with my machine GPU which is a 3070 with 8GB of RAM

Any ideas of what should I do? There is any hyperparamter I can change?

This is my current config:

[paths]
train = null
dev = null
vectors = null
init_tok2vec = null

[system]
gpu_allocator = "pytorch"
seed = 0

[nlp]
lang = "en"
pipeline = ["transformer","spancat"]
batch_size = 1024
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}

[components]

[components.spancat]
factory = "spancat"
max_positive = null
scorer = {"@scorers":"spacy.spancat_scorer.v1"}
spans_key = "sc"
threshold = 0.5

[components.spancat.model]
@architectures = "spacy.SpanCategorizer.v1"

[components.spancat.model.reducer]
@layers = "spacy.mean_max_reducer.v1"
hidden_size = 1128

[components.spancat.model.scorer]
@layers = "spacy.LinearLogistic.v1"
nO = null
nI = null

[components.spancat.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0
pooling = {"@layers":"reduce_mean.v1"}
upstream = "*"

[components.spancat.suggester]
@misc = "spacy.ngram_suggester.v1"
sizes = [1,2,3]

[components.transformer]
factory = "transformer"
max_batch_items = 128
set_extra_annotations = {"@annotation_setters":"spacy-transformers.null_annotation_setter.v1"}

[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v3"
name = "roberta-base"
mixed_precision = false

[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 64
stride = 48

[components.transformer.model.grad_scaler_config]

[components.transformer.model.tokenizer_config]
use_fast = true

[components.transformer.model.transformer_config]

[corpora]

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null

[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null

[training]
accumulate_gradient = 3
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
patience = 1600
max_epochs = 10
max_steps = 20000
eval_frequency = 200
frozen_components = []
annotating_components = []
before_to_disk = null
before_update = null

[training.batcher]
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
size = 512
buffer = 64
get_length = null

[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = false

[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001

[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 250
total_steps = 20000
initial_rate = 0.00005

[training.score_weights]
spans_sc_f = 1.0
spans_sc_p = 0.0
spans_sc_r = 0.0

[pretraining]

[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
before_init = null
after_init = null

[initialize.components]

[initialize.tokenizer]

Answered by svlandeg

Apr 25, 2023

Hey! So do I understand it correctly that this has been resolved in #12551 (reply in thread) ?

View full answer

svlandeg · 2023-04-18T16:55:45Z

svlandeg
Apr 18, 2023

Hi! Sorry to hear you're running into this annoying memory issue. It's interesting that you say that you were able to train an NER model on the same data without issues.

Have you tried running this with setting max_epochs = -1, to stream the training data? Have you tried lowering training.batcher.size and/or nlp.batch_size?

Also, did you run spacy debug data on your data? It should give you an idea on ideal ngram sizes for the suggester function. Currently it's set to

[components.spancat.suggester]
@misc = "spacy.ngram_suggester.v1"
sizes = [1,2,3]

which means it will predict every single 1-gram, 2-gram or 3-gram, which are a lot of candidate entities. If you can prune this somehow, that would help. Just FYI, we also have a few experimental approaches you could look into: https://github.com/explosion/spacy-experimental/tree/master/spacy_experimental/span_suggesters & https://github.com/explosion/spacy-experimental/tree/master/spacy_experimental/span_finder. Have a look at https://explosion.ai/blog/spancat for more details around this.

Let me know what you find! 🤞

3 replies

delucca Apr 18, 2023
Author

hi! thanks for your response 😄

after posting this thread I've explored a little further and found about the max_epochs=-1, but even that wasn't able to help. I've did a bunch of experiments and understood that this started to happen as soon as I've moved to the transformers-based model. On the tok2vec my 8GB GPU was able to train the model, but with the transformer I need way more RAM

I've allocated a 40GB GPU on the cloud and I was able to train with 50% of the data. Also, I gave up training a SpanCat because even with GPU such large model took waaaay to long for training.

Either way, now with a NER model and according to my calculations for 100% of my training data I need a 96GB GPU, I'm trying now to execute a distributed training 😄

About the training data, it is something close to 1MM text records

svlandeg Apr 25, 2023

Hey! So do I understand it correctly that this has been resolved in #12551 (reply in thread) ?

Answer selected by delucca

delucca Apr 25, 2023
Author

Indeed it was :)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

How to improve GPU memory usage of SpanCat #12521

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment 3 replies

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Uh oh!

How to improve GPU memory usage of SpanCat #12521

Uh oh!

delucca Apr 11, 2023

Replies: 1 comment · 3 replies

Uh oh!

svlandeg Apr 18, 2023

Uh oh!

Uh oh!

delucca Apr 18, 2023 Author

Uh oh!

svlandeg Apr 25, 2023

Uh oh!

delucca Apr 25, 2023 Author

delucca
Apr 11, 2023

Replies: 1 comment 3 replies

svlandeg
Apr 18, 2023

delucca Apr 18, 2023
Author

delucca Apr 25, 2023
Author