Training on GPU #7975
Phat-Loc started this conversation in Help: Best practices
-
Assuming your training data is large enough to fill your GPU memory, I would try increasing the batch size. That said, you don't seem to have a Transformer in your pipeline. You can train on GPU without a Transformer, but is that what you intend to do? If you have a good GPU, using Transformers is usually worth it.
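For illustration, here is the kind of batching change that usually helps, assuming the GPU has memory to spare; the numbers below are made-up starting points to tune against memory and throughput, not recommended values:
[training]
# hypothetical: accumulate gradients over several batches before each optimizer step
accumulate_gradient = 3
[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2
[training.batcher.size]
@schedules = "compounding.v1"
# example values only: words per batch, growing from start toward stop
start = 500
stop = 5000
compound = 1.001
If you do want a transformer-based pipeline instead, regenerating the config from the quickstart (for example python -m spacy init config config.cfg --gpu, adding --lang and --pipeline as needed) is the usual route; the flags shown are an assumption for illustration.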
-
What settings should I use in the training section of the config to max out my GPU usage? Most of the time my GPU runs between 0-10% during training. Here are my settings:
[system]
gpu_allocator = "pytorch"
#gpu_allocator = null
seed = 0
[nlp]
lang = "en"
pipeline = ["tok2vec", "senter"]
tokenizer = {"@tokenizers": "spacy.Tokenizer.v1"}
before_creation = null
after_creation = null
after_pipeline_creation = null
disabled = []
batch_size = 1000
[training]
train_corpus = "corpora.train"
dev_corpus = "corpora.dev"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
accumulate_gradient = 1
patience = 1600
max_epochs = 0
max_steps = 20000
eval_frequency = 200
frozen_components = []
before_to_disk = null
[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2
get_length = null
[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001
t = 0.0
[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = false
[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001
[training.optimizer.learn_rate]
@schedules = "warmup_linear.v1"
warmup_steps = 250
total_steps = 20000
initial_rate = 0.00005
[training.score_weights]
ents_per_type = null
ents_f = 1.0
ents_p = 0.0
ents_r = 0.0
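For reference, a minimal sketch of how this config would be launched on GPU; the config filename, output directory, and device id are assumptions:
python -m spacy train config.cfg --output ./output --gpu-id 0
With a pipeline this small (tok2vec + senter), low utilization is also partly expected: larger batches, as in the batcher sketch above, or a transformer-based pipeline are the main levers for keeping the GPU busy.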