Benefits of Static vectors #12444

python3Berg · 2023-03-18T16:32:09Z

I have been experimenting with different configurations to migrate v2 models to v3. With all of the new options available through the config file, there has been quite a bit of experimentation.

One question I have is about static vectors. With or without them, we are using a tok2vec with multi-hash embedding. How do the static vectors help? I've gotten a sense from the documentation that tok2vec with static vectors can be useful for transfer learning. If I've decided that transfer learning is impractical given my data and use case, is there still any benefit to using them? I know my models are taking significantly more memory to deploy. Am I in turn benefiting from improved accuracy?

My config file is copied below. Thanks.

`[paths]
train = null
dev = null
vectors = "en_core_web_lg"
init_tok2vec = null

[system]
gpu_allocator = null
seed = 0

[nlp]
lang = "en"
pipeline = ["tok2vec","ner"]
batch_size = 1000
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}

[components]

[components.ner]
factory = "ner"
incorrect_spans_key = null
moves = null
scorer = {"@scorers":"spacy.ner_scorer.v1"}
update_with_oracle_cut_size = 100

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = true
nO = null

[components.ner.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}
upstream = "*"

[components.tok2vec]
factory = "tok2vec"

[components.tok2vec.model]
@architectures = "spacy.Tok2Vec.v2"

[components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v2"
width = ${components.tok2vec.model.encode.width}
attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
rows = [5000,1000,2500,2500]
include_static_vectors = true

[components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"
width = 256
depth = 8
window_size = 1
maxout_pieces = 3

[corpora]

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null

[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null

[training]
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.5
accumulate_gradient = 1
patience = 5000
max_epochs = 0
max_steps = 200000
eval_frequency = 100
frozen_components = []
annotating_components = []
before_to_disk = null

[training.batcher]
@batchers = "spacy.batch_by_sequence.v1"
get_length = null

[training.batcher.size]
@schedules = "compounding.v1"
start = 2
stop = 10
compound = 1.05
t = 0.0

[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = false

[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001
learn_rate = 0.001

[training.score_weights]
cats_score = 0.0
cats_score_desc = null
cats_micro_p = null
cats_micro_r = null
cats_micro_f = null
cats_macro_p = null
cats_macro_r = null
cats_macro_f = null
cats_macro_auc = null
cats_f_per_type = null
cats_macro_auc_per_type = null
ents_f = 1.0
ents_p = 0.0
ents_r = 0.0
ents_per_type = null

[pretraining]

[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
after_init = null

[initialize.before_init]
@callbacks = "custom_tokenizer"

[initialize.components]

[initialize.tokenizer]`
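For a rough sense of the memory side of the question, the table sizes can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope estimate, not a measurement. It assumes float32 storage and that each hash embedding table holds `rows × width` values, and it takes `rows` and `width` from the config above. The static-vectors shape (500,000 rows by 300 dimensions) is a placeholder assumption; the real shape depends on the vectors package and its version, and can be read from `nlp.vocab.vectors.shape` on a loaded pipeline.

```python
BYTES_PER_FLOAT32 = 4


def table_mib(n_rows: int, n_dims: int) -> float:
    """Size in MiB of a dense float32 table with n_rows x n_dims entries."""
    return n_rows * n_dims * BYTES_PER_FLOAT32 / 2**20


# Hash embedding tables: values copied from [components.tok2vec.model.embed]
# and [components.tok2vec.model.encode] in the config above.
hash_rows = [5000, 1000, 2500, 2500]
embed_width = 256
hash_mib = sum(table_mib(rows, embed_width) for rows in hash_rows)

# Static vectors: ASSUMED shape, for illustration only. Replace with the
# actual shape of the vectors table you deploy.
assumed_vector_rows = 500_000
assumed_vector_dims = 300
static_mib = table_mib(assumed_vector_rows, assumed_vector_dims)

print(f"hash embedding tables: ~{hash_mib:.1f} MiB")
print(f"static vectors table:  ~{static_mib:.1f} MiB")
```

Under these assumptions the static vectors table is far larger than the hash embedding tables combined, which would be consistent with the larger deployment footprint described above. Whether that cost buys accuracy still has to be measured on the actual data.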

Answered by kadarakos

kadarakos · 2023-03-20T10:44:56Z

Hey python3Berg,

Whether you will see a benefit from static vectors is largely an empirical question. Your domain may be so specific that transfer learning does not help much, or your text may contain many unusual tokens for which the pretrained vectors are uninformative. For named entity recognition, we published a technical report in which static vectors were helpful on every dataset we experimented with, especially for recognizing unseen entities, i.e. entities not present in the training set: https://arxiv.org/abs/2212.09255
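One way to check this on your own data is to score seen and unseen entities separately for each pipeline you train (for example, one with `include_static_vectors = true` and one with it set to `false`). The sketch below is a minimal illustration of that split and is not the evaluation procedure from the report. The helper `recall_seen_unseen`, the tuple layout for entities, and the toy data are all hypothetical; in practice the gold and predicted entities would come from your dev set and your trained pipelines.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical entity layout: (start_char, end_char, label, surface text).
Entity = Tuple[int, int, str, str]


def recall_seen_unseen(
    train_texts: Set[str],
    gold_docs: List[List[Entity]],
    pred_docs: List[List[Entity]],
) -> Dict[str, Optional[float]]:
    """Recall on gold entities whose lowercased text was / was not in training."""
    counts = {"seen": [0, 0], "unseen": [0, 0]}  # bucket -> [hits, total]
    for gold, pred in zip(gold_docs, pred_docs):
        # A prediction counts as a hit only on an exact span and label match.
        pred_spans = {(start, end, label) for start, end, label, _ in pred}
        for start, end, label, text in gold:
            bucket = "seen" if text.lower() in train_texts else "unseen"
            counts[bucket][1] += 1
            if (start, end, label) in pred_spans:
                counts[bucket][0] += 1
    # None marks a bucket with no gold entities, so recall is undefined.
    return {
        bucket: (hits / total if total else None)
        for bucket, (hits, total) in counts.items()
    }


# Toy data, made up for illustration.
train_texts = {"berlin", "acme corp"}
gold_docs = [
    [(0, 6, "GPE", "Berlin"), (10, 16, "ORG", "Zentro")],
    [(0, 9, "ORG", "Acme Corp"), (20, 27, "PERSON", "Ada Roe")],
]
pred_docs = [
    [(0, 6, "GPE", "Berlin"), (10, 16, "ORG", "Zentro")],
    [(0, 9, "ORG", "Acme Corp")],
]

scores = recall_seen_unseen(train_texts, gold_docs, pred_docs)
print(scores)
```

Running the same function on the predictions of both pipelines gives a per-bucket comparison. If the unseen-entity recall barely changes when the vectors are removed, that would suggest the extra memory is not paying for itself on this data.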
