Train NER based on existing model or add custom trained NER to existing model error #7135

@nomisjon

Description

In spaCy < 3.0 I was able to train the NER component within the trained en_core_web_sm model:

```
python -m spacy train en model training validation --base-model en_core_web_sm --pipeline "ner" -R -n 10
```

Specifically, I need the tagger and the parser of the en_core_web_lg model. These components can be added with the corresponding `source` entries and then listed under `frozen_components` in the `[training]` section of the config file (I will provide my full config at the end of this question):

```ini
[components]

[components.tagger]
source = "en_core_web_sm"
replace_listeners = ["model.tok2vec"]

[components.parser]
source = "en_core_web_sm"
replace_listeners = ["model.tok2vec"]

...

[training]
frozen_components = ["tagger","parser"]
```

When I'm debugging, the following error occurs:

```
ValueError: [E922] Component 'tagger' has been initialized with an output dimension of 49 - cannot add any more labels.
```

When I add the tagger to the disabled components in the `[nlp]` section of the config file, or delete everything related to the tagger, debugging and training work. However, when I apply the trained model to a text loaded into a doc, only the trained NER works and none of the other components do, e.g. the parser predicts ROOT for everything.
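For completeness, one variant I also considered (a sketch only, based on the spaCy docs on sourcing components, and not verified against my setup): sourcing and freezing the shared tok2vec as well, so the frozen tagger and parser keep seeing the features they were trained on, while only the new ner is updated:

```ini
# Untested sketch: source *all* pretrained components, including the
# tok2vec they listen to, and freeze them; only ner is trained.
[components.tok2vec]
source = "en_core_web_sm"

[components.tagger]
source = "en_core_web_sm"

[components.parser]
source = "en_core_web_sm"

[components.ner]
factory = "ner"

[training]
frozen_components = ["tok2vec","tagger","parser"]
```

With a frozen tok2vec, the new ner presumably needs either its own internal tok2vec or `replace_listeners`, since a listener cannot backpropagate into a frozen component.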

I also tried to train the NER model on its own and then add it to the loaded en_core_web_sm model:

```python
import spacy

MODEL_PATH = 'data/model/model-best'
nlp = spacy.load(MODEL_PATH)

english_nlp = spacy.load("en_core_web_sm")

ner_labels = nlp.get_pipe("ner")
english_nlp.add_pipe('ner_labels')
```

This leads to the following error:

```
ValueError: [E002] Can't find factory for 'ner_labels' for language English (en). This usually happens when spaCy calls `nlp.create_pipe` with a custom component name that's not registered on the current language class. If you're using a Transformer, make sure to install 'spacy-transformers'. If you're using a custom component, make sure you've added the decorator `@Language.component` (for function components) or `@Language.factory` (for class components).

Available factories: attribute_ruler, tok2vec, merge_noun_chunks, merge_entities, merge_subtokens, token_splitter, parser, beam_parser, entity_linker, ner, beam_ner, entity_ruler, lemmatizer, tagger, morphologizer, senter, sentencizer, textcat, textcat_multilabel, en.lemmatizer
```
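For reference, spaCy v3's documented way to move a trained component between pipelines is `nlp.add_pipe` with the `source` argument: the first argument stays the component's name in the source pipeline ("ner"), not a variable holding the pipe object. A minimal sketch, with blank pipelines standing in for my trained model and for en_core_web_sm:

```python
import spacy

# Stand-in for the trained pipeline loaded from 'data/model/model-best'.
source_nlp = spacy.blank("en")
source_nlp.add_pipe("ner")

# Stand-in for en_core_web_sm.
target_nlp = spacy.blank("en")

# Copy the component from the source pipeline by its name there.
target_nlp.add_pipe("ner", source=source_nlp)
print(target_nlp.pipe_names)
```

With the real pipelines this would be `english_nlp.add_pipe("ner", source=nlp)`; since en_core_web_sm already has a component named "ner", a different `name=` (e.g. a hypothetical "custom_ner") and a `before`/`after` position would presumably also be needed.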

Does anyone have a suggestion for how I can either train my NER together with the en_core_web_sm model, or integrate my separately trained NER component into it?

Here's the entire config file:

```ini
[paths]
train = "training"
dev = "validation"
vectors = null
init_tok2vec = null

[system]
gpu_allocator = null
seed = 0

[nlp]
lang = "en"
pipeline = ["tok2vec","tagger","parser","ner"]
batch_size = 1000
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}

[components]

[components.tagger]
source = "en_core_web_sm"
replace_listeners = ["model.tok2vec"]

[components.parser]
source = "en_core_web_sm"
replace_listeners = ["model.tok2vec"]

[components.ner]
factory = "ner"
moves = null
update_with_oracle_cut_size = 100

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = true
nO = null

[components.ner.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}
upstream = "*"

[components.tok2vec]
factory = "tok2vec"

[components.tok2vec.model]
@architectures = "spacy.Tok2Vec.v2"

[components.tok2vec.model.embed]
@architectures = "spacy.MultiHashEmbed.v1"
width = ${components.tok2vec.model.encode.width}
attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
rows = [5000,2500,2500,2500]
include_static_vectors = false

[components.tok2vec.model.encode]
@architectures = "spacy.MaxoutWindowEncoder.v2"
width = 256
depth = 8
window_size = 1
maxout_pieces = 3

[corpora]

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
gold_preproc = false
limit = 0
augmenter = null

[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
max_length = 2000
gold_preproc = false
limit = 0
augmenter = null

[training]
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
accumulate_gradient = 1
patience = 1600
max_epochs = 0
max_steps = 20000
eval_frequency = 200
frozen_components = ["tagger","parser"]
before_to_disk = null

[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2
get_length = null

[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001
t = 0.0

[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = false

[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001
learn_rate = 0.001

[training.score_weights]
ents_per_type = null
ents_f = 1.0
ents_p = 0.0
ents_r = 0.0

[pretraining]

[initialize]
vectors = "en_core_web_lg"
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null
before_init = null
after_init = null

[initialize.components]

[initialize.tokenizer]
```
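As an aside, the `${...}` references in the config (e.g. `width = ${components.tok2vec.model.encode.width}`) are resolved by thinc's `Config` class, which spaCy uses to parse these files. A minimal, self-contained sketch of that interpolation:

```python
from thinc.api import Config

# Two sections; [b] pulls its width from [a] via the ${...} syntax
# used throughout spaCy config files.
cfg_text = """
[a]
width = 256

[b]
width = ${a.width}
"""

config = Config().from_str(cfg_text)
print(config["b"]["width"])  # interpolated from [a]
```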

Your Environment

- Operating System: Windows 10
- Python Version Used: 3.8.3
- spaCy Version Used: 3
- Environment Information: python .venv
