Description
I'm trying to understand how simpletransformers, specifically its BERT NER models, produces outputs from the `predict` function when the `split_on_space` argument is used. It looks like it can handle word-level NER with the token classification models via that argument. I just don't understand how it aggregates the model output for each split, since the model output is on a token basis, which can include subwords.

It looks like `load_and_cache_examples` encodes each word with the help of `convert_examples_to_features`, then the model predicts, then `_convert_tokens_to_word_logits` converts the tokens back to words, I think?

What I would like to do, at a very high level, is something similar to what simpletransformers does: predict the correct class on a word-by-word basis using an existing token classification model. I ask here because the model I'm using was trained with simpletransformers; I just don't want all the other things it does (batching, ONNX, etc.). What's the logic to tokenize each individual space-delimited split, then predict on that particular split?
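To make the question concrete, here is roughly what I imagine the logic to be. This is my own sketch, not simpletransformers' actual code: it assumes a Hugging Face fast tokenizer (so `word_ids()` is available), the model path is a placeholder, and it keeps only the first subword's logits for each word, which I believe matches the common convention where non-first subwords are masked out during training.

```python
# Minimal sketch (my assumption, not simpletransformers internals):
# word-level NER prediction with a token classification model.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = "path/to/simpletransformers-trained-model"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
model = AutoModelForTokenClassification.from_pretrained(model_name)
model.eval()

def predict_words(sentence: str):
    words = sentence.split()  # the split-on-space step
    # Tokenize pre-split words; each subword stays linked to its source word.
    encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoding).logits[0]  # shape: (num_tokens, num_labels)
    predictions = []
    seen = set()
    for token_idx, word_idx in enumerate(encoding.word_ids(0)):
        # Skip special tokens ([CLS]/[SEP]) and all but the first subword.
        if word_idx is None or word_idx in seen:
            continue
        seen.add(word_idx)
        label_id = logits[token_idx].argmax().item()
        predictions.append((words[word_idx], model.config.id2label[label_id]))
    return predictions

print(predict_words("John lives in New York"))
```

Is this roughly equivalent to what `predict` with `split_on_space` does internally, or does `_convert_tokens_to_word_logits` aggregate the subword logits differently (e.g., keeping all subword logits per word rather than just the first)?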