Description
I'm trying to understand how simpletransformers, specifically its BERT NER models, produces outputs from the `predict` function when the `split_on_space` argument is used. It looks like it can handle word-level NER with the token classification models via that argument. I just don't understand how it aggregates the model output for each split, since the model output is on a token basis, which can include subwords.

It looks like `load_and_cache_examples` encodes each word with the help of `convert_examples_to_features`, then the model predicts, then `_convert_tokens_to_word_logits` converts the tokens back to words, I think?

What I would like to do, at a very high level, is something similar to what simpletransformers does: predict the correct class on a word-by-word basis using an existing token classification model. I ask here because the model I'm using was trained with simpletransformers; I just don't want all the other things it does (batching, ONNX, etc.). What's the logic to tokenize each individual space-delimited split, then predict on that particular split?
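To make the question concrete, here is roughly what I imagine the logic to be. This is my own sketch, not simpletransformers' actual code: it assumes a Hugging Face fast tokenizer (so `word_ids()` is available), the model path is a placeholder, and it keeps only the first subword's logits for each word, which I believe matches the common convention where non-first subwords are masked out during training.

```python
# Minimal sketch (my assumption, not simpletransformers internals):
# word-level NER prediction with a token classification model.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = "path/to/simpletransformers-trained-model"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
model = AutoModelForTokenClassification.from_pretrained(model_name)
model.eval()

def predict_words(sentence: str):
    words = sentence.split()  # the split-on-space step
    # Tokenize pre-split words; each subword stays linked to its source word.
    encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoding).logits[0]  # shape: (num_tokens, num_labels)
    predictions = []
    seen = set()
    for token_idx, word_idx in enumerate(encoding.word_ids(0)):
        # Skip special tokens ([CLS]/[SEP]) and all but the first subword.
        if word_idx is None or word_idx in seen:
            continue
        seen.add(word_idx)
        label_id = logits[token_idx].argmax().item()
        predictions.append((words[word_idx], model.config.id2label[label_id]))
    return predictions

print(predict_words("John lives in New York"))
```

Is this roughly equivalent to what `predict` with `split_on_space` does internally, or does `_convert_tokens_to_word_logits` aggregate the subword logits differently (e.g., keeping all subword logits per word rather than just the first)?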