doc.Tensor not assigned with a transformer. #11632

Larsdegroot · 2022-10-12T10:09:12Z

Based on this article, I assumed that a transformer would assign doc.tensor. However, for both my own trained model and the scispaCy en_core_sci_scibert model, doc.tensor is an empty array. After some digging, I found discussions and issues saying that you need to create doc.tensor yourself from doc._.trf_data.

My confusion is that I would expect spacy-transformers to implement this already, because spaCy's NER model works with token vectors (spaCy tokenization) and not with vectors for the transformer's wordpieces (transformer tokenization). Also, judging by this line of code from the relation extraction tutorial, the transformer listener seems to return an array with one entry per token in the doc, and not an instance of TransformerData, which is what I get when calling nlp.get_pipe('transformer').listeners[0].

So does the transformer listener behave differently during training than during calls to nlp(), such that it returns vectors aligned to spaCy's tokenization? Or is there a different way to access spaCy-aligned transformer vectors at runtime?

Answered by richardpaulhudson

richardpaulhudson · 2022-10-12T17:17:22Z

Transformer data is transformed into token-aligned tensors here. I'm not sure whether that fully answers your question; if not, please keep asking!
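
To make the idea of token alignment concrete, here is a minimal pure-Python sketch of mean-pooling wordpiece vectors into one vector per spaCy token. It does not call spacy-transformers at all: the function name `pool_wordpieces`, the toy vectors, and the alignment format (a flat list of wordpiece indices plus a per-token length) are assumptions made for illustration, and the zero-vector fallback for unaligned tokens is a choice of this sketch rather than a statement about the library.

```python
from typing import List

Vector = List[float]


def pool_wordpieces(
    wordpiece_vectors: List[Vector],
    align_indices: List[int],
    align_lengths: List[int],
) -> List[Vector]:
    """Mean-pool wordpiece vectors into one vector per token.

    align_indices is a flat list of wordpiece row indices; align_lengths[i]
    says how many of those indices belong to token i (hypothetical format).
    """
    width = len(wordpiece_vectors[0]) if wordpiece_vectors else 0
    token_vectors: List[Vector] = []
    start = 0
    for length in align_lengths:
        rows = [wordpiece_vectors[j] for j in align_indices[start:start + length]]
        start += length
        if not rows:
            # Assumption of this sketch: a token without wordpieces gets zeros.
            token_vectors.append([0.0] * width)
            continue
        # Average each column over the wordpieces aligned to this token.
        token_vectors.append([sum(col) / len(rows) for col in zip(*rows)])
    return token_vectors


# Toy input: 4 wordpieces, 3 tokens; the second token spans wordpieces 1 and 2.
wordpieces = [[1.0, 1.0], [2.0, 0.0], [4.0, 2.0], [5.0, 5.0]]
token_vectors = pool_wordpieces(wordpieces, [0, 1, 2, 3], [1, 2, 1])
print(len(token_vectors))  # one row per token, not per wordpiece
```

The point is only that the output has as many rows as there are tokens, even though the input has one row per wordpiece.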

1 reply

Larsdegroot Oct 13, 2022
Author

Yes, perfect! Thank you so much.

With this I was able to get an array with the same number of rows as there are tokens in the doc.
For other people trying to understand the rel_component, and maybe altering it to make it more robust, these were my additions to a snippet of the forward function to make it runnable on its own, outside of the training or runtime pipeline. I'm using this so that I can play with the code.

import spacy
from typing import cast
from spacy_transformers.layers import trfs2arrays
from thinc.api import Ragged, get_current_ops
from thinc.layers import reduce_mean
from thinc.types import Ints1d

# Three names are not defined in this snippet and must be supplied first:
# - model_path: path to a trained transformer pipeline with NER
# - doc: a Doc to inspect, e.g. the output of nlp(some_text)
# - get_instances: the callable from the rel_component tutorial that maps
#   a Doc to its candidate entity pairs
nlp = spacy.load(model_path) # a transformer model with ner

docs = [doc]
is_train = False
gradfactor = 1.0

ops = get_current_ops()

# Layer that turns transformer output into one token-aligned array per doc,
# mean-pooling the wordpieces that belong to each token
pooling = reduce_mean()
trfs_2_arrays_layer = trfs2arrays(pooling, gradfactor)

all_instances = [get_instances(doc) for doc in docs]
tok2vec = nlp.get_pipe('transformer_ner').listeners[0] # should be the same as what is passed to create_tensors
tokvecs, bp_tokvecs = tok2vec(docs, is_train)
tokvecs = trfs_2_arrays_layer(tokvecs, is_train)[0]

ents = []
lengths = []

# Collect the token rows of every entity in every candidate pair, per doc
for instances, tokvec in zip(all_instances, tokvecs):
    token_indices = []
    for instance in instances:
        for ent in instance:
            token_indices.extend(range(ent.start, ent.end))
            lengths.append(ent.end - ent.start)
    ents.append(tokvec[token_indices])

lengths = cast(Ints1d, ops.asarray(lengths, dtype="int32"))
entities = Ragged(ops.flatten(ents), lengths)
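
For anyone unfamiliar with the ragged layout used in the last two lines, the following plain-Python sketch shows the same gather step and how a flat array plus a list of lengths can be reduced to one vector per entity. It is an illustration only: `gather_entity_rows` and `mean_by_length` are hypothetical helpers, entities are given as (start, end) pairs instead of Span objects, and no thinc types are involved.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Bounds = Tuple[int, int]  # (start, end) token offsets, end exclusive


def gather_entity_rows(
    tokvec: List[Vector], instances: Sequence[Sequence[Bounds]]
) -> Tuple[List[Vector], List[int]]:
    """Collect the token rows of every entity, plus each entity's length."""
    rows: List[Vector] = []
    lengths: List[int] = []
    for instance in instances:
        for start, end in instance:
            rows.extend(tokvec[start:end])
            lengths.append(end - start)
    return rows, lengths


def mean_by_length(rows: List[Vector], lengths: List[int]) -> List[Vector]:
    """Average consecutive segments of rows; assumes every length is >= 1."""
    pooled: List[Vector] = []
    offset = 0
    for length in lengths:
        segment = rows[offset:offset + length]
        offset += length
        pooled.append([sum(col) / length for col in zip(*segment)])
    return pooled


# Toy doc with 4 tokens and one candidate pair: tokens [0, 1) and [1, 3).
tokvec = [[0.0, 0.0], [2.0, 4.0], [4.0, 0.0], [1.0, 1.0]]
instances = [((0, 1), (1, 3))]
rows, lengths = gather_entity_rows(tokvec, instances)
pooled = mean_by_length(rows, lengths)
print(lengths, len(pooled))  # one length and one pooled vector per entity
```

The flat `rows` list together with `lengths` carries the same information that the snippet above stores in `Ragged(ops.flatten(ents), lengths)`: the lengths say where one entity's rows stop and the next begin.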
