Hi, the difference is whether you're including the special tokens or not. If you treat the doc the same way as the span, you get the same results:

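# Overwritten below: computed from tensors[-1] without consulting the alignment.
doc_vect = doc._.trf_data.tensors[-1].mean(axis=0)
# Wordpiece indices aligned to the doc's tokens (this excludes <s> and </s>).
tensor_ix = doc._.trf_data.align[0: len(doc)].data.flatten()
# Flatten tensors[0] to one row per wordpiece, then keep only the aligned rows.
out_dim = doc._.trf_data.tensors[0].shape[-1]
tensor = doc._.trf_data.tensors[0].reshape(-1, out_dim)[tensor_ix]
doc_vect = tensor.mean(axis=0)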

There are five tokens on the transformer side (['<s>', 'V', 'ESS', 'EL', '</s>']) and the alignment to "VESSEL" in trf_data.align does not include the <s> and </s> tokens.
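
To make the effect concrete, here is a toy sketch in plain Python that does not use spaCy at all. The row values and the `aligned_ix` list are made up for illustration (they are assumptions, not real model output); the point is only that averaging all five wordpiece rows gives a different vector than averaging the three rows aligned to "VESSEL".

```python
# Toy stand-in for the transformer output: one row per wordpiece, two dimensions.
# All numbers are invented for illustration; they are not real model activations.
wordpieces = ["<s>", "V", "ESS", "EL", "</s>"]
rows = [
    [10.0, 0.0],  # <s>
    [1.0, 2.0],   # V
    [2.0, 4.0],   # ESS
    [3.0, 6.0],   # EL
    [0.0, 10.0],  # </s>
]

# Hypothetical alignment for the single token "VESSEL": rows 1..3 only,
# mirroring the fact that the alignment skips <s> and </s>.
aligned_ix = [1, 2, 3]


def mean_rows(selected):
    """Average a list of equal-length rows column by column."""
    n = len(selected)
    return [sum(col) / n for col in zip(*selected)]


mean_all = mean_rows(rows)                               # includes <s> and </s>
mean_aligned = mean_rows([rows[i] for i in aligned_ix])  # aligned wordpieces only

print("all wordpieces:    ", mean_all)
print("aligned wordpieces:", mean_aligned)
```

The two means differ because the special-token rows pull the first average away from the content wordpieces, which is the same mismatch as between the doc-level and span-level vectors.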

Answer selected by oliviercwa
Labels
feat / transformer Feature: Transformer

This discussion was converted from issue #9517 on October 25, 2021 08:27.