How to retrieve bidirectional relation between the tensor representation and a given span #13048
igormorgado started this conversation in Help: Best practices
Replies: 1 comment
Here is a very nice tutorial that shows how to align the spaCy |
I know that transformer models use byte-pair encoding, so it isn't always possible to have a single tensor (i.e. vector) representing a single word. Taking that into account, I would like to know whether it is possible to:
a. Given a `Span` (or the `Doc` indexes for the `start` and `end` of the span), retrieve the list of tensors related to those tokens; and
b. Given a tensor representation, find the `Span`, or the indexes related to it, in the `Doc`.
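A minimal sketch of (a) and (b), assuming that `TransformerData.align` behaves as the spacy-transformers docs describe it: a `Ragged` array where `align.lengths[i]` is the number of wordpiece rows token `i` aligns to, and `align[i].dataXd` holds those row indices. Under that assumption both directions are pure index bookkeeping. Plain Python lists stand in for `align.dataXd` and `align.lengths`; the toy values and the helpers `rows_for_span`, `tokens_for_row` and `span_for_rows` are hypothetical, not spaCy API.

```python
from itertools import accumulate
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical toy alignment standing in for doc._.trf_data.align:
# ALIGN_LENGTHS[i] is how many wordpiece rows token i aligns to, and
# ALIGN_DATA lists those row indices token after token (like align.dataXd).
ALIGN_LENGTHS = [1, 2, 1, 0, 3]
ALIGN_DATA = [1, 2, 3, 4, 5, 6, 7]  # row 0 (e.g. a special token) is unaligned


def token_offsets(lengths: Sequence[int]) -> List[int]:
    """Start of each token's block inside the flat data list."""
    return [0] + list(accumulate(lengths))[:-1]


def rows_for_span(data: Sequence[int], lengths: Sequence[int],
                  start: int, end: int) -> List[int]:
    """(a) Wordpiece rows aligned to tokens start..end-1, like doc[start:end]."""
    offsets = token_offsets(lengths)
    rows: List[int] = []
    for i in range(start, end):
        rows.extend(data[offsets[i]:offsets[i] + lengths[i]])
    return rows


def tokens_for_row(data: Sequence[int],
                   lengths: Sequence[int]) -> Dict[int, List[int]]:
    """(b) Inverse map: wordpiece row -> indices of the tokens aligned to it."""
    inverse: Dict[int, List[int]] = {}
    for i in range(len(lengths)):
        for row in rows_for_span(data, lengths, i, i + 1):
            inverse.setdefault(row, []).append(i)
    return inverse


def span_for_rows(inverse: Dict[int, List[int]],
                  rows: Sequence[int]) -> Optional[Tuple[int, int]]:
    """Smallest (start, end) token range covering the given rows, or None."""
    tokens = [t for r in rows for t in inverse.get(r, [])]
    if not tokens:
        return None
    return min(tokens), max(tokens) + 1


if __name__ == "__main__":
    print(rows_for_span(ALIGN_DATA, ALIGN_LENGTHS, 1, 3))  # [2, 3, 4]
    inverse = tokens_for_row(ALIGN_DATA, ALIGN_LENGTHS)
    print(span_for_rows(inverse, [3, 5]))  # (1, 5)
```

With a real `Doc` the inputs would presumably be `trfdata.align.dataXd.tolist()` and `trfdata.align.lengths.tolist()` (an assumption). The inverse map stores a list per row because one wordpiece may overlap several spaCy tokens, and some rows (special tokens, row 0 in the toy) may map to no token at all.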
So far I have been playing with the `TransformerData` object generated by the `en_core_web_trf` pipeline, and could not find a clear way to do that. Given the following preamble:
I could find the following data structure equalities:
I also found that the tensors produced from the document are stored in `trfdata.tensors[0]`, with dimensions corresponding to `(batch_id, tokens, tensor_representation)`.
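Two assumptions seem to matter for indexing `trfdata.tensors[0]`. First, as far as I can tell from the spacy-transformers pooling code, the alignment indices point into the array flattened to 2-D, `x.reshape(-1, x.shape[-1])`, rather than into the 3-D array. Second, my understanding is that `en_core_web_trf` splits the `Doc` into overlapping strided windows before running the transformer, so the first axis counts those windows rather than a batch of documents. If both hold, a flattened row and a (window, position) pair convert into each other with simple arithmetic. The hypothetical helpers below show this on a toy nested list, plus averaging a token's rows into one vector; mean pooling is one choice among several, and I believe it matches the `reduce_mean` pooling used by the default listeners.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def flat_index(window: int, position: int, seq_len: int) -> int:
    """Row of x.reshape(-1, width) that holds x[window, position]."""
    return window * seq_len + position


def unflat_index(row: int, seq_len: int) -> Tuple[int, int]:
    """Inverse of flat_index: the (window, position) of a flattened row."""
    window, position = divmod(row, seq_len)
    return window, position


def flatten_rows(tensor3d: Sequence[Sequence[Vector]]) -> List[Vector]:
    """Pure-Python equivalent of x.reshape(-1, x.shape[-1]) for a 3-D x."""
    return [list(vec) for window in tensor3d for vec in window]


def token_vector(flat: Sequence[Vector], rows: Sequence[int]) -> Vector:
    """Average the aligned rows into one vector (rows must be non-empty)."""
    width = len(flat[rows[0]])
    return [sum(flat[r][k] for r in rows) / len(rows) for k in range(width)]


# Hypothetical (n_windows=2, seq_len=3, width=2) tensor: each vector is [window, position]
SEQ_LEN = 3
TOY = [[[float(w), float(p)] for p in range(SEQ_LEN)] for w in range(2)]

if __name__ == "__main__":
    flat = flatten_rows(TOY)
    print(flat[flat_index(1, 1, SEQ_LEN)])  # [1.0, 1.0]
    print(unflat_index(4, SEQ_LEN))         # (1, 1)
    print(token_vector(flat, [1, 4]))       # [0.5, 1.0]
```

With NumPy the same steps are `flat = x.reshape(-1, x.shape[-1])` followed by `flat[rows].mean(axis=0)`.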
I tried to reconstruct the text from `align.dataXd` and `align.lengths`, without success. One of my attempts was this one:
The text seems to repeat sometimes, and I could not understand how the `Ragged` object works. The output looks like this:
But I expected this (from `doc.text`):
Last but not least, I noticed that `np.array(trfdata.wordpieces.strings).shape` matches the first two dimensions of `trfdata.tensors[0].shape`, so I think I can start building a bidirectional relation from there. I just need to find out how to relate the `wordpieces` to their corresponding `Span`s in the `Doc`.
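About the `Ragged` object itself: Thinc's `Ragged` keeps one flat `data` array plus `lengths`, and item `i` is the slice of `data` that starts at `sum(lengths[:i])` and has `lengths[i]` entries. One plausible cause of the repeated text is window overlap: a token lying in the overlap of two windows is aligned to rows in both, so concatenating every aligned wordpiece emits it twice. The toy below (two 3-piece windows with stride 2, so piece `C` sits in both) reproduces the repetition and applies a simple heuristic that keeps, for each token, only the rows from the window holding most of them. The data and helper names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical toy: windows of 3 wordpieces with stride 2, so piece "C" is in both.
# Flattened rows: window 0 = rows 0-2 (A B C), window 1 = rows 3-5 (C D E).
SEQ_LEN = 3
STRINGS = ["A", "B", "C", "C", "D", "E"]
LENGTHS = [1, 3, 1, 1]     # like align.lengths: token 1 is "BC"
DATA = [0, 1, 2, 3, 4, 5]  # like align.dataXd: token 1 -> rows 1, 2, 3


def ragged_item(data: Sequence[int], lengths: Sequence[int], i: int) -> List[int]:
    """Slice of `data` for item i, mimicking ragged[i].dataXd."""
    start = sum(lengths[:i])
    return list(data[start:start + lengths[i]])


def pieces_per_token(strings: Sequence[str], data: Sequence[int],
                     lengths: Sequence[int]) -> List[List[str]]:
    """Wordpiece strings aligned to each token, duplicates included."""
    return [[strings[r] for r in ragged_item(data, lengths, i)]
            for i in range(len(lengths))]


def dedup_rows(rows: Sequence[int], seq_len: int) -> List[int]:
    """Heuristic: keep only the rows from the window holding most of them."""
    if not rows:
        return []
    counts = Counter(r // seq_len for r in rows)
    best = max(counts, key=lambda w: (counts[w], -w))  # ties -> earliest window
    return [r for r in rows if r // seq_len == best]


def reconstruct(strings: Sequence[str], data: Sequence[int],
                lengths: Sequence[int], seq_len: int) -> str:
    """Concatenate each token's wordpieces once, in token order."""
    return "".join(strings[r]
                   for i in range(len(lengths))
                   for r in dedup_rows(ragged_item(data, lengths, i), seq_len))


if __name__ == "__main__":
    print("".join(STRINGS[r] for r in DATA))            # ABCCDE (naive)
    print(pieces_per_token(STRINGS, DATA, LENGTHS)[1])  # ['B', 'C', 'C']
    print(reconstruct(STRINGS, DATA, LENGTHS, SEQ_LEN)) # ABCDE
```

Real RoBERTa wordpieces also include special tokens and `Ġ` markers for preceding spaces, which this sketch ignores, so concatenating them will not return `doc.text` byte for byte; once the rows are grouped per token, reading `doc[i].text_with_ws` is probably the more reliable way back to the text.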
To sum up, my questions are:
a. How to relate `wordpieces.strings` and `tensors` with the text in the doc.
b. `trfdata.tensors[0]`
c. How the `Ragged` object works, including its indexes and lengths.

Best regards...