Get all the tokens' vector without loop #12667

iwkkk · 2023-05-24T11:34:03Z

iwkkk
May 24, 2023

I have a question. I have a very large document and need the vector for every word. The only way I have found is: vectors = [token.vector for sent in nlp(doc) for token in sent], which takes a long time. Is there a faster way to do this?
I am using the small model on GPU and have already disabled all the components I don't need.

vinbo8 · 2023-05-24T13:50:29Z

vinbo8
May 24, 2023

We'd be able to help better if you gave us some more information about what exactly you are trying to do with this. The contextualised tok2vec vectors your one-liner returns are not much use beyond what they've been trained for.

3 replies

iwkkk May 25, 2023
Author

Hi! For example, I have two sentences, A and B, and I need the vector of every word in each. I will use these vectors to compute the similarity of each pair of words across the two sentences, and so find the words of A that are salient to B. My problem is that getting these vectors takes too much time.
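
For readers following along, the comparison described here amounts to a pairwise cosine-similarity matrix between the word vectors of A and those of B. The sketch below is not code from this thread: the helper names `cosine` and `pairwise_similarity` are hypothetical, and the toy lists stand in for whatever per-token vectors are actually used. It only shows the shape of the computation and makes no claim about which vectors are suitable as input.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_similarity(vecs_a: Sequence[Vector], vecs_b: Sequence[Vector]) -> List[List[float]]:
    """Return a matrix where entry [i][j] compares word i of A with word j of B."""
    return [[cosine(u, v) for v in vecs_b] for u in vecs_a]


# Toy stand-ins for per-token vectors (assumption: real vectors would come from a model).
vecs_a = [[1.0, 0.0], [0.0, 2.0]]
vecs_b = [[3.0, 0.0], [1.0, 1.0], [0.0, 0.0]]

sims = pairwise_similarity(vecs_a, vecs_b)
```

Here `sims` has one row per word of A and one column per word of B, so a per-word score for A could be derived from each row (for example its maximum), depending on how salience is defined.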

vinbo8 May 30, 2023

You should not be doing this - the vectors that this pipeline generates are trained to return representations that are a) highly conditioned on their context, and b) specific to some downstream task; it's not going to be very meaningful to run similarity tests on them.

Answer selected by vinbo8

iwkkk May 30, 2023
Author

ok, I understand. Thank you so much!
