Token.lex.prob returns constant values on nightly. #6388

koaning · 2020-11-13T21:24:58Z

How to reproduce the behaviour

I think something is up with Token.lex.prob on nightly.

When I run this code:

import spacy 
nlp = spacy.load("en_core_web_lg")
for t in nlp("bank of the riverside and stuff"):
  print(t.text, t.lex.prob)

I get this output:

bank -20.0
of -20.0
the -20.0
riverside -20.0
and -20.0
stuff -20.0

It looks like every word gets the same constant prob value. This happens with both en_core_web_md and en_core_web_lg.

Your Environment

I'm running on Google Colab.

Info about spaCy

spaCy version: 3.0.0rc2
Platform: Linux-4.19.112+-x86_64-with-Ubuntu-18.04-bionic
Python version: 3.6.9
Pipelines: en_core_web_md (3.0.0a0), en_core_web_lg (3.0.0a0), en_core_web_trf (3.0.0a0)

Answered by adrianeboyd

adrianeboyd · 2020-11-16T07:56:34Z

Hi, this is a change in v2.3+, where the probability features aren't included with the pretrained models by default. See the section on "Probability and cluster features" for how to load very similar probability tables into a model in v2.3+: https://spacy.io/usage/v2-3#migrating
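To check whether a pipeline actually has a probability table, you can inspect its lookups directly. This is a minimal sketch, assuming spaCy v3 (where the tables live in nlp.vocab.lookups) and using spacy.blank("en") so no model download is needed; describe_probs is a hypothetical helper. Without a lexeme_prob table, every word should report the same fallback value, which appears to be the -20.0 seen in the output above:

```python
import spacy


def describe_probs(nlp, words):
    """Return (has_prob_table, {word: prob}) for the given words.

    Hypothetical helper for illustration; assumes spaCy v3, where the
    lexeme_prob table (if any) lives in nlp.vocab.lookups.
    """
    has_prob_table = nlp.vocab.lookups.has_table("lexeme_prob")
    probs = {word: nlp.vocab[word].prob for word in words}
    return has_prob_table, probs


# A blank pipeline stands in for a v3 pretrained one: neither has lexeme_prob.
nlp = spacy.blank("en")
has_prob_table, probs = describe_probs(nlp, ["bank", "of", "the", "riverside"])
print("lexeme_prob loaded:", has_prob_table)
for word, prob in probs.items():
    print(word, prob)
```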

Be aware that the tables from spacy-lookups-data are not identical to the ones in the v2.2 models: they have 1M entries instead of 1.3M. The tables in the spacy-lookups-data package were getting too large, so I kept only the 1M most frequent tokens.

If you need exactly the same probabilities, you can export them from a v2.2 model and load them in. You can set lexeme.prob manually for any lexeme to whatever value you'd like, and the value will be saved when you save the model with nlp.to_disk.
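The export-and-reload route might look roughly like the sketch below. It is not an official recipe: export_probs and import_probs are hypothetical helpers, and the JSON file in between is an assumed exchange format. You would run export_probs with spaCy v2.2 and the v2.2 model, then import_probs with spaCy v3. Instead of setting lexeme.prob one lexeme at a time, it builds the whole lexeme_prob table in one step with Lookups.add_table, the same call used for the spacy-lookups-data table in this thread.

```python
import json


def export_probs(nlp, path):
    """Write {word: prob} for every lexeme in nlp.vocab to a JSON file.

    Hypothetical helper; run it with spaCy v2.2 and the v2.2 model loaded.
    Returns the number of entries written.
    """
    probs = {lex.text: lex.prob for lex in nlp.vocab}
    with open(path, "w", encoding="utf8") as f:
        json.dump(probs, f)
    return len(probs)


def import_probs(nlp, path):
    """Replace the lexeme_prob table with the values from an exported JSON file.

    Hypothetical helper; assumes spaCy v3, where the table lives in
    nlp.vocab.lookups (in v2.3 it was nlp.vocab.lookups_extra).
    Returns the number of entries loaded.
    """
    with open(path, encoding="utf8") as f:
        probs = json.load(f)
    lookups = nlp.vocab.lookups
    # add_table refuses to overwrite an existing table, so drop it first.
    if lookups.has_table("lexeme_prob"):
        lookups.remove_table("lexeme_prob")
    # The table hashes the string keys, so lexeme.prob can find them by ID.
    lookups.add_table("lexeme_prob", probs)
    return len(probs)


# Usage (file and directory names are examples):
#   spaCy v2.2:  export_probs(spacy.load("en_core_web_md"), "probs.json")
#   spaCy v3:    nlp = spacy.load("en_core_web_md")
#                import_probs(nlp, "probs.json")
#                nlp.to_disk("en_core_web_md_with_probs")
```

Saving with nlp.to_disk afterwards should keep the table with the rest of the vocab, so later spacy.load calls on that directory don't need the JSON file.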

koaning · 2020-11-16T10:02:19Z

Ah, I wasn't aware of that. Thanks for the detailed explanation 👍!

koaning · 2020-12-24T17:19:18Z

Strange. I had gotten this to work in spaCy 2.3, but now it seems like nlp.vocab.lookups_extra no longer exists in spaCy nightly. I'm following the guide here.

import spacy 

nlp = spacy.load("en_core_web_md")

if nlp.vocab.lookups_extra.has_table("lexeme_prob"): 
    nlp.vocab.lookups_extra.remove_table("lexeme_prob")

This code results in:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-16-e5b7e1a65b2b> in <module>
----> 1 if nlp.vocab.lookups_extra.has_table("lexeme_prob"):
      2     nlp.vocab.lookups_extra.remove_table("lexeme_prob")
      3 

AttributeError: 'spacy.vocab.Vocab' object has no attribute 'lookups_extra'

Is the migration perhaps slightly different for spaCy 3.0?

adrianeboyd · 2020-12-28T07:28:49Z

The lookups_extra tables have been moved back into lookups in v3. In v2.3 they were kept separate so that blank models wouldn't load all the huge tables from spacy-lookups-data by default. With the v3 configs that separation isn't needed: you list the tables you want in the config instead.

It should work as in v2.3 if you refer to lookups instead of lookups_extra.
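Applied to the snippet from the previous comment, the v3 version is the same check with lookups in place of lookups_extra. This is a minimal sketch using a blank English pipeline so it runs without a model (with en_core_web_md the calls are the same); drop_prob_table is a hypothetical helper name:

```python
import spacy


def drop_prob_table(nlp):
    """Remove lexeme_prob from the vocab if present; return True if removed.

    Hypothetical helper. In v2.3 this table lived in nlp.vocab.lookups_extra;
    in v3 it lives in nlp.vocab.lookups.
    """
    if nlp.vocab.lookups.has_table("lexeme_prob"):
        nlp.vocab.lookups.remove_table("lexeme_prob")
        return True
    return False


# nlp = spacy.load("en_core_web_md") works the same way.
nlp = spacy.blank("en")
print(nlp.vocab.lookups.tables)  # names of the tables currently loaded
print(drop_prob_table(nlp))
```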

jklaise · 2021-02-02T11:54:40Z

@adrianeboyd Since spaCy v3 was just released, I've been having trouble getting the extra tables via lookups. For example, after loading en_core_web_md with nlp = spacy.load('en_core_web_md'), the only table available in nlp.vocab.lookups is lexeme_norm, and I'm not sure how to load the other tables, such as lexeme_prob. You mentioned providing a config to load the required tables, but I'm not sure how to use that when loading these pretrained models. Any pointers would be appreciated!

EDIT:
I tried defining a config to load the extra tables:

nlp=spacy.load('en_core_web_md', config={"initialize.lookups.tables":["lexeme_prob", "lexeme_norm"]})

This runs without errors, but the extra tables don't seem to be loaded:

>>> nlp.vocab.lookups.tables
['lexeme_norm']

adrianeboyd Feb 2, 2021

Adding this to the config only works if you're training a model from scratch, where nlp.initialize() is called before training starts and loads the tables specified in the [initialize] block. (If you call nlp.initialize() on a pretrained model, it will re-initialize all of the components, like the tagger, which is not what you want.)

Let's see, I think the easiest way is to just load and add the table separately:

import spacy
from spacy.lookups import load_lookups
nlp = spacy.load("en_core_web_sm")
lookups = load_lookups("en", ["lexeme_prob"])
nlp.vocab.lookups.add_table("lexeme_prob", lookups.get_table("lexeme_prob"))
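One thing to watch with this snippet: Lookups.add_table raises an error if a table with that name already exists, so running it twice (for example, re-running a notebook cell) fails. Below is a minimal guarded variant, assuming spaCy v3 and, for the default path, that spacy-lookups-data is installed. add_lexeme_prob and its source parameter are hypothetical; source exists only so that you can pass in another Lookups object, for example when testing without spacy-lookups-data.

```python
from spacy.lookups import load_lookups


def add_lexeme_prob(nlp, source=None):
    """Copy a lexeme_prob table into nlp.vocab.lookups, replacing any existing one.

    Hypothetical helper. By default the table comes from spacy-lookups-data
    via load_lookups; pass a Lookups object as `source` to use another one.
    Returns the number of entries in the added table.
    """
    if source is None:
        source = load_lookups(nlp.lang, ["lexeme_prob"])
    table = source.get_table("lexeme_prob")
    # Remove any earlier copy so add_table doesn't fail on a second call.
    if nlp.vocab.lookups.has_table("lexeme_prob"):
        nlp.vocab.lookups.remove_table("lexeme_prob")
    nlp.vocab.lookups.add_table("lexeme_prob", table)
    return len(table)
```

With a pretrained pipeline, the call would be add_lexeme_prob(spacy.load("en_core_web_sm")), and it can safely be repeated.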

jmyerston Feb 15, 2021

I just posted a related question in a different issue, but your message here seems to address my problem. Let me see if I understand: unlike the lemma data, the lexeme tables are no longer loaded by default. So now one needs to add the lexeme table to the training config file, similar to how data augmentation is added.

The problem I'm having is that I do not know what exactly to put in

[initialize]
lookups =

Whatever I have written there gives me a configuration error:

✘ Config validation error
initialize -> lookups instance of Lookups expected

What I would like to load is grc_lexeme_norm.json

Thanks.

adrianeboyd Feb 16, 2021

Here's what the config would look like for tables from spacy-lookups-data:

[initialize.lookups]
@misc = "spacy.LookupsDataLoader.v1"
lang = ${nlp.lang}
tables = ["lexeme_norm"]

(If it helps, I just copied this out of the en_core_web_sm config. Since we're using most of the common config/init options in the pretrained pipelines, it can help to look at their configs if you get stuck and the docs don't include an example. We should probably add a better example to the docs for this, in any case.)
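For the grc case in the question, the same block should carry over. ${nlp.lang} is a variable filled in from the lang setting in the [nlp] section, so with lang = "grc" it should request the grc version of lexeme_norm. One way to see what the block resolves to without training anything is to parse it with thinc's Config class (thinc is installed alongside spaCy v3). This sketch assumes the minimal fragment below; a real training config contains many more sections.

```python
from thinc.api import Config

# Hypothetical minimal fragment; only the sections relevant to lookups are shown.
CONFIG_TEXT = """
[nlp]
lang = "grc"

[initialize]

[initialize.lookups]
@misc = "spacy.LookupsDataLoader.v1"
lang = ${nlp.lang}
tables = ["lexeme_norm"]
"""

# from_str interpolates variables such as ${nlp.lang} by default.
config = Config().from_str(CONFIG_TEXT)
lookups_cfg = config["initialize"]["lookups"]
print(lookups_cfg["lang"], lookups_cfg["tables"])
```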

BrenBarn · 2021-05-08T06:16:19Z

This really needs to be foregrounded in documentation.

BrenBarn · 2022-11-14T05:24:20Z

A year and a half later, the migration documentation still shows old code that doesn't work. Can it be fixed?

adrianeboyd Nov 14, 2022

Could you provide more details about which code is not working for you and how you think it needs to be updated?

PRs for the docs are always welcome if you'd like to submit a fix that way. The source for this page is here:

https://github.com/explosion/spaCy/blob/master/website/docs/usage/v2-3.md

BrenBarn Nov 14, 2022

On the page I linked, the section "Probability and cluster features" still shows the same code that was reported as non-working two years ago. It should have this instead. Also, I think it would be good to have this somewhere in the "real" documentation, not just the migration part.

adrianeboyd Nov 15, 2022

The code on that page is specifically only for migrating from spacy v2.2 to v2.3.

I would agree that code for loading a probability table from spacy-lookups-data into an existing model could be added to the current v3.x docs.

yunbinmo Mar 27, 2025

It's 2025 and I am still referring to this issue to solve the same problem.