Help debugging a custom suggester that can return 0 suggestions #11818
Replies: 1 comment 7 replies
I've been looking into this over the last few days, and here are some notes on what I've found. I think suggesters that potentially emit 0 spans are nominally supported (the existence of a test for them implies as much), but in practice the default Model for the SpanCategorizer component doesn't seem to support them. I found a few references (below) suggesting that, for various types of operations, you can't expect the model to compute a result for zero-length sequences.
Since suggesters that potentially emit no spans can output zero-length sequences, in practice empty suggesters aren't really supported. Two obvious solutions come to mind,
👋🏻 I could use some help with a custom suggester function. Essentially, I'm using a spaCy project to build out a SpanCategorizer model with a custom suggester. The main idea is that I'd like to have control over the lexicon for candidates. I'll label the data (in Prodigy) with potential candidates to produce training data on spans. Then the project builds a SpanCategorizer model that only decides on those candidates. In this case, I was experimenting with a suggester that takes match patterns as input and emits the Ragged array of spans in the docs. If it's relevant, I've tried this with an en_core_web_md model.
Code for the suggester (click to toggle)
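The original suggester code is collapsed in the post and is not reproduced here. As a rough illustration of the kind of suggester described above, here is a minimal sketch that assumes token-level Matcher patterns as input. The registry name `demo_pattern_suggester.v1`, the function names, and the sample patterns are all hypothetical. It only shows how a doc with no matches ends up as a zero entry in `lengths`; it is not a fix for the error discussed in this thread.

```python
from typing import Callable, Iterable, List, Optional

import spacy
from spacy.matcher import Matcher
from spacy.tokens import Doc
from thinc.api import Ops, get_current_ops
from thinc.types import Ragged


# Hypothetical registry name; the original post does not show its own.
@spacy.registry.misc("demo_pattern_suggester.v1")
def build_pattern_suggester(patterns: List[List[dict]]) -> Callable[..., Ragged]:
    """Return a suggester that proposes only spans matching token patterns."""

    def suggester(docs: Iterable[Doc], *, ops: Optional[Ops] = None) -> Ragged:
        if ops is None:
            ops = get_current_ops()
        rows = []     # one (start, end) token-offset pair per candidate span
        lengths = []  # number of candidates per doc, in doc order
        for doc in docs:
            # Built per doc to keep the sketch short; cache it in real use.
            matcher = Matcher(doc.vocab)
            matcher.add("CANDIDATE", patterns)
            found = sorted({(start, end) for _, start, end in matcher(doc)})
            rows.extend(found)
            lengths.append(len(found))  # 0 when nothing matched in this doc
        if rows:
            data = ops.asarray(rows, dtype="i")
        else:
            # No candidates in any doc: pass an empty integer array.
            data = ops.xp.zeros((0, 2), dtype="i")
        return Ragged(data, ops.asarray(lengths, dtype="i"))

    return suggester


nlp = spacy.blank("en")
suggest = build_pattern_suggester([[{"LOWER": "green"}, {"LOWER": "tea"}]])
spans = suggest([nlp("I like green tea"), nlp("Nothing to see here")])
print(spans.lengths)  # the second doc contributes a length of 0
```

A blank English pipeline is enough here because the sample pattern only uses the lexical `LOWER` attribute; patterns that rely on tags or entities would need a trained pipeline such as en_core_web_md.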
Where I'm running into trouble is that on spaCy 3.3, things work exactly as I expect. However, on spaCy 3.4, they don't. I get the following error:
ValueError: all sequence lengths must be >= 0
Indeed, if I drop into a debugger and inspect the input to my SpanCat's model, it looks like this:
Ragged(data=array([], shape=(0, 0), dtype=int32), lengths=array([0], dtype=int32), data_shape=(-1, 0), starts_ends=None)
Here's the error I get (click to toggle)
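When debugging a case like this, it can help to confirm which docs in a batch received no candidates at all. The helper below is a small sketch for that purpose. The name `empty_doc_indices` is hypothetical, and it assumes only that the suggester output exposes a per-doc `lengths` sequence, as the Ragged value in the post does.

```python
from typing import List, Sequence


def empty_doc_indices(lengths: Sequence[int]) -> List[int]:
    """Return the positions of docs for which the suggester proposed no spans."""
    return [i for i, n in enumerate(lengths) if int(n) == 0]


# `lengths` would come from the suggester output, e.g. `ragged.lengths`.
print(empty_doc_indices([3, 0, 2, 0]))  # → [1, 3]
```

For the Ragged value shown in the post, where `lengths` is `array([0])`, this would report index 0, meaning the only doc in the batch had no suggested spans.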
What seems to be happening here is that my suggester sometimes suggests no spans. I'm not sure why this works on 3.3 but not after upgrading to 3.4. I've looked over the ngram suggesters from both versions and, aside from some typing updates, I don't see anything obvious I should be doing when no spans exist.
I've also looked over https://spacy.io/usage/v3-4 and I don't see any noteworthy changes to SpanCategorizer that I should be accounting for. (I did update my config to the latest version, but that didn't change anything.)
What am I missing about the way I'm building a custom suggester? Or is there a better method I should be looking into?