Improve load performance of SpanRuler with lots of phrase patterns? #11988
I've created a language model with only a SpanRuler component (nothing else in the pipeline). The ruler is initialized with a few million phrase patterns. The runtime performance is totally fine, but it takes a really long time to load the model. When the model is assembled, the phrases are serialized as their original text, and I understand the text gets tokenized and processed by the underlying PhraseMatcher on load. Is there any way to improve the load performance? Is there some serialization methodology where the SpanRuler+PhraseMatcher can be saved ready-to-go, instead of doing all the parsing on load?
Replies: 1 comment
There is not a special mode for this or anything. What we have suggested in the past is pickling (#4445, #10514). That should avoid the overhead of recreating the Docs, and should be similar to reading from a DocBin and using add. There's no way to directly set the final internal data structures because they aren't exposed on the Python object, but we could think about adding a way to do that.