Matcher in spacy3.0 is slower than that in spacy2.0.18 #8790
-
Thanks for the report. We don't do automated speed testing on the Matcher, so while we have designed it to be efficient and we do basic checks that it's not really slow, I don't think we profile it regularly. So we wouldn't know exactly when a performance change happened without retrospective profiling or reviewing code changes (unless someone remembers a change that might have caused it, which I don't).
Can you clarify how many patterns you have and how big your document is? This time is for only one input document, right?
One thing I would note is that while the relative change in time is large, your absolute time is quite small, so it could be affected by a lot of unrelated things, including other processes on your machine. From a match against one doc it's hard to draw any general conclusions about overall performance. If you see similar trends when running a large batch of many different documents, that would be a more reliable benchmark.
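A batch benchmark of the kind described above could be sketched roughly as follows. This is only an illustration, not the setup used in this thread: the helper names `time_matcher` and `build_example`, the pattern and document counts, and the synthetic texts are all made up, and `build_example` assumes the spaCy v3 `Matcher.add` signature.

```python
import statistics
import time


def time_matcher(matcher, docs, repeats=5):
    """Return the median wall-clock seconds for one pass of matcher over docs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for doc in docs:
            matcher(doc)
        timings.append(time.perf_counter() - start)
    # The median is less sensitive to one-off noise (e.g. other processes).
    return statistics.median(timings)


def build_example(n_patterns=1000, n_docs=200):
    """Build a Matcher and tokenized docs (hypothetical sizes and texts)."""
    # Imported here so time_matcher stays usable without spaCy installed.
    import spacy
    from spacy.matcher import Matcher

    nlp = spacy.blank("en")  # tokenizer only, which is enough for LOWER patterns
    matcher = Matcher(nlp.vocab)
    for i in range(n_patterns):
        # spaCy v3 signature; v2.0.x used matcher.add(key, None, pattern).
        matcher.add(f"P{i}", [[{"LOWER": f"word{i}"}]])
    texts = [
        " ".join(f"Word{(d + k) % n_patterns}" for k in range(50))
        for d in range(n_docs)
    ]
    # make_doc only tokenizes, so doc creation is kept out of the timed loop.
    docs = [nlp.make_doc(text) for text in texts]
    return matcher, docs
```

Calling `time_matcher(*build_example())` in a separate environment for each spaCy version (with the `add` call adjusted for v2.0.x) would give medians that can be compared directly, since only the `matcher(doc)` calls are timed.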
-
Using this sample code, it looks like the Matcher became slower between 2.0.18 and 2.1.0 (there are no releases in between those).
For this particular test, which mainly covers long single documents with many patterns, it looks like the time may have doubled between these versions. It also got slower in later versions, but that change was more gradual. I have looked at the intervening commits and it's not clear that any of them did anything that would cause this. The history is a little hard to follow, though, because in 2.0.18 all the matchers were in one file and they were split up after that.
-
Hi, we're still investigating this specific regression, but I wanted to mention that we recently fixed a related performance issue in the |
-
Hello,
I'm migrating from spaCy 2.0.18 to spaCy 3.0.6. I am aware that the implementation of the Matcher has changed significantly.
In my tests with a large number of patterns on one document, the Matcher takes about 0.16 seconds to process the document in spaCy 2 but about 0.9 seconds in spaCy 3. The patterns and the test document are the same. I only have simple patterns using the 'LOWER' or 'LEMMA' attribute.
(I measured the execution time of the call matcher(doc).)
Do you have any idea why it takes so much longer in spaCy 3? Maybe the slowdown was already introduced in an earlier version.