
[HTML search] optimization: don't loop over all document terms and title terms during partial-matching. #12045

@jayaddison

Description


Is your feature request related to a problem? Please describe.
There seems to be a potentially large inefficiency in the Sphinx JavaScript search code: for any query word longer than two characters (likely to be >90% of them, I'd guess!), we iterate through all document terms and all title terms in the client's search index to check for partial matches.

However, if an exact match was already found for the query word, we don't add any of those candidates, even when they do match. That means we spend JavaScript compute iterating over items that are never used. Since Sphinx 7.3.0 this is no longer true, thanks to #11958: if an exact match on a stemmed term is found, we skip the partial-match checks for that query term.

The relevant code is found here:

Object.keys(terms).forEach((term) => {
  if (term.match(escapedWord) && !terms[word])
    arr.push({ files: terms[term], score: Scorer.partialTerm });
});
Object.keys(titleTerms).forEach((term) => {
  // use the matched `term`: titleTerms[word] is always undefined here,
  // because of the !titleTerms[word] guard
  if (term.match(escapedWord) && !titleTerms[word])
    arr.push({ files: titleTerms[term], score: Scorer.partialTitle });
});
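In the snippet above, `!terms[word]` and `!titleTerms[word]` don't depend on the loop variable `term`, so they can be checked once before each loop and the whole scan skipped when an exact match exists. A minimal sketch, assuming the index maps each term to a list of document ids; `partialMatches` is a hypothetical name and the `Scorer` values are placeholders that may not match Sphinx's defaults. `term.includes(word)` stands in for `term.match(escapedWord)`: matching against a regex-escaped literal is a plain substring test.

```javascript
// Placeholder scores standing in for Sphinx's Scorer.partialTerm / partialTitle.
const Scorer = { partialTerm: 2, partialTitle: 7 };

/**
 * Collect partial-match candidates for one query word (hypothetical helper).
 * terms / titleTerms: objects mapping an index term to its document ids.
 * The exact-match checks are hoisted out of the loops, so a word with an
 * exact match costs one lookup instead of a scan over every key.
 */
function partialMatches(word, terms, titleTerms) {
  const arr = [];
  if (word.length <= 2) return arr; // partial matching only for longer words

  // Scan document terms only when `word` has no exact match among them.
  if (!terms[word]) {
    for (const term of Object.keys(terms)) {
      if (term.includes(word))
        arr.push({ files: terms[term], score: Scorer.partialTerm });
    }
  }

  // Same rule for title terms, checked independently.
  if (!titleTerms[word]) {
    for (const term of Object.keys(titleTerms)) {
      if (term.includes(word))
        arr.push({ files: titleTerms[term], score: Scorer.partialTitle });
    }
  }
  return arr;
}
```

For example, with `terms = { search: [0, 1], searchtools: [2] }`, the query `search` never touches `Object.keys(terms)`, because `terms.search` already exists.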

I haven't measured the performance impact rigorously, but it may be significant, especially for large documentation projects. Initial results appear to indicate a 9ms vs 48ms difference on a local test machine.
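One way to reproduce a comparison like this is a micro-benchmark on a synthetic index. This is a sketch only: the index shape (term to list of document ids), the `makeSyntheticTerms` helper, and the 50,000-term size are assumptions rather than real Sphinx data, and timings will vary by machine and runtime.

```javascript
// Brute force: scan every key, testing the exact-match guard per iteration.
function bruteForce(word, terms) {
  const arr = [];
  Object.keys(terms).forEach((term) => {
    if (term.includes(word) && !terms[word]) arr.push(terms[term]);
  });
  return arr;
}

// Guarded: check the exact match once and skip the scan entirely if found.
function guarded(word, terms) {
  if (terms[word]) return [];
  return Object.keys(terms)
    .filter((term) => term.includes(word))
    .map((term) => terms[term]);
}

// Hypothetical synthetic index: "term0" .. "term{n-1}", each in one document.
function makeSyntheticTerms(n) {
  const terms = {};
  for (let i = 0; i < n; i++) terms[`term${i}`] = [i];
  return terms;
}

// Total wall-clock time (ms) for `runs` calls of fn.
function timeIt(fn, runs) {
  const start = performance.now();
  for (let i = 0; i < runs; i++) fn();
  return performance.now() - start;
}

const terms = makeSyntheticTerms(50000);
const word = "term123"; // has an exact match, so `guarded` can skip the scan
const bruteMs = timeIt(() => bruteForce(word, terms), 20);
const guardedMs = timeIt(() => guarded(word, terms), 20);
console.log(`brute force: ${bruteMs.toFixed(1)} ms, guarded: ${guardedMs.toFixed(1)} ms`);
```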

Describe the solution you'd like

  • If the query word already has an exact match among the document terms, don't iterate over all document terms to check for partial matches.
  • If the query word already has an exact match among the document title terms, don't iterate over all title terms to check for partial matches.
  • Find a non-brute-force algorithm for partial (substring) matching on terms and titles.
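One possible approach for the third point, sketched here under assumptions, is a trigram index. Partial matching only runs for words of three or more characters, so every query word contains at least one trigram, and any term containing the word must contain all of its trigrams. Intersecting the posting sets gives a small candidate list, which is then verified with a substring test. `trigrams`, `buildTrigramIndex` and `findTermsContaining` are hypothetical helpers; the returned terms would still be looked up in `terms` / `titleTerms` to get their files.

```javascript
// All distinct 3-character substrings of s.
function trigrams(s) {
  const grams = new Set();
  for (let i = 0; i + 3 <= s.length; i++) grams.add(s.slice(i, i + 3));
  return grams;
}

// Map each trigram to the set of index terms containing it.
function buildTrigramIndex(terms) {
  const index = new Map();
  for (const term of terms) {
    for (const gram of trigrams(term)) {
      if (!index.has(gram)) index.set(gram, new Set());
      index.get(gram).add(term);
    }
  }
  return index;
}

// Terms that contain `word` as a substring, without scanning every term.
function findTermsContaining(index, word) {
  if (word.length < 3) return []; // mirrors the "longer than two" rule
  const postings = [...trigrams(word)].map((g) => index.get(g) || new Set());
  // Start from the smallest posting set to keep the intersection cheap.
  postings.sort((a, b) => a.size - b.size);
  let candidates = [...postings[0]];
  for (const set of postings.slice(1)) {
    candidates = candidates.filter((term) => set.has(term));
  }
  // Sharing all trigrams is necessary but not sufficient ("abcxbcd" has
  // both "abc" and "bcd" but not "abcd"), so confirm each candidate.
  return candidates.filter((term) => term.includes(word));
}
```

The trade-off is extra memory plus a one-off build cost per page load (or a larger search index, if built ahead of time); whether that pays off for typical projects would need measuring.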

