Skip to content

Regression: search gets unusably slow for large search index #2872

@fingolfin

Description

@fingolfin

This is a regression introduced by PR #2744 (CC @Rahban1) which caused the Plots.jl documentation search to break (well... it does work if you are willing to wait 30-40 seconds for your first search.

The problem is this code in assets/html/js/search.js (and repeated in test/search/wrapper.js):

      // apply patterns in order of specificity
      for (const pattern of patterns) {
        pattern.lastIndex = 0; //reset regex state
        let match;
        while ((match = pattern.exec(remaining)) != null) {
          const token = match[0].trim();
          if (token && !tokens.includes(token)) {
            tokens.push(token);
          }
        }
      }

      // splitting the content if something remains
      const basicTokens = remaining
        .split(/[\s\-,;()[\]{}]+/)
        .filter((t) => t.trim());
      for (const token of basicTokens) {
        if (token && !tokens.includes(token)) {
          tokens.push(token);
        }
      }

      return tokens.filter((token) => token.length > 0);

Note the tokens.includes(token) bit: this performs a linear search. As a result, this code become $O(n^2)$ in the number of tokens, when it could (and should) be linear.

A simple fix should be to replace tokens by a hashmap/hashset/dictionary/set/whatever this is called in JavaScript.
(In the end this may the have to be turned again into an array/vector/list -- I don't know the minisearch API well enough to know. If the order of the tokens is important, another option would be to leave tokens as it is, but add a second tokens_set which uses a hashtable; then the includes test is performed using tokens_set, and new entries are pushed into both tokens and tokens_set)
If the order of th

Metadata

Metadata

Assignees

No one assigned

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions