
Conversation


@Conorrr Conorrr commented Jan 10, 2026

Problems

Search was a strict substring match on the entire query string: word order and exact spacing mattered, and any typo broke matches. Examples: "pdf merge" didn't match "Merge PDF", and "base 64" didn't match "Base64 Encoder/Decoder".

Keywords, the description, and the short description were not included in the search, although it seems the intention was that they should be.

Changes

• Token-based matching: the query is normalised (trimmed, lowercased, split on whitespace), then each token must appear as a substring in the tool's searchable text (name, description, shortDescription, keywords). Order no longer matters.
• Fuzzy matching with Levenshtein distance: tokens within edit distance 1 count as weak matches, so small typos still find tools.
• Relevance scoring: tools are ranked by a weighted score (name matches weighted higher than description/keyword matches, and exact matches higher than fuzzy ones).
• Token variants: handles patterns like "base 64" → "base64" by generating merged token variants (see the sketch after this list).
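
A minimal sketch of the matching pipeline described above (the types and function names here are illustrative, not the actual identifiers used in filterTools):

```typescript
// Illustrative shape; the real tool metadata type in the repo may differ.
interface ToolMeta {
  name: string;
  description: string;
  shortDescription: string;
  keywords: string[];
}

// Normalise a query: trim, lowercase, split on runs of whitespace.
function tokenize(query: string): string[] {
  return query.trim().toLowerCase().split(/\s+/).filter(Boolean);
}

// Generate merged variants so that "base 64" is also tried as "base64".
function withMergedVariants(tokens: string[]): string[][] {
  const variants: string[][] = [tokens];
  for (let i = 0; i < tokens.length - 1; i++) {
    variants.push([
      ...tokens.slice(0, i),
      tokens[i] + tokens[i + 1],
      ...tokens.slice(i + 2),
    ]);
  }
  return variants;
}

// Every token must appear as a substring somewhere in the tool's searchable text.
function matchesAllTokens(tool: ToolMeta, tokens: string[]): boolean {
  const haystack = [tool.name, tool.description, tool.shortDescription, ...tool.keywords]
    .join(' ')
    .toLowerCase();
  return tokens.every((token) => haystack.includes(token));
}

function search(tools: ToolMeta[], query: string): ToolMeta[] {
  const tokens = tokenize(query);
  if (tokens.length === 0) return tools; // empty or whitespace-only query returns everything
  return tools.filter((tool) =>
    withMergedVariants(tokens).some((variant) => matchesAllTokens(tool, variant)),
  );
}
```

Fuzzy matching and relevance scoring (sketched later, under the commit notes) sit on top of this token check.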

Examples that now work

| Query | Before | After |
| --- | --- | --- |
| "pdf merge" | ❌ No match | ✅ Merge PDF |
| "base 64" | ❌ No match | ✅ Base64 tools |
| "merhe pdf" (typo) | ❌ No match | ✅ Merge PDF |
| "pdf " (trailing space) | ❌ Small selection of PDF tools | ✅ All PDF tools |
| " pdf merge " (extra spaces) | ❌ No match | ✅ Merge PDF |
| "dimension" | ❌ No match | ✅ Resize Image |

Keyword cleanup

While testing, I cleaned up the keywords in various tool meta files: removed duplicates, fixed inconsistencies, and added missing synonyms (e.g. "b64" for Base64, "join" for merge-pdf).
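
For context, this kind of change looks roughly like the following (the meta-file shape shown is an assumption, not the repo's actual structure):

```typescript
// Hypothetical excerpt of a tool meta file; field names are illustrative.
export const meta = {
  name: 'Merge PDF',
  // Duplicates removed; "join" added as a synonym.
  keywords: ['pdf', 'merge', 'combine', 'join'],
};
```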

Tests

Added src/tools/index.test.ts with comprehensive tests covering token matching, case insensitivity, keyword synonyms, typo tolerance, and ranking behaviour.
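
A sketch of the style of test added (assuming Vitest; the filterTools signature and the fixtures here are illustrative rather than the actual contents of src/tools/index.test.ts):

```typescript
import { describe, expect, it } from 'vitest';
import { filterTools } from './index'; // assumed export and signature

// Minimal fixtures; the real tests run against the full tool registry.
const tools = [
  {
    name: 'Merge PDF',
    description: 'Combine multiple PDF files into one',
    shortDescription: 'Join PDFs',
    keywords: ['pdf', 'merge', 'join'],
  },
  {
    name: 'Base64 Encoder/Decoder',
    description: 'Encode or decode Base64 text',
    shortDescription: 'Base64',
    keywords: ['base64', 'b64'],
  },
];

describe('filterTools', () => {
  it('ignores word order and extra whitespace', () => {
    expect(filterTools(tools, '  merge   pdf ').map((t) => t.name)).toContain('Merge PDF');
  });

  it('tolerates a one-character typo', () => {
    expect(filterTools(tools, 'merhe pdf').map((t) => t.name)).toContain('Merge PDF');
  });

  it('matches keyword synonyms', () => {
    expect(filterTools(tools, 'b64').map((t) => t.name)).toContain('Base64 Encoder/Decoder');
  });

  it('ranks name matches first', () => {
    expect(filterTools(tools, 'base64')[0].name).toMatch(/base64/i);
  });
});
```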

Manually tested performance and the search isn't noticeably slower.

Comments

Keywords are still hardcoded in English, but after this change it would be trivial to extract them into a single i18n string of search terms.
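
For example (purely illustrative, assuming an i18next-style translate function is passed in), the per-tool keywords could become one translatable string that is split at search time:

```typescript
// Hypothetical: one space-separated i18n string of search terms per tool.
// en.json: { "merge-pdf.searchTerms": "pdf merge combine join concat" }
// es.json: { "merge-pdf.searchTerms": "pdf unir combinar juntar" }
function keywordsFor(t: (key: string) => string, toolId: string): string[] {
  return t(`${toolId}.searchTerms`).split(/\s+/).filter(Boolean);
}
```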

Conorrr and others added 4 commits January 10, 2026 13:23
* Implement token-based, whitespace-robust search in filterTools:
  * Normalize queries by trimming, lowercasing, and collapsing internal whitespace.
  * Split the normalized query into tokens and require every token to match at least one of: localized name, description, shortDescription, or keywords.
  * Keep user-type filtering behavior the same and apply it before search filtering.
  * Return all tools (post user-type filter) when the normalized query is empty or only whitespace.
* Make Hero autocomplete respect the new search behavior (see the sketch after this commit message):
  * Use filterOptions={(options) => options} to disable MUI Autocomplete’s internal filtering.
  * Rely solely on filterTools + filteredTools so queries like "pdf  " behave the same as "pdf" and support multi-word, order-insensitive searches.
* Add unit tests for Phase 1 search behavior:
  * New src/tools/index.test.ts covering:
    * Empty and whitespace-only queries returning all tools.
    * Word-order and extra-whitespace robustness for queries like "pdf merge" vs "   merge   pdf  ".
    * Case-insensitive matching and multi-word queries for the Base64 tool.
    * Trailing spaces being ignored.
    * Non-English behavior by using localized Spanish strings (e.g. "unir pdf") to confirm locale-agnostic token matching.

Co-Authored-By: Warp <[email protected]>
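
A sketch of the Autocomplete wiring described in the second bullet above (the component and prop plumbing around it are illustrative; filterOptions is the standard MUI Autocomplete prop):

```tsx
import { Autocomplete, TextField } from '@mui/material';

interface ToolOption {
  name: string;
}

// Illustrative wrapper; the actual Hero component wires this to filterTools state.
function ToolSearch(props: {
  filteredTools: ToolOption[];
  onInputChange: (value: string) => void;
}) {
  return (
    <Autocomplete
      options={props.filteredTools}
      // Pass options through unchanged so MUI's built-in substring filter
      // does not re-filter what filterTools has already matched and ranked.
      filterOptions={(options) => options}
      getOptionLabel={(tool) => tool.name}
      onInputChange={(_event, value) => props.onInputChange(value)}
      renderInput={(params) => <TextField {...params} placeholder="Search tools" />}
    />
  );
}
```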
•  Extend filterTools query normalization with alpha+digit token concatenation (e.g. base + 64 -> base64)
•  Document keyword-based synonym usage in filterTools
•  Expand filterTools tests to cover alias-style queries like pdf join and base 64

Co-Authored-By: Warp <[email protected]>
•  Add weighted scoring and Levenshtein-based fuzzy matching in filterTools
•  Ensure title matches rank above description-only matches and keep existing token behavior
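
A compact sketch of the fuzzy matching and scoring these commits describe (the weights and the edit-distance threshold of 1 are illustrative, not the exact values in the PR):

```typescript
// Classic dynamic-programming Levenshtein edit distance.
function levenshtein(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                   // deletion
        dp[i][j - 1] + 1,                                   // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Exact substring match beats a fuzzy (edit distance <= 1) match.
function tokenScore(token: string, field: string): number {
  if (field.includes(token)) return 2;
  const fuzzy = field.split(/\s+/).some((word) => levenshtein(token, word) <= 1);
  return fuzzy ? 1 : 0;
}

// Weighted relevance: hits in the name count more than hits in the
// description/keywords, so title matches rank above description-only matches.
function relevance(tokens: string[], name: string, rest: string): number {
  return tokens.reduce(
    (score, token) => score + 3 * tokenScore(token, name) + tokenScore(token, rest),
    0,
  );
}
```

In this sketch, tools scoring 0 would be dropped and the remainder sorted by descending score.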