
Substring search #122

@MatasGds

Hi,

Thank you for this excellent library. So far, I have successfully used SymSpell in a categorization algorithm, and it works well and is fast. I am looking for suggestions on how to extend my current approach to substring search:

I am using a list of keywords as the dictionary. Misspelled or truncated words are corrected to the keywords, which determine the category of a string. For example, 'salar for April' and 'Life Insuranse' are changed to 'salary for April' and 'Life Insurance', respectively, since 'salary' and 'insurance' are in the keyword list. However, some strings are not only misspelled but are also missing spaces or contain too many mistakes. So 'salaryfor April', 'LifeInsurance' and 'salaryyyy' are not recognized and therefore cannot be categorized by the current solution. Using the whole vocabulary as a dictionary is not feasible. Instead, I want to implement substring search, which would let me find strings that contain certain substrings such as 'salar', 'insuran', 'accommod' and so on.
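One way to do this without going through SymSpell at all is approximate substring matching (Sellers' algorithm), sketched below under assumptions. It is the usual edit-distance table, except that the first row is all zeros so a keyword stem may start anywhere in the string, and the answer is the minimum of the last row so it may end anywhere. That tolerates missing spaces ('salaryfor'), trailing junk ('salaryyyy') and small typos inside the stem. The stem-to-category mapping `KEYWORDS`, the category names and the `max_ratio` threshold are hypothetical placeholders, not part of SymSpell or any other library.

```python
def fuzzy_substring_distance(pattern: str, text: str) -> int:
    """Smallest Levenshtein distance between `pattern` and any substring of `text`."""
    pattern, text = pattern.lower(), text.lower()
    # Row 0: the empty pattern matches (distance 0) ending at every text position.
    prev = [0] * (len(text) + 1)
    for i in range(1, len(pattern) + 1):
        # Column 0: pattern[:i] against an empty substring costs i deletions.
        curr = [i] + [0] * len(text)
        for j in range(1, len(text) + 1):
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # pattern char has no counterpart in text
                curr[j - 1] + 1,     # extra char in text
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    # A match may end at any position in the text.
    return min(prev)


# Hypothetical stem -> category mapping; replace with the real keyword list.
KEYWORDS = {
    "salar": "income",
    "insuran": "insurance",
    "accommod": "travel",
}


def categorize(text: str, keywords: dict[str, str] = KEYWORDS,
               max_ratio: float = 0.2) -> set[str]:
    """Return every category whose stem occurs in `text` within the allowed edits."""
    matches = set()
    for stem, category in keywords.items():
        # Allowed edits scale with stem length (1 edit for stems of 5 to 9 chars).
        max_edits = int(len(stem) * max_ratio)
        if fuzzy_substring_distance(stem, text) <= max_edits:
            matches.add(category)
    return matches


for s in ["salaryfor April", "LifeInsurance", "salaryyyy", "accomodation booking"]:
    print(s, categorize(s))
```

The cost is O(len(stem) × len(text)) per stem, which is cheap for short transaction strings. Very short stems with one allowed edit can produce false positives on long strings, so the threshold is worth tuning on real data.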

Can SymSpell be used for substring search? Or do you have other suggestions on how to implement this idea efficiently and combine it with SymSpell?
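If the keyword list grows, one common way to keep this efficient is a q-gram filter in front of the more expensive check. This is a minimal sketch, assuming stems are matched with at most `max_edits` edits; the helper names `qgrams` and `may_contain` and the default `q=2` are hypothetical. It relies on the q-gram lemma: each edit can destroy at most `q` of the stem's overlapping q-grams, so a true approximate occurrence keeps at least `len(stem) - q + 1 - max_edits * q` of them. Strings that fail this count can be skipped, and only the survivors need an exact edit-distance check or a SymSpell lookup.

```python
def qgrams(s: str, q: int) -> list[str]:
    """All overlapping q-grams of `s` (lower-cased), in order, duplicates kept."""
    s = s.lower()
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def may_contain(stem: str, text: str, max_edits: int, q: int = 2) -> bool:
    """Return False only if `text` cannot contain a substring within
    `max_edits` edits of `stem` (no false negatives, some false positives)."""
    # Minimum number of stem q-grams that must survive in a true match.
    needed = len(stem) - q + 1 - max_edits * q
    if needed <= 0:
        return True  # stem too short for the filter to prove anything
    text_grams = set(qgrams(text, q))
    # Count stem q-gram positions whose q-gram appears somewhere in the text.
    shared = sum(1 for g in qgrams(stem, q) if g in text_grams)
    return shared >= needed


stems = ["salar", "insuran", "accommod"]
print([s for s in stems if may_contain(s, "salaryfor April", max_edits=1)])
```

The filter only rules strings out; it never confirms a match, so anything it lets through still has to be verified.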

Thank you in advance
