Improve search scores for direct matches #123
q = self.request.GET.get('q')
if q:
    """
    Boost results that contain direct matches of each term in the query,
    regardless of whether any terms are enclosed in double quotes.
    Terms in double quotes are exact match terms, so give them a bit of
    an extra boost, and boost them as is - whitespace and all.
    """
    exact_match_re = re.compile(r'"(?P<phrase>.*?)"')
    tokens = exact_match_re.split(q)
    exacts = exact_match_re.findall(q)

    for t in tokens:
        if t and not t.strip().startswith("-"):
            if t in exacts:
                sqs = sqs.boost(t, 1)
            else:
                sqs = sqs.boost(t, .5)
This is the only relevant change so far.
The Haystack docs say that a boost above 1.0 increases a result's score, while a boost below 1.0 decreases it. That's not what I'm seeing here: any positive boost increases the score. It's probably because we're using an older version.
Also, with this change, exact match queries (wrapped in double quotes) no longer exclude everything that isn't an exact match. Some partial matches now come through. However, exact matches are still at the top of the results. I'm a bit green to Haystack and Solr, so if there's anything obvious we can do about that, then I'm down. Otherwise we might just be able to ask them if that's a big deal.
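For anyone testing this, here is a minimal standalone sketch of how the regex in the diff splits a query and which boost each piece would get. It does not touch Haystack or Solr: `plan_boosts` is a hypothetical helper name, and it only returns the `(term, boost)` pairs that the loop above would pass to `sqs.boost`.

```python
import re

# Same pattern as in the diff: captures the text between double quotes.
EXACT_MATCH_RE = re.compile(r'"(?P<phrase>.*?)"')


def plan_boosts(q):
    """Hypothetical helper: list the (term, boost) pairs the loop would apply."""
    # Because the pattern has a capture group, split() keeps the quoted
    # phrases (without quotes) in the result, between the unquoted segments.
    tokens = EXACT_MATCH_RE.split(q)
    exacts = EXACT_MATCH_RE.findall(q)

    boosts = []
    for t in tokens:
        # Skip empty segments and segments that start with an exclusion ("-").
        if t and not t.strip().startswith("-"):
            boosts.append((t, 1 if t in exacts else 0.5))
    return boosts


print(plan_boosts('"556" main st'))
```

Note that unquoted text is boosted as one whole segment (including surrounding whitespace) rather than term by term, and a segment is skipped only when it begins with "-".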
antidipyramid left a comment
Seems to work on my end. We can get this up on staging and have them test it out.
Overview
This branch boosts results that directly match the search query. For some reason, Haystack/Solr has been inconsistent about placing direct matches for license numbers at the top of the search results.
Demo
Note: for the purposes of this demo, I've added the search scores of each result to the license number cell. This change has not been pushed here.
Searching for "556" before boosting (result found on second page):

After boosting (first result):

Notes
I don't have any building IDs in my flat drawings, so I tested those rankings using map numbers instead.
Testing Instructions