
Track‐query matching

Jared Dantis edited this page Sep 10, 2023 · 3 revisions

When you search for a song on a music service, you expect to see the song you're looking for as the first result. But what if you misspell the song title? Or what if there are many songs with the same title from different artists?

It's not realistic to expect a user to specify the exact song and artist every time they search, and music services understandably spend a lot of time and effort on making sure that users can find the song they're looking for. Blanco should be no different.

Old algorithm

Before Release 0.4.5, Blanco used the following logic to check whether a human search query (say, mona lisa) closely matched a song title and artist (say, Mona Lisa Dominic Fike):

from difflib import get_close_matches


def check_similarity(actual: str, candidate: str) -> float:
    """
    Checks the similarity between two strings. Meant for comparing
    song titles and artists with search results.

    :param actual: The actual string.
    :param candidate: The candidate string, i.e. from a search result.
    :return: A float between 0 and 1, where 1 is a perfect match.
    """
    actual_words = set(actual.lower().split(' '))
    candidate_words = set(candidate.lower().split(' '))
    intersection = actual_words.intersection(candidate_words)
    difference = actual_words.difference(candidate_words)

    # Include close matches
    for word in difference:
        close_matches = get_close_matches(word, candidate_words, cutoff=0.8)
        if len(close_matches) > 0:
            intersection.add(close_matches[0])

    return len(intersection) / len(actual_words)

The actual variable contains the raw search query from the user, and the candidate variable is expected to be a string of the format <song title> <artist>.

The gist of the algorithm is this:

  1. Turn the strings lowercase.
  2. Split the actual string (the search query from the user) and candidate string (a search result from Spotify) into sets of words.
  3. Find the intersection of the two sets, i.e., the words that are in both sets.
  4. Find the difference of the two sets, i.e., the words that are in the actual set but not the candidate set.
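  5. If there are any words in the difference set that are close matches to words in the candidate set, add the matching candidate words to the intersection set.
  6. Return the size of the intersection set divided by the number of words in the actual set.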
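The steps above can be traced by hand on a small input. The sketch below inlines the same set operations as check_similarity() so that the intermediate sets are visible. The misspelled query (nightsand kxllswxtch) is a hypothetical input chosen for illustration and is not taken from Blanco's tests.

```python
from difflib import get_close_matches

# Hypothetical inputs for illustration: a misspelled query and a
# candidate in "<song title> <artist>" form.
actual = "nightsand kxllswxtch"
candidate = "NIGHTSTAND Kxllswxtch"

# Lowercase both strings and split them into sets of words.
actual_words = set(actual.lower().split(' '))
candidate_words = set(candidate.lower().split(' '))

# Words present in both sets.
intersection = actual_words.intersection(candidate_words)
print(intersection)  # {'kxllswxtch'}

# Query words missing from the candidate.
difference = actual_words.difference(candidate_words)
print(difference)  # {'nightsand'}

# A missing query word still counts if it closely matches a candidate word.
for word in difference:
    close_matches = get_close_matches(word, candidate_words, cutoff=0.8)
    if close_matches:
        intersection.add(close_matches[0])

# The score is the share of query words that were matched.
score = len(intersection) / len(actual_words)
print(score)  # 1.0
```

Note that the close match adds the candidate's spelling (nightstand) to the intersection, not the user's misspelling.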

This worked pretty well for the most part, but it had two glaring problems:

  1. It didn't take into account the words that were in the candidate set but not the actual set.
  2. It didn't take into account the ranking from Spotify, which more than likely accounts for popularity and trends.

For example, if the user searched for nightstand and the top search results were NIGHTSTAND by Kxllswxtch and Nightstand by Justus Bennetts, the algorithm would return a similarity score of 1.0 for both, even though the two songs are completely different. In fact, it would return a similarity score of 1.0 for any song with the word nightstand in the title, regardless of the artist, because extra words in the candidate never lower the score.
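A quick way to see this is to run the old function on a few of the nightstand results. The sketch below repeats check_similarity() from above (without the docstring) so that it runs on its own. The candidate strings are copied from the Nightstand test results later on this page.

```python
from difflib import get_close_matches


def check_similarity(actual: str, candidate: str) -> float:
    # Same logic as the old algorithm, repeated so the example is self-contained.
    actual_words = set(actual.lower().split(' '))
    candidate_words = set(candidate.lower().split(' '))
    intersection = actual_words.intersection(candidate_words)
    difference = actual_words.difference(candidate_words)

    for word in difference:
        close_matches = get_close_matches(word, candidate_words, cutoff=0.8)
        if len(close_matches) > 0:
            intersection.add(close_matches[0])

    return len(intersection) / len(actual_words)


query = "nightstand"
candidates = [
    "NIGHTSTAND Kxllswxtch",
    "Nightstand Justus Bennetts",
    "One Night Standards Ashley McBryde",
]

# Extra words in the candidate (here, the artist) never lower the score,
# so the first two candidates tie at 1.0 and their original order is ignored.
scores = [check_similarity(query, c) for c in candidates]
print(scores)  # [1.0, 1.0, 0.0]
```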

New algorithm

Blanco releases 0.4.5 and newer build upon check_similarity() above, using smarter logic to match human queries against search results. The new algorithm is as follows:

from thefuzz import fuzz


def check_similarity_weighted(actual: str, candidate: str, candidate_rank: int) -> int:
    """
    Checks the similarity between two strings using a weighted average
    of a given similarity score and the results of multiple fuzzy string
    matching algorithms. Meant for refining search results that are
    already ranked.

    :param actual: The actual string.
    :param candidate: The candidate string, i.e. from a search result.
    :param candidate_rank: The rank of the candidate, from 0 to 100.
    :return: An integer from 0 to 100, where 100 is the closest match.
    """
    naive = check_similarity(actual, candidate) * 100
    tsr = fuzz.token_set_ratio(actual, candidate)
    tsor = fuzz.token_sort_ratio(actual, candidate)
    ptsr = fuzz.partial_token_sort_ratio(actual, candidate)

    return int(
        (naive * 0.7) +
        (tsr * 0.12) +
        (candidate_rank * 0.08) +
        (tsor * 0.06) +
        (ptsr * 0.04)
    )

Like the old algorithm, the actual variable contains the raw search query from the user, and the candidate variable is expected to be a string of the format <song title> <artist>.

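The algorithm still takes into account how much of the user's query is present in the search result, but it now also takes into account the ranking from Spotify and the results of a few fuzzy string matching algorithms (namely, Token Set Ratio, Token Sort Ratio, and Partial Token Sort Ratio). All of these factors are combined into a weighted average: 70% for the old check_similarity() score, 12% for Token Set Ratio, 8% for the Spotify rank, 6% for Token Sort Ratio, and 4% for Partial Token Sort Ratio. The result is truncated to an integer.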
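To make the weighting concrete, the sketch below applies the same weights to component scores that are already known. combine_scores() is a hypothetical helper written for this illustration and is not part of Blanco. The input numbers are copied from rows of the test tables below, so thefuzz is not needed to run it.

```python
def combine_scores(naive: float, tsr: int, candidate_rank: int, tsor: int, ptsr: int) -> int:
    """Hypothetical helper: the weighted sum from check_similarity_weighted(),
    applied to precomputed component scores (all on a 0 to 100 scale)."""
    return int(
        (naive * 0.7) +            # old check_similarity() score
        (tsr * 0.12) +             # token set ratio
        (candidate_rank * 0.08) +  # Spotify rank
        (tsor * 0.06) +            # token sort ratio
        (ptsr * 0.04)              # partial token sort ratio
    )


# "cherry wine" vs. "Cherry Wine grentperez":
# 70 + 12 + 8 + 4.02 + 3.12 = 97.14, truncated to 97
print(combine_scores(naive=100, tsr=100, candidate_rank=100, tsor=67, ptsr=78))  # 97

# "cherry wine" vs. "Cherry Waves Deftones":
# 35 + 8.52 + 2.4 + 3.36 + 3.2 = 52.48, truncated to 52
print(combine_scores(naive=50, tsr=71, candidate_rank=30, tsor=56, ptsr=80))  # 52
```

Because the old score carries most of the weight, a candidate that contains every query word starts at 70 before any other factor is counted.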

Testing

In the test results below, the original index column refers to the index of the song in Spotify's search API results, and the candidate (title + artist) column refers to the candidate search result from Spotify. The weighted column refers to the similarity score returned by check_similarity_weighted(), and the difflib column refers to the similarity score returned by check_similarity(), scaled to a 0 to 100 range.

The Spotify rank column is simply 100 - (original index * 10), so the first result has a rank of 100 and the tenth has a rank of 10. The partial token sort ratio, token set ratio, and token sort ratio columns refer to the results of the fuzzy string matching algorithms from thefuzz.
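As a small sketch of the rank formula, assuming a page of ten results with zero-based indices (spotify_rank() is a name made up for this example, not a Blanco function):

```python
def spotify_rank(original_index: int) -> int:
    # Hypothetical helper: map a zero-based position in the search
    # results to a rank where the first result scores 100.
    return 100 - (original_index * 10)


ranks = [spotify_rank(i) for i in range(10)]
print(ranks)  # [100, 90, 80, 70, 60, 50, 40, 30, 20, 10]
```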

The first table in each section is meant to simulate a vague search query, and the second table is meant to simulate a more specific search query. The rows are then sorted by the weighted values. In production, Blanco picks the result with the top weighted value.
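Picking the top weighted value can be sketched as follows. The tuples are three rows copied from the "she charles" table below, and pick_best() is a hypothetical helper. How Blanco breaks ties between equal scores is not covered on this page, so this sketch simply keeps the earliest result.

```python
from typing import List, Tuple

# (original index, candidate, weighted score), in Spotify's original order.
Row = Tuple[int, str, int]


def pick_best(rows: List[Row]) -> Row:
    # max() returns the first maximal element, so ties go to the result
    # that Spotify ranked higher (an assumption of this sketch).
    return max(rows, key=lambda row: row[2])


rows = [
    (0, "She Cares Styx", 94),
    (1, "She Charles Aznavour Bryan Ferry", 95),
    (2, "She - Tous les visages de l’amour Cha...", 93),
]

best = pick_best(rows)
print(best[1])  # She Charles Aznavour Bryan Ferry
```

Here the weighted score promotes the second Spotify result over the first, which is the kind of re-ranking the tables below show.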

As you will see, the ranking is still not perfect, but it's a lot better than if we were to rank using difflib alone.

Test results: Cherry Wine

Search results for "cherry wine"

| original index | candidate (title + artist) | weighted | difflib | Spotify rank | partial token sort ratio | token set ratio | token sort ratio |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Cherry Wine grentperez | 97 | 100 | 100 | 78 | 100 | 67 |
| 1 | Cherry Wine - Live Hozier | 96 | 100 | 90 | 78 | 100 | 65 |
| 2 | Cherry Wine - Live Hozier | 95 | 100 | 80 | 78 | 100 | 65 |
| 3 | Cherry Wine Nas Amy Winehouse | 93 | 100 | 70 | 73 | 100 | 55 |
| 5 | Cherry Wine Jasmine Thompson | 92 | 100 | 50 | 78 | 100 | 56 |
| 4 | Cherry Wine - Live from Spotify SXSW ... | 91 | 100 | 60 | 64 | 100 | 39 |
| 6 | cherry wine Zachary Knowles | 91 | 100 | 40 | 80 | 100 | 58 |
| 9 | Cherry Wine Overcoats | 90 | 100 | 10 | 78 | 100 | 69 |
| 7 | Cherry Waves Deftones | 52 | 50 | 30 | 80 | 71 | 56 |
| 8 | Like Real People Do Hozier | 9 | 0 | 20 | 55 | 32 | 32 |

Search results for "cherry wine hozier"

| original index | candidate (title + artist) | weighted | difflib | Spotify rank | partial token sort ratio | token set ratio | token sort ratio |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Cherry Wine - Live Hozier | 98 | 100 | 100 | 89 | 100 | 88 |
| 1 | Cherry Wine - Live Hozier | 98 | 100 | 90 | 89 | 100 | 88 |
| 4 | Cherry Wine - Live Hozier | 95 | 100 | 60 | 89 | 100 | 88 |
| 2 | Cherry Wine - Live from Spotify SXSW ... | 94 | 100 | 80 | 72 | 100 | 56 |
| 8 | Cherry Wine - Live Hozier | 92 | 100 | 20 | 89 | 100 | 88 |
| 5 | Cherry Wine - Live in Greystones, Cou... | 91 | 100 | 50 | 67 | 100 | 44 |
| 3 | Cherry Wine grentperez | 68 | 66 | 70 | 64 | 76 | 70 |
| 6 | Cherry Wine Jasmine Thompson | 65 | 66 | 40 | 62 | 76 | 61 |
| 9 | Cherry Wine (Arr. for Guitar) Andrew ... | 65 | 66 | 10 | 72 | 100 | 49 |
| 7 | Francesca Hozier | 38 | 33 | 30 | 62 | 59 | 53 |

Test results: Nightstand

Search results for "nightstand"

| original index | candidate (title + artist) | weighted | difflib | Spotify rank | partial token sort ratio | token set ratio | token sort ratio |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | NIGHTSTAND Kxllswxtch | 97 | 100 | 100 | 100 | 100 | 65 |
| 1 | Nightstand Justus Bennetts | 96 | 100 | 90 | 100 | 100 | 56 |
| 2 | Nightstand Dev Lemons | 96 | 100 | 80 | 100 | 100 | 65 |
| 4 | Nightstand Lil Candy Paint | 94 | 100 | 60 | 100 | 100 | 56 |
| 5 | Nightstand dethcaps | 94 | 100 | 50 | 100 | 100 | 69 |
| 6 | Nightstand YNG Martyr YNG ONE | 92 | 100 | 40 | 100 | 100 | 51 |
| 7 | Nightstand Justus Bennetts | 91 | 100 | 30 | 100 | 100 | 56 |
| 8 | Nightstand Dev Lemons | 91 | 100 | 20 | 100 | 100 | 65 |
| 9 | Nightstand K. Michelle | 90 | 100 | 10 | 100 | 100 | 65 |
| 3 | One Night Standards Ashley McBryde | 16 | 0 | 70 | 60 | 45 | 45 |

Search results for "nightstand kxllswxtch"

| original index | candidate (title + artist) | weighted | difflib | Spotify rank | partial token sort ratio | token set ratio | token sort ratio |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | NIGHTSTAND Kxllswxtch | 100 | 100 | 100 | 100 | 100 | 100 |
| 1 | NSYNC Kxllswxtch | 59 | 50 | 90 | 86 | 77 | 76 |
| 2 | Nightstand Justus Bennetts | 55 | 50 | 80 | 73 | 65 | 55 |
| 7 | MUGSHOT Kxllswxtch | 53 | 50 | 30 | 82 | 72 | 72 |
| 9 | KXLLSWXTCH X PYRXCITER - DON'T STOP! ... | 49 | 50 | 10 | 62 | 65 | 50 |
| 3 | Sweet 'n' Savage Kxlly | 15 | 0 | 70 | 50 | 44 | 44 |
| 4 | Amnesia LXST CXNTURY | 11 | 0 | 60 | 33 | 29 | 29 |
| 5 | Disaster's End KSLV Noh | 10 | 0 | 50 | 39 | 27 | 27 |
| 6 | Walk In Lxzt | 10 | 0 | 40 | 35 | 30 | 30 |
| 8 | Blindspot LXST CXNTURY | 9 | 0 | 20 | 38 | 37 | 37 |

Test results: Mona Lisa

Search results for "mona lisa"

| original index | candidate (title + artist) | weighted | difflib | Spotify rank | partial token sort ratio | token set ratio | token sort ratio |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2 | Mona Lisa Dominic Fike | 95 | 100 | 80 | 100 | 100 | 58 |
| 3 | mona lisa mxmtoon | 95 | 100 | 70 | 100 | 100 | 69 |
| 0 | Mona Lisa (Spider-Man: Across the Spi... | 94 | 100 | 100 | 78 | 100 | 27 |
| 1 | Mona Lisa (feat. Kendrick Lamar) Lil ... | 94 | 100 | 90 | 100 | 100 | 29 |
| 4 | Mona Lisa Brentrambo LUCKI | 93 | 100 | 60 | 80 | 100 | 51 |
| 5 | Mona Lisa, Mona Lisa FINNEAS | 93 | 100 | 50 | 100 | 100 | 50 |
| 8 | Mona Lisa ONLY1 THEORY | 91 | 100 | 20 | 100 | 100 | 58 |
| 7 | The Ballad of Mona Lisa Panic! At The... | 90 | 100 | 30 | 100 | 100 | 35 |
| 9 | Mona Lisa Nat King Cole | 90 | 100 | 10 | 100 | 100 | 56 |
| 6 | Mona Lisas And Mad Hatters Elton John | 86 | 100 | 40 | 78 | 62 | 39 |

Search results for "mona lisa dominic fike"

| original index | candidate (title + artist) | weighted | difflib | Spotify rank | partial token sort ratio | token set ratio | token sort ratio |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Mona Lisa Dominic Fike | 99 | 100 | 90 | 100 | 100 | 100 |
| 0 | Mona Lisa (Spider-Man: Across the Spi... | 97 | 100 | 100 | 91 | 100 | 56 |
| 3 | Pasture Child Dominic Fike | 55 | 50 | 70 | 67 | 71 | 58 |
| 2 | Mona Lisa (feat. Kendrick Lamar) Lil ... | 53 | 50 | 80 | 59 | 58 | 45 |
| 5 | Phone Numbers Dominic Fike Kenny Beats | 53 | 50 | 50 | 68 | 71 | 57 |
| 9 | Dark Dominic Fike | 52 | 50 | 10 | 83 | 83 | 62 |
| 6 | Mona Lisa Valntn Peter Fenn Tray Hagg... | 49 | 50 | 40 | 59 | 58 | 41 |
| 7 | Mona Lisa Brentrambo LUCKI | 49 | 50 | 30 | 56 | 58 | 46 |
| 4 | Monalisa Lojay Sarz Chris Brown | 15 | 0 | 60 | 55 | 45 | 45 |
| 8 | Self Love (Spider-Man: Across the Spi... | 9 | 0 | 20 | 50 | 32 | 27 |

Test results: Forever - SOPHIE Remix

Search results for "forever sophie"

| original index | candidate (title + artist) | weighted | difflib | Spotify rank | partial token sort ratio | token set ratio | token sort ratio |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Forever - SOPHIE Remix FLETCHER SOPHIE | 96 | 100 | 100 | 73 | 100 | 56 |
| 1 | Darkness Forever - Sophie's Version S... | 94 | 100 | 90 | 71 | 100 | 47 |
| 5 | Forever Yours Sophie Deas | 94 | 100 | 50 | 100 | 100 | 72 |
| 3 | Forever in your arms Sophie Nichols | 93 | 100 | 70 | 64 | 100 | 57 |
| 9 | I Want A Mom (That Will Last Forever)... | 87 | 100 | 10 | 64 | 100 | 31 |
| 2 | Immaterial SOPHIE | 54 | 50 | 80 | 69 | 60 | 58 |
| 6 | HARD SOPHIE | 53 | 50 | 40 | 84 | 71 | 64 |
| 4 | Faceshopping SOPHIE | 52 | 50 | 60 | 67 | 60 | 55 |
| 7 | Darkness Forever Soccer Mommy | 51 | 50 | 30 | 64 | 67 | 51 |
| 8 | Sophie Bear's Den | 49 | 50 | 20 | 67 | 60 | 58 |

Search results for "forever sophie remix"

| original index | candidate (title + artist) | weighted | difflib | Spotify rank | partial token sort ratio | token set ratio | token sort ratio |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Forever - SOPHIE Remix FLETCHER SOPHIE | 98 | 100 | 100 | 100 | 100 | 71 |
| 1 | Forever - SOPHIE Remix FLETCHER SOPHIE | 97 | 100 | 90 | 100 | 100 | 71 |
| 3 | Forever - SOPHIE Remix FLETCHER SOPHIE | 95 | 100 | 70 | 100 | 100 | 71 |
| 5 | Forever - SOPHIE Remix FLETCHER SOPHIE | 94 | 100 | 50 | 100 | 100 | 71 |
| 7 | Forever - SOPHIE Remix FLETCHER SOPHIE | 92 | 100 | 30 | 100 | 100 | 71 |
| 9 | Forever - SOPHIE Remix FLETCHER SOPHIE | 91 | 100 | 10 | 100 | 100 | 71 |
| 4 | Forever (Solomun Remix) Weval Solomun | 68 | 66 | 60 | 89 | 79 | 62 |
| 2 | Forever Wavey | 44 | 33 | 80 | 78 | 70 | 55 |
| 6 | Forever Trit95 | 41 | 33 | 40 | 78 | 67 | 59 |
| 8 | forever - Slowed & Reverb L0WS | 38 | 33 | 20 | 59 | 61 | 61 |

Test results: She by Charles Aznavour

Search results for "she charles"

| original index | candidate (title + artist) | weighted | difflib | Spotify rank | partial token sort ratio | token set ratio | token sort ratio |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | She Charles Aznavour Bryan Ferry | 95 | 100 | 90 | 82 | 100 | 51 |
| 0 | She Cares Styx | 94 | 100 | 100 | 90 | 72 | 72 |
| 2 | She - Tous les visages de l’amour Cha... | 93 | 100 | 80 | 82 | 100 | 25 |
| 4 | She (Tous Les Visages De L’Amour) Cha... | 91 | 100 | 60 | 82 | 100 | 25 |
| 3 | She Cares Patrick Dorgan | 87 | 100 | 70 | 71 | 51 | 51 |
| 9 | She - She / English Version 1 - Theme... | 87 | 100 | 10 | 82 | 100 | 22 |
| 7 | Jesse Charles Wesley Godwin | 53 | 50 | 30 | 84 | 78 | 53 |
| 6 | She Elvis Costello | 50 | 50 | 40 | 63 | 55 | 55 |
| 5 | She Changes the Weather Swim Deep | 49 | 50 | 50 | 67 | 43 | 41 |
| 8 | She Changes Your Mind Copeland | 47 | 50 | 20 | 63 | 44 | 44 |

Search results for "she charles aznavour"

| original index | candidate (title + artist) | weighted | difflib | Spotify rank | partial token sort ratio | token set ratio | token sort ratio |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | She (Tous Les Visages De L’Amour) Cha... | 96 | 100 | 100 | 90 | 100 | 41 |
| 1 | She Charles Aznavour Bryan Ferry | 96 | 100 | 90 | 70 | 100 | 77 |
| 3 | She Engelbert Humperdinck Charles Azn... | 95 | 100 | 70 | 95 | 100 | 65 |
| 2 | She - Tous les visages de l’amour Cha... | 94 | 100 | 80 | 90 | 100 | 41 |
| 4 | She Charles Aznavour Herbert Kretzmer... | 94 | 100 | 60 | 92 | 100 | 62 |
| 6 | She - She / English Version 1 - Theme... | 91 | 100 | 40 | 90 | 100 | 37 |
| 5 | Emmenez-moi Charles Aznavour | 69 | 66 | 50 | 95 | 89 | 75 |
| 7 | Venecia sin ti - Que c'est triste Ven... | 66 | 66 | 30 | 90 | 89 | 51 |
| 9 | Mes emmerdes - Remastered 2014 Charle... | 65 | 66 | 10 | 90 | 89 | 58 |
| 8 | She Elvis Costello | 34 | 33 | 20 | 56 | 42 | 42 |

Test results: Run Away With Me

Search results for "run away with me"

| original index | candidate (title + artist) | weighted | difflib | Spotify rank | partial token sort ratio | token set ratio | token sort ratio |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Run Away With Me Carly Rae Jepsen | 97 | 100 | 100 | 81 | 100 | 65 |
| 1 | Run Away With Me Carly Rae Jepsen | 96 | 100 | 90 | 81 | 100 | 65 |
| 2 | Run Away With Me Cold War Kids | 95 | 100 | 80 | 75 | 100 | 70 |
| 3 | Run away with me Mingginyu | 95 | 100 | 70 | 74 | 100 | 76 |
| 4 | Run Away With Me Ben Fankhauser | 94 | 100 | 60 | 86 | 100 | 68 |
| 6 | Run Away with Me Michael Arden | 92 | 100 | 40 | 79 | 100 | 70 |
| 8 | Run Away With Me Paradise Blossom | 90 | 100 | 20 | 77 | 100 | 65 |
| 9 | Run Away with Me - Live Aaron Tveit | 89 | 100 | 10 | 69 | 100 | 65 |
| 5 | Run Away with You Big & Rich | 73 | 75 | 50 | 62 | 90 | 67 |
| 7 | Run Away Real McCoy | 51 | 50 | 30 | 57 | 67 | 63 |

Search results for "run away with me carly rae jepsen"

| original index | candidate (title + artist) | weighted | difflib | Spotify rank | partial token sort ratio | token set ratio | token sort ratio |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Run Away With Me Carly Rae Jepsen | 100 | 100 | 100 | 100 | 100 | 100 |
| 1 | Run Away With Me Carly Rae Jepsen | 99 | 100 | 90 | 100 | 100 | 100 |
| 2 | Run Away With Me - ASTR Remix Carly R... | 96 | 100 | 80 | 82 | 100 | 86 |
| 4 | Run Away With Me - Y2K Remix Carly Ra... | 95 | 100 | 60 | 88 | 100 | 87 |
| 6 | Run Away With Me - Patrick Stump Remi... | 92 | 100 | 40 | 81 | 100 | 66 |
| 8 | Run Away With Me - Cyril Hahn Remix C... | 91 | 100 | 20 | 74 | 100 | 80 |
| 9 | Psychedelic Switch Carly Rae Jepsen | 56 | 57 | 10 | 67 | 71 | 68 |
| 3 | Kollage Carly Rae Jepsen | 51 | 42 | 70 | 75 | 80 | 63 |
| 5 | Kamikaze Carly Rae Jepsen | 50 | 42 | 50 | 78 | 78 | 66 |
| 7 | The Loneliest Time (feat. Rufus Wainw... | 45 | 42 | 30 | 61 | 65 | 49 |

Test results: The Dress

Search results for "the dress"

| original index | candidate (title + artist) | weighted | difflib | Spotify rank | partial token sort ratio | token set ratio | token sort ratio |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | The Dress Dijon | 98 | 100 | 100 | 100 | 100 | 75 |
| 1 | The Dress Looks Nice on You Sufjan St... | 94 | 100 | 90 | 80 | 100 | 35 |
| 3 | The Dress 猫 シ Corp. luxury elite | 93 | 100 | 70 | 78 | 100 | 45 |
| 5 | The Dress The Dudes | 93 | 100 | 50 | 80 | 100 | 64 |
| 7 | The Dress Alan Menken | 91 | 100 | 30 | 78 | 100 | 60 |
| 9 | The Dress Blonde Redhead | 89 | 100 | 10 | 78 | 100 | 55 |
| 6 | Dress Taylor Swift | 53 | 50 | 40 | 80 | 71 | 52 |
| 2 | Girl Anachronism The Dresden Dolls | 52 | 50 | 80 | 67 | 50 | 37 |
| 4 | Skin Dijon | 9 | 0 | 60 | 27 | 21 | 21 |
| 8 | Talk Down Dijon | 7 | 0 | 20 | 33 | 25 | 25 |

Search results for "the dress dijon"

| original index | candidate (title + artist) | weighted | difflib | Spotify rank | partial token sort ratio | token set ratio | token sort ratio |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | The Dress Dijon | 100 | 100 | 100 | 100 | 100 | 100 |
| 6 | The Dress Alan Menken | 65 | 66 | 40 | 64 | 75 | 61 |
| 7 | The Dress The Dudes | 63 | 66 | 30 | 60 | 75 | 53 |
| 9 | The Dress Blonde Redhead | 62 | 66 | 10 | 67 | 75 | 62 |
| 4 | jesse Dijon | 45 | 33 | 60 | 82 | 77 | 77 |
| 1 | Dijon Antoine Stavelot | 42 | 33 | 90 | 62 | 54 | 49 |
| 2 | Dijon Lofi Printeme | 41 | 33 | 80 | 57 | 53 | 53 |
| 3 | Dress Down Kaoru Akimoto | 40 | 33 | 70 | 67 | 51 | 51 |
| 5 | Magic Loop DJDS Dijon | 39 | 33 | 50 | 69 | 50 | 50 |
| 8 | Spanish Ladies The Dreadnoughts | 35 | 33 | 20 | 55 | 48 | 48 |
