
Tags.search() incorrectly rejects valid acronym tags due to .title() normalization #56

@Sergeileduc

Description

Summary

The Tags.search() method incorrectly raises InvalidTag for valid tags that are acronyms (e.g. "OF"). This happens because the fuzzy-matching logic applies .title() to the input tag, which turns an uppercase acronym into title case ("OF" becomes "Of"), so it no longer matches the stored value. As a result, tests that use random_tag() fail intermittently whenever it returns an acronym tag.

Reproduction Steps

from redgifs.tags import Tags

tags = Tags()
matches = tags.search("OF")   # raises InvalidTag("OF")

print(matches)

Expected Result

['OF']

Actual Result

Traceback (most recent call last):
  File "C:\Users\serge\Dev\Forks\redgifs\error-tag.py", line 4, in <module>
    matches = tags.search("OF")   # raises InvalidTag("OF")
  File "C:\Users\serge\Dev\Forks\redgifs\redgifs\tags.py", line 73, in search
    raise InvalidTag(tag)
redgifs.errors.InvalidTag: Tag for "OF" was not found.

But "OF" is a valid tag present in tags.json:

"of": "OF"

System Information

git pull
Already up to date.

Checklist

  • I have searched all open and closed issues for any duplicates.

Additional Information

Root cause

In Tags.search():

results = difflib.get_close_matches(tag.title(), self.tags_mapping.values())

For acronym tags:

"OF".title() → "Of"

But "Of" does not exist in self.tags_mapping.values(), which contains "OF".

Therefore:

  • exact lookup fails
  • fuzzy lookup fails
  • InvalidTag("OF") is raised even though "OF" is valid
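The failure can be reproduced with the standard library alone. This is a minimal sketch: the two-element list is a hypothetical stand-in for self.tags_mapping.values(), not the real tags.json. difflib.get_close_matches() uses a default cutoff of 0.6, and the similarity ratio between "Of" and "OF" is only 0.5 (one matching character out of four in total), so the title-cased query is discarded:

```python
import difflib

# Hypothetical stand-in for self.tags_mapping.values(); the real values come from tags.json.
tag_values = ["OF", "Amateur"]

query = "OF"
normalized = query.title()  # "Of": .title() lowercases every letter after the first

# SequenceMatcher.ratio() = 2 * matching_chars / total_length = 2 * 1 / 4 = 0.5
ratio = difflib.SequenceMatcher(None, normalized, "OF").ratio()
print(normalized, ratio)  # Of 0.5

# 0.5 is below get_close_matches' default cutoff of 0.6, so nothing is returned
print(difflib.get_close_matches(normalized, tag_values))  # []

# Without .title(), the acronym is found
print(difflib.get_close_matches(query, tag_values))  # ['OF']
```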

This makes tests like:

def test_order_top28():
    r = api.search(random_tag(), order=Order.TOP28)
    assert r

fail nondeterministically, depending on whether random_tag() happens to return an acronym tag such as "OF".
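Because random_tag() only picks "OF" some of the time, one way to make the failure reproducible is to check every tag instead of a random one. The sketch below is self-contained and hypothetical: buggy_search() re-implements only the title-cased fuzzy step quoted in the root cause (it is not the library's actual Tags.search()), InvalidTag is a local stand-in for redgifs.errors.InvalidTag, and the mapping is a small excerpt in the same shape as tags.json.

```python
import difflib
from typing import Callable

# Hypothetical excerpt in the same shape as tags.json (lowercase key -> stored value).
TAGS_MAPPING = {"of": "OF", "amateur": "Amateur"}


class InvalidTag(Exception):
    """Hypothetical stand-in for redgifs.errors.InvalidTag."""


def buggy_search(tag: str) -> list[str]:
    """Mimics only the title-cased fuzzy lookup described in this issue."""
    results = difflib.get_close_matches(tag.title(), TAGS_MAPPING.values())
    if not results:
        raise InvalidTag(tag)
    return results


def failing_tags(mapping: dict[str, str], search: Callable[[str], list[str]]) -> list[str]:
    """Return every stored tag value that `search` rejects."""
    failures = []
    for value in mapping.values():
        try:
            search(value)
        except InvalidTag:
            failures.append(value)
    return failures


print(failing_tags(TAGS_MAPPING, buggy_search))  # ['OF']
```

A test asserting that failing_tags() returns an empty list for the real tag list would fail on every run until the search is fixed, rather than only when random_tag() happens to pick an acronym.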


Proposed solutions

Option A — Remove .title() entirely (simplest and correct)

results = difflib.get_close_matches(tag, self.tags_mapping.values())

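This fixes the issue for acronym tags passed in their stored uppercase form, such as "OF". Note that fuzzy matching then becomes case-sensitive: lowercase input such as "of" still does not fuzzy-match "OF".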


Option B — Normalize both sides consistently

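normalized_values = {v.lower(): v for v in self.tags_mapping.values()}
matches = difflib.get_close_matches(tag.lower(), normalized_values.keys())
# Map the lowercase matches back to the stored casing (e.g. "of" -> "OF")
results = [normalized_values[m] for m in matches]

This makes fuzzy matching case-insensitive while still returning tags in their stored casing.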


Option C — Detect acronyms

if tag.isupper():
    results = difflib.get_close_matches(tag, self.tags_mapping.values())
else:
    results = difflib.get_close_matches(tag.title(), self.tags_mapping.values())

This preserves the original title-case behavior for all other input while fixing acronyms typed in uppercase.
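To compare the three options side by side, the sketch below implements each as a standalone function over a hypothetical sample of tag values. The names option_a, option_b and option_c are illustrative only and are not part of redgifs. Option B maps its lowercase matches back to the stored casing so it returns "OF" rather than "of".

```python
import difflib

# Hypothetical sample of self.tags_mapping.values().
TAG_VALUES = ["OF", "Amateur"]


def option_a(tag: str) -> list[str]:
    # Option A: no normalization; fuzzy matching is case-sensitive.
    return difflib.get_close_matches(tag, TAG_VALUES)


def option_b(tag: str) -> list[str]:
    # Option B: compare lowercase forms, then map back to the stored casing.
    normalized = {v.lower(): v for v in TAG_VALUES}
    matches = difflib.get_close_matches(tag.lower(), normalized.keys())
    return [normalized[m] for m in matches]


def option_c(tag: str) -> list[str]:
    # Option C: keep .title() except for all-uppercase input.
    query = tag if tag.isupper() else tag.title()
    return difflib.get_close_matches(query, TAG_VALUES)


for query in ["OF", "of", "amateur"]:
    print(query, option_a(query), option_b(query), option_c(query))
# OF ['OF'] ['OF'] ['OF']
# of [] ['OF'] []
# amateur ['Amateur'] ['Amateur'] ['Amateur']
```

Under these assumptions all three options accept "OF", but only Option B also accepts lowercase input such as "of". Whether that matters depends on how Tags.search() handles exact lookups before the fuzzy step, which is not shown here.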


Additional note: updating tags.json

You already have a script to regenerate tags.json from the API.
Even with an updated dataset, the .title() logic will still break acronym tags, so fixing the search logic is necessary.


Conclusion

Valid acronym tags like "OF" are currently rejected due to .title() normalization.
Removing .title() or normalizing both sides consistently would fix the issue, make tag search reliable for acronym tags, and stop the random_tag()-based tests from failing intermittently.

I can open a PR if you prefer.

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests