Bug 1958992 - suggest: Improve geonames l10n and weather-suggestions matching.#6745
Merged
Member
|
@0c0w3 - Hey Drew, just finished up some other stuff, will look into this next week. Sorry for the delay. |
ncloudioj
approved these changes
May 9, 2025
Member
ncloudioj
left a comment
Looks good, thanks! The l10n handling is very cool.
a9358ac to
23d1ebd
Compare
Contributor
Author
|
Thanks! I'll wait to merge this until I can get a desktop patch together. I think I'll also need another PR where |
23d1ebd to
b6d072d
Compare
Contributor
Author
|
The latest commit reverts the change from |
ncloudioj
approved these changes
May 20, 2025
Contributor
Author
|
I'm going to start merging these geonames PRs now. |
…matching.
This is a substantial reworking of geonames and weather suggestions in suggest.
Summary of major changes:
- In remote settings (RS), don't store geonames' alternate names inline with
  the core geonames data. Instead, use separate record types. (As a reminder,
  "alternates" just means variants of a geoname's main name; for example,
  "NYC" and "NY" are alternates for New York City.) So now there are two
  record types: core geonames data and alternates. The core records contain
  the main geonames data: IDs, canonical name, country, admin divisions, etc.,
  and they can be ingested by all clients regardless of their locale or
  country. The alternates records are scoped by language and are intended to
  be ingested only by clients with matching locales.
- Improve geonames fetching and weather-suggestion matching so all admin
  levels and countries are supported, e.g., "waterloo on", "waterloo canada",
  "waterloo on canada", etc.
- Relax the weather parsing a little to allow multiple weather keywords ("rain
  weather").
- Keep track of all available admin codes per geoname. There are four of them.
  This is necessary because a lot of countries outside North America have
  multiple admin levels, and determining whether a given geoname is related to
  another one requires comparing their admin codes.
- Instead of manually computing name variants and inserting them separately
  into the DB, use a custom SQLite collating sequence. ("Variants" here means
  removing punctuation, lowercasing, removing diacritics, etc.)
- Store each geoname's `ascii_name` as an alternate. That's useful for chars
  like "ö", which is represented as "oe" in the ASCII name (at least in the
  geonames data I've seen).
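
As a rough illustration of the collating-sequence idea (a sketch, not the code in this PR): a collation boils down to a comparison function, and one way to get the "variants" behavior is to normalize both names before comparing them. The names `fold_char`, `normalize`, and `geoname_collate` are hypothetical, the diacritics table is a tiny hand-written stand-in for real Unicode handling, and dropping punctuation outright is an assumption about the exact rule.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: map an already-lowercased char to the form used for
/// comparison, or `None` if the char should be ignored entirely.
fn fold_char(c: char) -> Option<char> {
    // Tiny illustrative table; real code would rely on proper Unicode
    // normalization rather than a hand-written list.
    let base = match c {
        'à' | 'á' | 'â' | 'ã' | 'ä' | 'å' => 'a',
        'è' | 'é' | 'ê' | 'ë' => 'e',
        'ì' | 'í' | 'î' | 'ï' => 'i',
        'ò' | 'ó' | 'ô' | 'õ' | 'ö' => 'o',
        'ù' | 'ú' | 'û' | 'ü' => 'u',
        'ç' => 'c',
        'ñ' => 'n',
        other => other,
    };
    // Keep letters, digits, and whitespace; drop punctuation.
    if base.is_alphanumeric() || base.is_whitespace() {
        Some(base)
    } else {
        None
    }
}

/// Lowercase, strip the diacritics listed above, and remove punctuation.
fn normalize(name: &str) -> String {
    name.chars()
        .flat_map(char::to_lowercase)
        .filter_map(fold_char)
        .collect()
}

/// Comparison with the shape a collation callback needs: two strings in,
/// an `Ordering` out. Names that normalize identically compare as equal.
fn geoname_collate(a: &str, b: &str) -> Ordering {
    normalize(a).cmp(&normalize(b))
}
```

With a comparison like this registered as a collation, a query for "st louis" can match a stored "St. Louis" without a separate row per variant. Note that stripping the diacritic from "ö" in this sketch yields "o", not "oe", which is presumably why keeping the `ascii_name` as an alternate is still useful.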
Minor changes:
- Store latitude and longitude as strings instead of floats. I made this
  change to derive `Eq` for `Geoname`, but it makes sense anyway and is how I
  should have done it originally.
- Add `Geoname::geoname_type` so consumers can easily understand whether it's a
  city, admin region, or country.
- Remove the `geoname_type` param from `fetch_geonames`. Consumers can filter
  out matching geonames that they don't want instead.
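
To make the minor changes concrete, here is a simplified sketch under stated assumptions, not the actual definitions in suggest. The `GeonameType` variant names, the reduced set of `Geoname` fields, the choice to model `geoname_type` as a plain field, and the `only_cities` helper are all hypothetical. It shows why string coordinates allow `Eq` to be derived and what consumer-side filtering might look like now that `fetch_geonames` no longer takes a type param.

```rust
/// Hypothetical variants; the PR only says a geoname can be a city, an admin
/// region, or a country.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum GeonameType {
    City,
    AdminRegion,
    Country,
}

/// Simplified stand-in for `Geoname` (the real struct has more fields).
#[derive(Clone, Debug, PartialEq, Eq)]
struct Geoname {
    name: String,
    geoname_type: GeonameType,
    // Strings rather than `f64`: `f64` implements `PartialEq` but not `Eq`
    // (NaN is not equal to itself), so a struct with float fields could not
    // derive `Eq`.
    latitude: String,
    longitude: String,
}

/// Hypothetical consumer-side filter applied to the results of a fetch:
/// keep only the matches whose type is `City`.
fn only_cities(matches: Vec<Geoname>) -> Vec<Geoname> {
    matches
        .into_iter()
        .filter(|g| g.geoname_type == GeonameType::City)
        .collect()
}
```

A consumer that wants regions or countries instead would filter on a different variant in the same way.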
1a30271 to
117819f
Compare
This is a substantial reworking of geonames and weather suggestions in suggest, including some breaking API changes. I didn't bother deprecating anything because AFAIK desktop is the only consumer that uses these, and we can just fix it when we vendor.