|
| 1 | +# Implementation |
| 2 | +## Why it was made |
| 3 | + |
| 4 | +Latitude and longitude are precise but not meaningful on their own. Most systems and people reason in terms of places (cities, regions, and addresses), not raw coordinates. Converting a latitude/longitude pair into an address adds semantic context, making location data understandable, searchable, and actionable in real-world applications. |
| 5 | + |
| 6 | +Converting coordinates into meaningful locations in this way is known as reverse geocoding. |
| 7 | +It is heavily used in customer-facing applications such as delivery and travel booking. |
| 8 | + |
| 9 | +The conventional way of doing this is to call commercial APIs such as Google Maps or Mapbox. |
| 10 | +They are highly accurate, but they are costly and come with rate limits. |
| 11 | +For large, data-intensive applications, this can become a limitation. |
| 12 | + |
| 13 | +Offline reverse geocoding libraries address this: they run entirely on your machine, |
| 14 | +so there is no rate limiting. |
| 15 | +The trade-off is accuracy. They typically take a large point-based dataset of coordinates |
| 16 | +(most likely cities) and find the nearest neighbour of the given coordinates using a structure such as |
| 17 | +`KDTree` from scipy. |
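The point-based lookup described above can be sketched as follows. This is a minimal illustration, not the code of any particular library: the three-city dataset and the function names are assumptions made for the example, and a plain linear scan with the haversine distance stands in for the `KDTree` that a real library would use to make the search fast on a large dataset.

```python
import math

# Toy point dataset: (name, latitude, longitude). Illustrative only.
CITIES = [
    ("Paris", 48.8566, 2.3522),
    ("London", 51.5074, -0.1278),
    ("Berlin", 52.5200, 13.4050),
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_city(lat, lon, cities=CITIES):
    """Return the name of the dataset point closest to (lat, lon)."""
    # Linear scan; a KDTree would replace this step for large datasets.
    return min(cities, key=lambda c: haversine_km(lat, lon, c[1], c[2]))[0]


print(nearest_city(48.80, 2.13))
```

Note that the result is only ever "the closest known point". Nothing in this lookup checks which administrative area the query point actually belongs to.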
| 18 | + |
| 19 | +## Problem with the conventional approach |
| 20 | +Suppose we have a situation like this: |
| 21 | +```{image} ../_static/img.png |
| 22 | +:alt: Reverse geocoding flow |
| 23 | +:width: 600px |
| 24 | +:align: center |
| 25 | +``` |
| 26 | +Here, we are trying to find the address of the given location (blue point), but the closest point |
| 27 | +to it is Lisbon. A naive nearest-neighbour lookup returns |
| 28 | +the address of Lisbon, which is wrong because the point is visibly inside the boundary of Vermont. |
| 29 | + |
| 30 | +## Our approach |
| 31 | +Instead of relying entirely on points, we also consider boundaries. |
| 32 | +- Step 1 |
| 33 | + - Find the boundaries nearest to the given coordinate |
| 34 | + - This can be done by computing the centroid of each polygon and using a `KDTree` |
| 35 | + to get the nearest **boundaries** |
| 36 | + - By default, the nearest 3 neighbours are considered |
| 37 | + |
| 38 | +- Step 2 |
| 39 | + - Check which of these boundaries encloses the given point |
| 40 | + |
| 41 | +This approach validates that the given point is actually inside the returned boundary. |
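The two steps above can be sketched in plain Python. Everything here is an assumption made for illustration: the boundary names and rectangles are invented, the centroid is a simple vertex average, a sorted linear scan stands in for the `KDTree` query, and a ray-casting test stands in for whatever enclosure check the library actually performs.

```python
import math

# Toy boundaries: name -> list of (x, y) vertices. Names and shapes are made up.
BOUNDARIES = {
    "large_region": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "small_region": [(10, 0), (12, 0), (12, 2), (10, 2)],
    "far_region": [(40, 40), (42, 40), (42, 42), (40, 42)],
}


def centroid(ring):
    """Vertex average; exact for these rectangles, an approximation in general."""
    xs, ys = zip(*ring)
    return sum(xs) / len(xs), sum(ys) / len(ys)


CENTROIDS = {name: centroid(ring) for name, ring in BOUNDARIES.items()}


def nearest_boundaries(x, y, k=3):
    """Step 1: names of the k boundaries whose centroids are closest to (x, y)."""
    # Sorted linear scan; a KDTree over the centroids would replace this.
    ranked = sorted(CENTROIDS, key=lambda n: math.dist((x, y), CENTROIDS[n]))
    return ranked[:k]


def point_in_polygon(x, y, ring):
    """Ray casting: count edge crossings of a horizontal ray starting at (x, y)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y level
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def reverse_geocode(x, y, k=3):
    """Step 2: return the first candidate that encloses the point, else None."""
    for name in nearest_boundaries(x, y, k):
        if point_in_polygon(x, y, BOUNDARIES[name]):
            return name
    return None


print(nearest_boundaries(9.5, 1.0, k=1))  # candidate with the closest centroid
print(reverse_geocode(9.5, 1.0))          # candidate that encloses the point
```

In this toy data the query point sits closest to the centroid of `small_region` but lies inside `large_region`, which is the situation the enclosure check is meant to catch. One consequence of this design is that a boundary can only be returned if its centroid is among the `k` nearest, so `k` trades speed against the chance of missing a very large region.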
| 42 | + |
| 43 | +## Challenges |
| 44 | +**Storage**: This was one of the biggest challenges, because boundary data is huge. |
| 45 | +Geoboundaries provides simplified geometries that are free and open source, but even these totalled |
| 46 | +around 1.5 GB. The geometries were therefore converted to WKB and stored in SQLite for querying and filtering, which reduced |
| 47 | +the overall size to around 90 MB. |
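To make the storage idea concrete, here is a small sketch of keeping a polygon as a WKB blob in SQLite. The table name, column names, and helper functions are hypothetical and are not taken from the project. The encoder handles only single-ring polygons in little-endian WKB, and a real pipeline would more likely rely on a geometry library for serialization. The sketch shows the round trip only and does not reproduce the size reduction reported above.

```python
import sqlite3
import struct


def polygon_to_wkb(ring):
    """Encode a single-ring polygon as little-endian WKB."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # WKB rings are closed
    # Byte order (1 = little-endian), geometry type (3 = Polygon), ring count.
    header = struct.pack("<BII", 1, 3, 1)
    body = struct.pack("<I", len(ring))
    for x, y in ring:
        body += struct.pack("<dd", x, y)
    return header + body


def wkb_to_polygon(blob):
    """Decode the blobs produced by polygon_to_wkb back into (x, y) tuples."""
    byte_order, geom_type, n_rings = struct.unpack_from("<BII", blob, 0)
    if (byte_order, geom_type, n_rings) != (1, 3, 1):
        raise ValueError("only little-endian single-ring polygons are handled here")
    (n_points,) = struct.unpack_from("<I", blob, 9)
    return [struct.unpack_from("<dd", blob, 13 + 16 * i) for i in range(n_points)]


# Hypothetical schema: the centroid columns support filtering, the blob holds the shape.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE boundaries (name TEXT PRIMARY KEY, cx REAL, cy REAL, geom BLOB)"
)

ring = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
conn.execute(
    "INSERT INTO boundaries VALUES (?, ?, ?, ?)",
    ("large_region", 5.0, 5.0, polygon_to_wkb(ring)),
)

(blob,) = conn.execute(
    "SELECT geom FROM boundaries WHERE name = ?", ("large_region",)
).fetchone()
restored = wkb_to_polygon(blob)
print(len(blob), restored)
```

Storing the centroid next to the blob means candidate filtering can run on two numeric columns, and the geometry only needs to be decoded for the few rows that survive that filter.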
| 48 | + |
| 49 | +**Speed**: Checking enclosure against every boundary was impractical, so the centroid of each |
| 50 | +boundary is computed and used as the primary filtering layer. Enclosure is then validated |
| 51 | +on the fly, saving both time and space. |