Open
Description
I've discovered that mismo.lib.geo.CoordinateBlocker doesn't handle missing values as I'd expect.
If a record has a missing coordinate value, I would not expect it to be blocked with any other record, since the returned distance would be NaN.
The following example shows that records with a null coordinate value are nevertheless blocked together:
from mismo.lib.geo import CoordinateBlocker
import ibis
ibis.options.interactive = True
con = ibis.get_backend()
data = [
    {"record_id": 1, "lat": 1, "lon": 1},
    {"record_id": 2, "lat": 2, "lon": None},
    {"record_id": 3, "lat": 3, "lon": None},
]
table = con.create_table("test", ibis.memtable(data), overwrite=True)
blocker = CoordinateBlocker(lat="lat", lon="lon", distance_km=1000)
blocker(table, table)
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┓
┃ record_id_l ┃ record_id_r ┃ lat_l ┃ lat_r ┃ lon_l ┃ lon_r ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━━┩
│ int64 │ int64 │ int64 │ int64 │ float64 │ float64 │
├─────────────┼─────────────┼───────┼───────┼─────────┼─────────┤
│ 2 │ 3 │ 2 │ 3 │ NULL │ NULL │
└─────────────┴─────────────┴───────┴───────┴─────────┴─────────┘
In this case, I can see that mismo.lib.geo.distance_km evaluates to NULL for this pair.
I think this can be resolved by modifying the logic here so that it returns null if either lat or lon is null.
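To illustrate the behaviour I'd expect, here is a minimal pure-Python sketch. It is not mismo's implementation: the `distance_km` and `block_pairs` functions below are hypothetical stand-ins written for this example, and the haversine formula with a 6371 km Earth radius is an assumption. The point is only that a pair is kept when its distance is known and within the threshold, so a missing coordinate never produces a match.

```python
import math
from itertools import combinations
from typing import Optional

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def distance_km(lat1, lon1, lat2, lon2) -> Optional[float]:
    """Haversine distance in km, or None if any coordinate is missing.

    Hypothetical stand-in, not mismo.lib.geo.distance_km.
    """
    if any(v is None for v in (lat1, lon1, lat2, lon2)):
        return None
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def block_pairs(records, max_km):
    """Return (id_l, id_r) pairs whose distance is known and within max_km."""
    pairs = []
    for left, right in combinations(records, 2):
        d = distance_km(left["lat"], left["lon"], right["lat"], right["lon"])
        # A missing distance is treated as "not a match", never as a match.
        if d is not None and d <= max_km:
            pairs.append((left["record_id"], right["record_id"]))
    return pairs


data = [
    {"record_id": 1, "lat": 1, "lon": 1},
    {"record_id": 2, "lat": 2, "lon": None},
    {"record_id": 3, "lat": 3, "lon": None},
]
pairs = block_pairs(data, max_km=1000)
print(pairs)  # prints [] because every pair involves a missing lon
```

With the same three records as in the example above, records 2 and 3 are no longer paired, because the explicit `d is not None` check rejects any pair whose distance cannot be computed.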