Summary
We want to test a simple “transposition/permutation” step in our address matcher so we can assess whether swapping parts of an address improves matching accuracy.
This is motivated by the paper “Methods for Matching English Language Addresses”, which treats transposition (e.g. swapping the order of address lines) as a realistic transformation in address data and applies it when generating matched address pairs for evaluation. Source: https://arxiv.org/html/2403.12092v1
Proposed work
- Implement a small transposition utility that can:
  - swap address line 1 and address line 2 (if both exist)
  - (optional, later) support limited additional swaps if we already parse components such as building/sub-building
- Add an evaluation mode that:
  - runs matching without transposition
  - runs matching with transposition enabled
  - compares metrics (e.g. precision/recall/accuracy, or our existing evaluation measures)
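A minimal Python sketch of the utility and the two evaluation runs, assuming each input is a record with optional `line1`/`line2` fields and that the matcher can be wrapped as a function returning a reference ID or `None`. `Address`, `transpose_lines`, `match`, `evaluate` and `MatchFn` are hypothetical names, not existing code. It also assumes that "transposition enabled" means retrying once with the swapped lines when the original input finds no match, and it uses plain accuracy; our existing evaluation measures would slot into `evaluate` instead.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Address:
    # Hypothetical input record: only the fields needed for this sketch.
    line1: Optional[str] = None
    line2: Optional[str] = None
    postcode: Optional[str] = None


def transpose_lines(addr: Address) -> Address:
    """Return a copy with line1 and line2 swapped, or addr itself if either is missing."""
    if not addr.line1 or not addr.line2:
        return addr  # no-op: nothing to swap
    return replace(addr, line1=addr.line2, line2=addr.line1)


# Hypothetical matcher interface: address in, reference ID (or None) out.
MatchFn = Callable[[Address], Optional[str]]


def match(addr: Address, match_fn: MatchFn, transpose: bool) -> Optional[str]:
    """Run match_fn; if transposition is enabled and nothing matched, retry once swapped."""
    result = match_fn(addr)
    if result is None and transpose:
        swapped = transpose_lines(addr)
        if swapped is not addr:  # only retry when a swap actually happened
            result = match_fn(swapped)
    return result


def evaluate(match_fn: MatchFn, labelled: List[Tuple[Address, str]]) -> Dict[str, float]:
    """Accuracy with and without transposition over (input, expected_id) pairs."""
    report: Dict[str, float] = {}
    for name, transpose in (("baseline", False), ("transposition", True)):
        hits = sum(match(a, match_fn, transpose) == expected for a, expected in labelled)
        report[name] = hits / len(labelled) if labelled else 0.0
    return report


# Toy matcher for illustration only: exact lookup on (line1, line2).
REFERENCE = {("Flat 2", "10 High Street"): "ref-1"}


def toy_match(addr: Address) -> Optional[str]:
    return REFERENCE.get((addr.line1, addr.line2))


data = [(Address("10 High Street", "Flat 2"), "ref-1")]
print(evaluate(toy_match, data))  # {'baseline': 0.0, 'transposition': 1.0}
```

Using a fallback rather than always matching both variants leaves results unchanged for inputs that already match, so any metric difference comes only from inputs the baseline could not match.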
Acceptance criteria
- Transposition is behind a config flag (off by default).
- Unit tests cover:
  - no-op behaviour when a component is missing
  - expected swaps for common formats (two-line addresses at minimum)
- Benchmark output clearly reports baseline vs transposition-enabled results on at least one labelled/ground-truth dataset.
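The flag and unit-test criteria could be covered by tests along these lines, sketched with the standard-library `unittest` module. `MatcherConfig.enable_transposition`, `Address` and `transpose_lines` are hypothetical names (defined inline so the file runs on its own), and treating an empty string the same as a missing line is an assumption. Saved as a `test_*.py` file, it runs with `python -m unittest`.

```python
import unittest
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class MatcherConfig:
    # Hypothetical config object: transposition stays off unless explicitly enabled.
    enable_transposition: bool = False


@dataclass(frozen=True)
class Address:
    # Hypothetical input record with just the two lines under test.
    line1: Optional[str] = None
    line2: Optional[str] = None


def transpose_lines(addr: Address) -> Address:
    """Swap line1 and line2 if both are present; otherwise return addr unchanged."""
    if not addr.line1 or not addr.line2:
        return addr
    return replace(addr, line1=addr.line2, line2=addr.line1)


class TransposeLinesTest(unittest.TestCase):
    def test_flag_off_by_default(self):
        self.assertFalse(MatcherConfig().enable_transposition)

    def test_swaps_two_line_address(self):
        swapped = transpose_lines(Address("Flat 2", "10 High Street"))
        self.assertEqual(swapped, Address("10 High Street", "Flat 2"))

    def test_swap_twice_restores_original(self):
        addr = Address("Flat 2", "10 High Street")
        self.assertEqual(transpose_lines(transpose_lines(addr)), addr)

    def test_noop_when_line2_missing(self):
        addr = Address("10 High Street")
        self.assertIs(transpose_lines(addr), addr)

    def test_noop_when_line1_missing(self):
        addr = Address(line2="10 High Street")
        self.assertIs(transpose_lines(addr), addr)

    def test_noop_when_line_is_empty_string(self):
        addr = Address("", "10 High Street")
        self.assertIs(transpose_lines(addr), addr)
```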
Notes / guardrails
- Keep it simple to start: line1 ↔ line2 swap only.
- Avoid exploding candidate volume (generate at most one transposed variant per input).
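If the matcher scores candidates instead of returning a single ID, the cap can be enforced where variants are built, so there is never more than the original plus one swap. This sketch assumes a hypothetical `score_fn` that returns `(reference_id, score)` or `None`; `Address`, `candidate_variants` and `best_match` are likewise illustrative names, and the toy scores are made up for the example.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Address:
    # Hypothetical input record with just the two swappable lines.
    line1: Optional[str] = None
    line2: Optional[str] = None


def candidate_variants(addr: Address, enable_transposition: bool) -> List[Address]:
    """Original address plus at most one line1/line2 swap (never more than two items)."""
    variants = [addr]
    # Skip the swap when a line is missing or both lines are identical (nothing new to score).
    if enable_transposition and addr.line1 and addr.line2 and addr.line1 != addr.line2:
        variants.append(replace(addr, line1=addr.line2, line2=addr.line1))
    return variants


# Hypothetical scorer: best (reference_id, score) hit for an address, or None.
ScoreFn = Callable[[Address], Optional[Tuple[str, float]]]


def best_match(addr: Address, score_fn: ScoreFn, enable_transposition: bool = False) -> Optional[str]:
    """Score every variant and return the reference ID with the highest score."""
    hits = [h for h in (score_fn(v) for v in candidate_variants(addr, enable_transposition)) if h]
    if not hits:
        return None
    return max(hits, key=lambda h: h[1])[0]


# Toy scores for illustration only.
SCORES = {
    ("Flat 2", "10 High Street"): ("ref-1", 0.95),
    ("10 High Street", "Flat 2"): ("ref-9", 0.40),
}


def toy_score(addr: Address) -> Optional[Tuple[str, float]]:
    return SCORES.get((addr.line1, addr.line2))


addr = Address("10 High Street", "Flat 2")
print(best_match(addr, toy_score))  # ref-9 (transposition off by default)
print(best_match(addr, toy_score, enable_transposition=True))  # ref-1
```

`max` returns the first of several equal scores, so ties go to the untransposed input.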