
feat: add basic address-part transposition tests to evaluate impact on matching accuracy #190

@ThomasHepworth


Summary

We want to test a simple “transposition/permutation” step in our address matcher so we can assess whether swapping parts of an address improves matching accuracy.

This is motivated by the paper “Methods for Matching English Language Addresses”, which treats transposition (e.g. swapping the order of address lines) as a realistic transformation in address data and uses it when generating matched address pairs for evaluation. Source: https://arxiv.org/html/2403.12092v1

Proposed work

  • Implement a small transposition utility that can:
    • swap address line 1 and address line 2 (if both exist)
    • (optional, later) support a limited set of additional swaps, provided we already parse components such as building and sub-building
  • Add an evaluation mode that:
    • runs matching without transposition
    • runs matching with transposition enabled
    • compares metrics (e.g. precision/recall/accuracy, or our existing evaluation measures)
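A minimal sketch of the line1 ↔ line2 utility, assuming a simple two-line address record. The `Address` dataclass and the `transpose_lines` / `_present` names are hypothetical and do not come from the existing matcher; treating whitespace-only lines as missing is also an assumption.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Address:
    """Hypothetical minimal address record; real inputs likely have more fields."""
    line1: Optional[str]
    line2: Optional[str]
    postcode: Optional[str] = None


def _present(value: Optional[str]) -> bool:
    # Assumption: None and whitespace-only strings both count as a missing line.
    return value is not None and value.strip() != ""


def transpose_lines(address: Address) -> Address:
    """Return a copy of the address with line1 and line2 swapped.

    If either line is missing, the input is returned unchanged (a no-op),
    so the function never invents or drops address content.
    """
    if not (_present(address.line1) and _present(address.line2)):
        return address
    return replace(address, line1=address.line2, line2=address.line1)
```

Returning the input unchanged (rather than `None`) keeps the no-op case trivial for callers, and because the record is frozen the original is never mutated. Applying the swap twice gives back the original address.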

Acceptance criteria

  • Transposition is behind a config flag (off by default).
  • Unit tests cover:
    • no-op behaviour when a component is missing
    • expected swaps for common formats (two-line addresses at minimum)
  • Benchmark output clearly reports baseline vs transposition-enabled results on at least one labelled/ground-truth dataset.
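One way the benchmark could report baseline vs transposition-enabled results is sketched below. It assumes each run produces a mapping from input record ID to the matched reference ID (or `None` when no match is made), and that the labelled dataset maps record IDs to the correct reference ID. The function names and the precision/recall definitions used here are assumptions; the project's existing evaluation measures may differ.

```python
from typing import Dict, Optional, Tuple


def precision_recall(
    predicted: Dict[str, Optional[str]], truth: Dict[str, str]
) -> Tuple[float, float]:
    """Precision and recall for one matching run against ground truth.

    Assumed definitions: precision = correct matches / matches made;
    recall = correct matches / labelled records.
    """
    attempted = {k: v for k, v in predicted.items() if v is not None}
    correct = sum(1 for k, v in attempted.items() if truth.get(k) == v)
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


def comparison_report(
    baseline: Dict[str, Optional[str]],
    transposed: Dict[str, Optional[str]],
    truth: Dict[str, str],
) -> str:
    """Format a small side-by-side table of the two runs."""
    rows = [
        ("baseline", *precision_recall(baseline, truth)),
        ("transposition", *precision_recall(transposed, truth)),
    ]
    lines = [f"{'run':<14}{'precision':>10}{'recall':>8}"]
    for name, p, r in rows:
        lines.append(f"{name:<14}{p:>10.3f}{r:>8.3f}")
    return "\n".join(lines)
```

Computing both runs against the same `truth` mapping keeps the comparison like-for-like; any other metric could be added as another column in the same table.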

Notes / guardrails

  • Keep it simple to start: line1 ↔ line2 swap only.
  • Avoid exploding candidate volume (generate at most one transposed variant per input).
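A sketch of how the guardrails and the config flag could fit together when generating match candidates: the original address is always kept, and at most one transposed variant is added, only when the flag is on. `MatchConfig`, `enable_transposition` and `candidate_variants` are hypothetical names, and representing an address as a `(line1, line2)` tuple is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical representation: (line1, line2).
Lines = Tuple[Optional[str], Optional[str]]


@dataclass(frozen=True)
class MatchConfig:
    # Hypothetical flag name; off by default, as the acceptance criteria require.
    enable_transposition: bool = False


def _present(value: Optional[str]) -> bool:
    # None and whitespace-only strings are treated as missing lines.
    return value is not None and value.strip() != ""


def candidate_variants(lines: Lines, config: MatchConfig) -> List[Lines]:
    """Return the original address plus at most one transposed variant."""
    variants = [lines]
    line1, line2 = lines
    if config.enable_transposition and _present(line1) and _present(line2):
        swapped = (line2, line1)
        # Skip the variant when swapping would produce an identical address.
        if swapped != lines:
            variants.append(swapped)
    return variants
```

Because each input yields at most two candidates, enabling the flag can at worst double the candidate volume, and the baseline behaviour is unchanged when the flag is off.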
