Commit a0e6dec
feat(address-service): DOMA-12746 implement heuristics-based address deduplication system (#7196)
* feat(address-service): use FIAS ID for address key generation in FIAS-compatible providers
* feat(address-service): DOMA-12746 move address key generation to provider classes
* feat(address-service): DOMA-12746 implement heuristics-based address deduplication system
Introduce a flexible, provider-agnostic system for identifying duplicate addresses
across different external providers (Dadata, Google, Yandex, Pullenti).
Instead of relying solely on Address.key for deduplication, the system now extracts
structured heuristics (FIAS ID, coordinates, Google Place ID, fallback key) from each
provider and matches them against existing records with configurable reliability scores.
Schema changes:
- Add AddressHeuristic model with fields: address, type, value, reliability, provider,
meta, enabled, latitude, longitude. Unique constraint on (type, value)
- Add Address.possibleDuplicateOf relationship field for flagging potential duplicates
- Address.key format now prefixed with heuristic type (fias_id:, fallback:, etc.)
Core logic:
- extractHeuristics() in each search provider (Dadata, Google, Pullenti, Injections)
- findAddressByHeuristics() with DB range queries for coordinate matching (~1.1m tolerance)
- upsertHeuristics() to create/update heuristic records on address resolution
- Shared mergeAddresses() utility for moving sources/heuristics and soft-deleting losers
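The coordinate matching above can be illustrated with a minimal sketch. It assumes the ~1.1 m tolerance corresponds to 0.00001 degrees (which is about 1.1 m of latitude; the same step in longitude covers less distance away from the equator). The constant name, the `buildCoordinateRange` helper, and the filter field names are hypothetical; only the function name `coordinatesMatch` appears in this commit, and its real signature may differ.

```javascript
// Assumption: 0.00001 degrees of latitude is roughly 1.1 m.
const COORDINATE_TOLERANCE_DEG = 0.00001

// Builds the bounds that a DB range query on latitude/longitude could use.
// The *_gte / *_lte field names are illustrative, not taken from the commit.
function buildCoordinateRange (latitude, longitude, tolerance = COORDINATE_TOLERANCE_DEG) {
    return {
        latitude_gte: latitude - tolerance,
        latitude_lte: latitude + tolerance,
        longitude_gte: longitude - tolerance,
        longitude_lte: longitude + tolerance,
    }
}

// In-memory equivalent of the same check for two points.
function coordinatesMatch (a, b, tolerance = COORDINATE_TOLERANCE_DEG) {
    return Math.abs(a.latitude - b.latitude) <= tolerance
        && Math.abs(a.longitude - b.longitude) <= tolerance
}
```

A bounding-box comparison like this can be served by ordinary indexes on the two numeric columns, which is one plausible reason to prefer it over a distance formula.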
Integration:
- Search flow updated: all 4 search plugins + searchServiceUtils now extract and match
heuristics before falling back to key-based lookup
- ActualizeAddressesService updated to handle AddressHeuristic during key-collision merges
- ResolveAddressDuplicateService: new GraphQL mutation for admin merge/dismiss actions
- Admin UI button to resolve duplicates with merge/dismiss/cancel prompt
Migration scripts (all support --dry-run):
- migrate-address-keys-to-heuristics.js: updates Address.key format
- create-address-heuristics.js: backfills AddressHeuristic records from existing data
- merge-duplicate-addresses.js: bulk auto-merges unambiguous duplicate cases
Includes migration guide (docs/MIGRATION-heuristics.md) and unit tests for
heuristicMatcher (coordinatesMatch, parseCoordinates).
* test(address-service): DOMA-12746 add extractHeuristics mock calls to SearchByFiasId plugin tests
Add extractHeuristics method to mock search provider and verify it's called with normalized results in all test cases. Update createOrUpdateAddressWithSource mock calls to include empty heuristics array parameter.
* fix(address-service): DOMA-12746 add onKeyDown handler to ResolveAddressDuplicate button for accessibility
* feat(address-service): DOMA-12746 refactor merge-duplicate-addresses script to run as remote GraphQL client
Move merge-duplicate-addresses.js from bin/ to bin/local/ and rewrite it to connect to condo and address-service as a remote GraphQL client instead of running locally with a Keystone context. The script now determines the merge winner by checking which Address.id is actually referenced in condo Properties (Property.addressKey) and skips ambiguous cases where both addresses are referenced. Add .env.example and README.
* fix(address-service): DOMA-12746 add safety check and error handling to merge-duplicate-addresses script
Add validation in mergeAddresses() to prevent merging when other addresses reference the loser via possibleDuplicateOf, avoiding dangling references after soft-delete. Wrap resolveDuplicate calls in try-catch to handle server errors gracefully and continue processing remaining duplicates instead of failing entire batch.
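As a rough illustration of the two safeguards described above, the sketch below uses an in-memory array instead of database queries. The helper names (`findReferrers`, `assertSafeToMerge`, `resolveAll`) and the record shape are hypothetical; the real mergeAddresses() and script are likely structured differently.

```javascript
// Finds live addresses that point at the loser via possibleDuplicateOf.
function findReferrers (addresses, loserId) {
    return addresses.filter((address) => (
        address.deletedAt === null && address.possibleDuplicateOf === loserId
    ))
}

// Refuses the merge if soft-deleting the loser would leave dangling references.
function assertSafeToMerge (addresses, loserId) {
    const referrers = findReferrers(addresses, loserId)
    if (referrers.length > 0) {
        throw new Error(`Cannot merge: ${referrers.length} address(es) still reference ${loserId}`)
    }
}

// Processes every duplicate and keeps going when a single one fails.
function resolveAll (duplicates, resolveDuplicate) {
    const stats = { resolved: 0, failed: 0 }
    for (const duplicate of duplicates) {
        try {
            resolveDuplicate(duplicate)
            stats.resolved++
        } catch (error) {
            // One failure is counted and does not abort the rest of the batch.
            stats.failed++
        }
    }
    return stats
}
```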
* refactor(address-service): DOMA-12746 use heuristic type constants in Dadata and Pullenti address key generation
Replace hardcoded 'fias:' prefix with HEURISTIC_TYPE_FIAS_ID constant and add HEURISTIC_TYPE_FALLBACK prefix to fallback keys in DadataSuggestionProvider and PullentiSuggestionProvider. Add null check to only prefix non-empty fallback keys.
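A minimal sketch of the prefixed key format, assuming the constants hold the `fias_id` and `fallback` strings named in the key format above. The function `buildAddressKey` and its parameter names are hypothetical; the provider classes implement this in their own way.

```javascript
// Assumed constant values, inferred from the "fias_id:" and "fallback:" prefixes.
const HEURISTIC_TYPE_FIAS_ID = 'fias_id'
const HEURISTIC_TYPE_FALLBACK = 'fallback'

// Prefers the FIAS ID; falls back to a prefixed fallback key only when it is non-empty.
function buildAddressKey ({ fiasId, fallbackKey }) {
    if (fiasId) return `${HEURISTIC_TYPE_FIAS_ID}:${fiasId}`
    if (fallbackKey) return `${HEURISTIC_TYPE_FALLBACK}:${fallbackKey}`
    // No usable parts: return null rather than a bare prefix or an empty string.
    return null
}
```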
* docs(address-service): DOMA-12746 update migration guide and remove deprecated addressKeyUtils
Update MIGRATION-heuristics.md to clarify merge-duplicate-addresses script runs as remote GraphQL client with setup instructions. Remove deprecated generateAddressKey() and generateAddressKeyFromFiasId() functions from addressKeyUtils.js along with associated tests and FIAS_PROVIDERS constant, as address key generation has been moved to provider classes. Update ActualizeAddressesService tests to call provider
* docs(address-service): DOMA-12746 remove unnecessary makemigrations step from migration guide
* refactor(address-service): DOMA-12746 simplify mergeAddresses heuristic reassignment logic
* refactor(address-service): DOMA-12746 improve findRootAddress to handle soft-deleted nodes in duplicate chain
Replace getById with find to filter out soft-deleted addresses (deletedAt: null). Track lastAliveId during traversal and return it when encountering deleted nodes or hitting maxDepth, preventing dangling references to soft-deleted addresses. Add comprehensive unit tests covering chain traversal, soft-delete handling, and maxDepth boundary cases.
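The traversal described above might look roughly like the following. This is a synchronous, in-memory sketch: the real findRootAddress presumably performs asynchronous database lookups, and the `findAlive` callback, the default `maxDepth`, and the record shape are assumptions made for illustration.

```javascript
// Follows possibleDuplicateOf links and returns the last non-deleted address id seen.
function findRootAddress (startId, findAlive, maxDepth = 10) {
    let currentId = startId
    let lastAliveId = null
    for (let depth = 0; depth < maxDepth; depth++) {
        const address = findAlive(currentId)
        // Soft-deleted or missing node: stop and fall back to the last alive one.
        if (!address) return lastAliveId
        lastAliveId = address.id
        // No further link: this address is the root of the chain.
        if (!address.possibleDuplicateOf) return address.id
        currentId = address.possibleDuplicateOf
    }
    // maxDepth reached: return the deepest alive address visited so far.
    return lastAliveId
}

// Example chain a -> b -> c, where c is soft-deleted.
const chain = new Map([
    ['a', { id: 'a', possibleDuplicateOf: 'b', deletedAt: null }],
    ['b', { id: 'b', possibleDuplicateOf: 'c', deletedAt: null }],
    ['c', { id: 'c', possibleDuplicateOf: null, deletedAt: '2024-01-01' }],
])
const findAliveInChain = (id) => {
    const record = chain.get(id)
    return record && record.deletedAt === null ? record : null
}
```

With this data, `findRootAddress('a', findAliveInChain)` stops at `b`, because `c` is filtered out as soft-deleted.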
* refactor(address-service): DOMA-12746 add enabled filter to heuristic lookup in upsertHeuristics
* refactor(address-service): DOMA-12746 return null instead of empty string from generateAddressKey when no parts available
* refactor(address-service): DOMA-12746 migrate admin-ui buttons to @open-condo/ui components
* refactor(address-service): DOMA-12746 add confirmation dialog to actualize address button in admin-ui
* refactor(address-service): DOMA-12746 add error handling and validation to resolveDuplicate function
* refactor(address-service): DOMA-12746 skip addresses with null possibleDuplicateOf target in merge script
* refactor(address-service): DOMA-12746 fix pagination in merge-duplicate-addresses script to handle merged records
* refactor(address-service): DOMA-12746 add @open-condo/apollo-server-client and @open-condo/ui dependencies
* refactor(address-service): DOMA-12746 update generateAddressKey return type to allow null in JSDoc annotations
* refactor(address-service): DOMA-12746 optimize upsertHeuristics to set possibleDuplicateOf once with highest reliability conflict
Split upsertHeuristics into two passes: first detect conflicts and select the single best one by reliability, then create new heuristic records. This ensures possibleDuplicateOf is set at most once with a deterministic choice instead of potentially multiple times during iteration.
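The selection step of the first pass can be sketched as below. The function name `selectBestConflict` and the conflict record shape are hypothetical, and the tie-breaking rule (the first conflict seen wins) is an assumption, since the commit message does not state one.

```javascript
// Picks the single conflict with the highest reliability; ties keep the earlier entry.
function selectBestConflict (conflicts) {
    let best = null
    for (const conflict of conflicts) {
        if (best === null || conflict.reliability > best.reliability) {
            best = conflict
        }
    }
    return best
}
```

Because only one conflict is returned, the caller can update possibleDuplicateOf at most once per invocation.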
* refactor(address-service): DOMA-12746 migrate merge-duplicate-addresses script to use @open-condo/config and add export to .env.example
* refactor(address-service): DOMA-12746 optimize migrate-address-keys-to-heuristics script to use bulk SQL updates
Replace record-by-record iteration with bulk SQL UPDATE statements using CASE expressions. Add statement_timeout management, execution time tracking, and detailed migration statistics. Skip express app preparation for faster DB-only migration.
* refactor(address-service): DOMA-12746 optimize create-address-heuristics script to use bulk operations and batch inserts
Replace record-by-record iteration with batch processing using SQL queries. Add statement_timeout management, execution time tracking, and detailed migration statistics by heuristic type. Implement in-memory caching for exact and coordinate heuristic lookups, root address resolution, and batch insert operations. Skip addresses with low-quality coordinates from DaData provider
* refactor(address-service): DOMA-12746 add possibleDuplicateOf field to Address GQL queries and regenerate TypeScript schema types
* refactor(address-service): DOMA-12746 filter coordinate heuristics to only include exact geo quality (qc_geo=0) from DaData
Add hasExactGeoQuality helper function and update extractHeuristics to skip coordinate heuristics for non-exact geo quality. Set fixed reliability of 90 for exact coordinates instead of variable reliability by qc_geo level. Add unit tests covering exact, non-exact, and fallback heuristic scenarios.
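A hedged sketch of this filter follows. The reliability of 90 and the name `hasExactGeoQuality` come from the commit message; the `extractCoordinateHeuristic` helper, the `'coordinates'` type string, the `"lat,lon"` value format, and the camelCase input fields are assumptions for illustration.

```javascript
const COORDINATES_RELIABILITY = 90 // fixed value stated in the commit message

// qc_geo may arrive as a string or a number, so compare its string form.
function hasExactGeoQuality (qcGeo) {
    return qcGeo !== null && qcGeo !== undefined && String(qcGeo) === '0'
}

// Returns a coordinate heuristic only for exact geo quality, otherwise null.
function extractCoordinateHeuristic ({ geoLat, geoLon, qcGeo }) {
    if (geoLat == null || geoLon == null) return null
    if (!hasExactGeoQuality(qcGeo)) return null
    return {
        type: 'coordinates', // hypothetical type name
        value: `${geoLat},${geoLon}`, // hypothetical value format
        reliability: COORDINATES_RELIABILITY,
    }
}
```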
* refactor(address-service): DOMA-12746 add unit tests for heuristic-based address matching in createOrUpdateAddressWithSource
* test(address-service): DOMA-12746 add unit tests for ResolveAddressDuplicateService
Add comprehensive test coverage for resolveAddressDuplicate mutation including access control (anonymous, user, admin, support), validation (non-existent address, missing possibleDuplicateOf, invalid action/winnerId), and logic tests (dismiss clears possibleDuplicateOf, merge soft-deletes loser and keeps winner).
* refactor(address-service): DOMA-12746 add race condition handling for concurrent heuristic creation in upsertHeuristics
Extract findExistingHeuristicsForConflict and isAddressHeuristicUniqueViolation helper functions. Wrap AddressHeuristicServerUtils.create in try-catch to detect unique constraint violations from concurrent requests. On unique violation, re-check for existing heuristic and set possibleDuplicateOf if conflict detected. Add unit tests covering race condition handling and non-unique error
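The race handling above can be illustrated with an in-memory stand-in for the table. `23505` is the PostgreSQL SQLSTATE for a unique violation, but how that error actually surfaces through the ORM is an assumption here, as are the `HeuristicStore` class and the `createHeuristicOrDetectConflict` helper.

```javascript
const PG_UNIQUE_VIOLATION = '23505' // PostgreSQL SQLSTATE for unique_violation

function isUniqueViolation (error) {
    return Boolean(error) && error.code === PG_UNIQUE_VIOLATION
}

// In-memory stand-in for a table with a unique constraint on (type, value).
class HeuristicStore {
    constructor () {
        this.rows = new Map()
    }

    find (type, value) {
        return this.rows.get(`${type}:${value}`) || null
    }

    create (row) {
        const key = `${row.type}:${row.value}`
        if (this.rows.has(key)) {
            const error = new Error('duplicate key value violates unique constraint')
            error.code = PG_UNIQUE_VIOLATION
            throw error
        }
        this.rows.set(key, row)
        return row
    }
}

// Tries to create; on a unique violation re-checks who owns the existing record.
function createHeuristicOrDetectConflict (store, row) {
    try {
        store.create(row)
        return { created: true, conflictAddressId: null }
    } catch (error) {
        // Anything other than a unique violation is a real failure and is rethrown.
        if (!isUniqueViolation(error)) throw error
        const existing = store.find(row.type, row.value)
        const isConflict = existing !== null && existing.address !== row.address
        return { created: false, conflictAddressId: isConflict ? existing.address : null }
    }
}
```

When the returned `conflictAddressId` is not null, the caller would treat it as a candidate for possibleDuplicateOf.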
* test(address-service): DOMA-12746 add unit tests for SuggestionKeystoneApp
Add test coverage for /suggest endpoint including validation (missing query parameter), provider proxying (DaData suggestion transformation), and bypass mode (raw provider payload passthrough).
* test(address-service): DOMA-12746 add unit tests for SearchKeystoneApp
Add test coverage for /search endpoint including validation (missing query parameter), provider search flow (DaData suggestion transformation and address/source/heuristic creation), idempotency (no duplicate records on repeated requests), and not-found handling (404 when provider returns empty results).
* docs(address-service): DOMA-12746 update heuristics migration guide with troubleshooting section and clarifications
Remove outdated breaking changes (FIAS_PROVIDERS constant, generateAddressKeyFromFiasId removal). Add troubleshooting section covering individual bad heuristic handling (disable/edit/delete via enabled flag) and Google coordinate false-positive mitigation. Add example for non-Russian fallback key migration. Clarify generateAddressKey() replacement details.
* refactor(address-service): DOMA-12746 remove fias_id key migration logic from migrate-address-keys-to-heuristics script
Remove fias:<uuid> → fias_id:<uuid> migration logic as it's no longer needed. Update script to only handle fallback key prefix migration. Remove fiasToMigrateCount tracking, fias_id SQL update query, and related statistics. Update documentation to reflect that only fallback key migration is performed.
* refactor(address-service): DOMA-12746 remove fias_id key parsing from create-address-heuristics script
* refactor(address-service): DOMA-12746 fix skip increment in merge-duplicate-addresses script for dry-run mode
* refactor(address-service): DOMA-12746 improve error handling in ResolveAddressDuplicateService and admin UI
Replace generic Error throws with structured GQLError in ResolveAddressDuplicateService. Add validation for soft-deleted target addresses. Restrict merge action to only allow possibleDuplicateOf as winnerId (remove bidirectional merge support). Add extractErrorMessage helper in admin UI to display user-friendly error messages from GraphQL errors. Update all validation tests to use expectToThrowGQLErrorToResult
* refactor(address-service): DOMA-12746 increase coordinate precision in AddressHeuristicHistoryRecord from 4 to 8 decimal places
* refactor(address-service): DOMA-12746 add latitude/longitude to coordinate candidate query in create-address-heuristics script
* refactor(address-service): DOMA-12746 fix possibleDuplicateOf links count in create-address-heuristics script dry-run mode
* test(condo): add createTestBillingIntegration calls to acquiring and marketplace tests
Add createTestBillingIntegration setup in PaymentsFile, MarketPriceScope, and RegisterResidentInvoiceService tests.
* refactor(address-service): DOMA-12746 add heuristics extraction and upsert to ActualizeAddressesService
Add DadataSearchProvider to extract heuristics from DaData search results in ActualizeAddressesService. Call upsertHeuristics after address update to persist heuristics alongside actualized address data. Add comment explaining dual provider usage (SuggestionProvider for fresh data, SearchProvider for heuristic extraction).
* refactor(address-service): DOMA-12746 fix coordinate validation in DadataSearchProvider to handle zero values
Replace truthy check with explicit null check for geoLat/geoLon to correctly handle zero coordinates (e.g., locations near equator/prime meridian). Previous implementation would skip valid coordinates with zero values.
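The difference between the two checks is easy to show in isolation. Both function names below are hypothetical; the provider's actual code is not shown in this commit message.

```javascript
// Old-style truthy check: wrongly rejects 0, which is a valid coordinate.
function hasCoordinatesTruthy (geoLat, geoLon) {
    return Boolean(geoLat && geoLon)
}

// Explicit null check: `!= null` rejects only null and undefined.
function hasCoordinates (geoLat, geoLon) {
    return geoLat != null && geoLon != null
}
```

For a point on the equator, `hasCoordinatesTruthy(0, 37.6)` is false while `hasCoordinates(0, 37.6)` is true.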
* refactor(address-service): DOMA-12746 defer possibleDuplicateOf update in upsertHeuristics to end of function
Move possibleDuplicateOf update from first pass to end of function to ensure only one update occurs. Compare conflicts from both passes and select highest reliability match overall.
* refactor(address-service): DOMA-12746 add soft-delete check for address in ResolveAddressDuplicateService
* refactor(address-service): DOMA-12746 wrap logger metadata in data object for structured logging
Wrap all logger call parameters (addressId, winnerId, loserId, etc.) in a data object to follow structured logging conventions. Update logger calls in ResolveAddressDuplicateService, mergeAddresses, and heuristicMatcher.
* refactor(address-service): DOMA-12746 add input validation and trim whitespace in parseCoordinates helper
Add null/type check for coordString parameter and trim whitespace before parsing to handle malformed coordinate strings. Prevents errors when coordString is null, undefined, or non-string.
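A defensive parser along these lines might look as follows. The `"latitude,longitude"` string format and the returned object shape are assumptions; only the function name, the null/type check, and the trimming are described in this commit.

```javascript
// Parses "lat,lon" into numbers; returns null for anything malformed.
function parseCoordinates (coordString) {
    // Guard against null, undefined, and non-string input.
    if (typeof coordString !== 'string') return null
    const parts = coordString.trim().split(',').map((part) => part.trim())
    if (parts.length !== 2) return null
    // Number('') is 0, so empty parts must be rejected explicitly.
    if (parts[0] === '' || parts[1] === '') return null
    const latitude = Number(parts[0])
    const longitude = Number(parts[1])
    if (!Number.isFinite(latitude) || !Number.isFinite(longitude)) return null
    return { latitude, longitude }
}
```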
* refactor(address-service): DOMA-12746 fix coordinate validation in GoogleSearchProvider to handle zero values
* refactor(address-service): DOMA-12746 skip merge when duplicate address is referenced in condo Properties
Skip the merge when the current address (the duplicate) is referenced in Properties, because the mutation requires the possibleDuplicateOf target to be the winner. Previously the script attempted to swap winner and loser, which would violate the merge constraints.
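The per-address decision in the script can be sketched as a pure function. `planMerge`, the result shape, and the `referencedIds` set are hypothetical names; the rules themselves (skip when there is no target, skip when the duplicate is referenced, otherwise the target wins) follow the commit descriptions.

```javascript
// Decides what to do with one flagged address.
// referencedIds: Set of Address ids referenced by condo Properties.
function planMerge (address, referencedIds) {
    if (!address.possibleDuplicateOf) {
        return { action: 'skip', reason: 'no possibleDuplicateOf target' }
    }
    if (referencedIds.has(address.id)) {
        // The duplicate would have to win, which the mutation does not allow.
        return { action: 'skip', reason: 'duplicate is referenced in Properties' }
    }
    return { action: 'merge', winnerId: address.possibleDuplicateOf, loserId: address.id }
}
```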
* refactor(address-service): DOMA-12746 add fias_id key parsing support to create-address-heuristics script
* refactor(address-service): DOMA-12746 fix coordinate validation in create-address-heuristics to handle zero values
* refactor(address-service): DOMA-12746 add self-link check before updating possibleDuplicateOf in upsertHeuristics
Add validation to prevent setting possibleDuplicateOf to the same address (self-link). Log warning when rootAddressId equals addressId and skip the update operation to avoid creating invalid duplicate relationships.
* refactor(address-service): DOMA-12746 optimize merge-duplicate-addresses script with batch processing and progress tracking
Add batch property reference checking to reduce database queries, implement progress bar visualization, and improve logging format. Replace per-address isAddressReferenced calls with single batch getReferencedAddressIds query. Add total record count display and page-based progress indicators.
1 parent feab453 commit a0e6dec
File tree
51 files changed (+6208, -2113 lines)
- apps
  - address-service
    - admin-ui
    - bin
      - local
    - docs
    - domains
      - address
        - access
        - schema
        - utils
          - serverSchema
      - common
        - constants
        - utils
          - services
            - search
              - plugins
              - providers
            - suggest
              - providers
    - migrations
  - condo/domains
    - acquiring/schema
    - resident/schema