Commit a0e6dec

feat(address-service): DOMA-12746 implement heuristics-based address deduplication system (#7196)
* feat(address-service): use FIAS ID for address key generation in FIAS-compatible providers

* feat(address-service): DOMA-12746 move address key generation to provider classes

* feat(address-service): DOMA-12746 implement heuristics-based address deduplication system

Introduce a flexible, provider-agnostic system for identifying duplicate addresses across different external providers (Dadata, Google, Yandex, Pullenti). Instead of relying solely on Address.key for deduplication, the system now extracts structured heuristics (FIAS ID, coordinates, Google Place ID, fallback key) from each provider and matches them against existing records with configurable reliability scores.

Schema changes:
- Add AddressHeuristic model with fields: address, type, value, reliability, provider, meta, enabled, latitude, longitude. Unique constraint on (type, value)
- Add Address.possibleDuplicateOf relationship field for flagging potential duplicates
- Address.key format now prefixed with heuristic type (fias_id:, fallback:, etc.)

Core logic:
- extractHeuristics() in each search provider (Dadata, Google, Pullenti, Injections)
- findAddressByHeuristics() with DB range queries for coordinate matching (~1.1m tolerance)
- upsertHeuristics() to create/update heuristic records on address resolution
- Shared mergeAddresses() utility for moving sources/heuristics and soft-deleting losers

Integration:
- Search flow updated: all 4 search plugins + searchServiceUtils now extract and match heuristics before falling back to key-based lookup
- ActualizeAddressesService updated to handle AddressHeuristic during key-collision merges
- ResolveAddressDuplicateService: new GraphQL mutation for admin merge/dismiss actions
- Admin UI button to resolve duplicates with merge/dismiss/cancel prompt

Migration scripts (all support --dry-run):
- migrate-address-keys-to-heuristics.js: updates Address.key format
- create-address-heuristics.js: backfills AddressHeuristic records from existing data
- merge-duplicate-addresses.js: bulk auto-merge clear duplicate cases

Includes migration guide (docs/MIGRATION-heuristics.md) and unit tests for heuristicMatcher (coordinatesMatch, parseCoordinates).

* test(address-service): DOMA-12746 add extractHeuristics mock calls to SearchByFiasId plugin tests

Add extractHeuristics method to mock search provider and verify it's called with normalized results in all test cases. Update createOrUpdateAddressWithSource mock calls to include empty heuristics array parameter.

* fix(address-service): DOMA-12746 add onKeyDown handler to ResolveAddressDuplicate button for accessibility

* feat(address-service): DOMA-12746 refactor merge-duplicate-addresses script to run as remote GraphQL client

Move merge-duplicate-addresses.js from bin/ to bin/local/ and rewrite to connect to condo and address-service as remote GraphQL clients instead of running locally with keystone context. Script now determines merge winner by checking which Address.id is actually referenced in condo Properties (Property.addressKey), skipping ambiguous cases where both addresses are referenced. Add .env.example and README

* fix(address-service): DOMA-12746 add safety check and error handling to merge-duplicate-addresses script

Add validation in mergeAddresses() to prevent merging when other addresses reference the loser via possibleDuplicateOf, avoiding dangling references after soft-delete. Wrap resolveDuplicate calls in try-catch to handle server errors gracefully and continue processing remaining duplicates instead of failing entire batch.

* refactor(address-service): DOMA-12746 use heuristic type constants in Dadata and Pullenti address key generation

Replace hardcoded 'fias:' prefix with HEURISTIC_TYPE_FIAS_ID constant and add HEURISTIC_TYPE_FALLBACK prefix to fallback keys in DadataSuggestionProvider and PullentiSuggestionProvider. Add null check to only prefix non-empty fallback keys.

* docs(address-service): DOMA-12746 update migration guide and remove deprecated addressKeyUtils

Update MIGRATION-heuristics.md to clarify merge-duplicate-addresses script runs as remote GraphQL client with setup instructions. Remove deprecated generateAddressKey() and generateAddressKeyFromFiasId() functions from addressKeyUtils.js along with associated tests and FIAS_PROVIDERS constant, as address key generation has been moved to provider classes. Update ActualizeAddressesService tests to call provider

* docs(address-service): DOMA-12746 remove unnecessary makemigrations step from migration guide

* refactor(address-service): DOMA-12746 simplify mergeAddresses heuristic reassignment logic

* refactor(address-service): DOMA-12746 improve findRootAddress to handle soft-deleted nodes in duplicate chain

Replace getById with find to filter out soft-deleted addresses (deletedAt: null). Track lastAliveId during traversal and return it when encountering deleted nodes or hitting maxDepth, preventing dangling references to soft-deleted addresses. Add comprehensive unit tests covering chain traversal, soft-delete handling, and maxDepth boundary cases.

* refactor(address-service): DOMA-12746 add enabled filter to heuristic lookup in upsertHeuristics

* refactor(address-service): DOMA-12746 return null instead of empty string from generateAddressKey when no parts available

* refactor(address-service): DOMA-12746 migrate admin-ui buttons to @open-condo/ui components

* refactor(address-service): DOMA-12746 add confirmation dialog to actualize address button in admin-ui

* refactor(address-service): DOMA-12746 add error handling and validation to resolveDuplicate function

* refactor(address-service): DOMA-12746 skip addresses with null possibleDuplicateOf target in merge script

* refactor(address-service): DOMA-12746 fix pagination in merge-duplicate-addresses script to handle merged records

* refactor(address-service): DOMA-12746 add @open-condo/apollo-server-client and @open-condo/ui dependencies

* refactor(address-service): DOMA-12746 update generateAddressKey return type to allow null in JSDoc annotations

* refactor(address-service): DOMA-12746 optimize upsertHeuristics to set possibleDuplicateOf once with highest reliability conflict

Split upsertHeuristics into two passes: first detect conflicts and select the single best one by reliability, then create new heuristic records. This ensures possibleDuplicateOf is set at most once with a deterministic choice instead of potentially multiple times during iteration.

* refactor(address-service): DOMA-12746 migrate merge-duplicate-addresses script to use @open-condo/config and add export to .env.example

* refactor(address-service): DOMA-12746 optimize migrate-address-keys-to-heuristics script to use bulk SQL updates

Replace record-by-record iteration with bulk SQL UPDATE statements using CASE expressions. Add statement_timeout management, execution time tracking, and detailed migration statistics. Skip express app preparation for faster DB-only migration.

* refactor(address-service): DOMA-12746 optimize create-address-heuristics script to use bulk operations and batch inserts

Replace record-by-record iteration with batch processing using SQL queries. Add statement_timeout management, execution time tracking, and detailed migration statistics by heuristic type. Implement in-memory caching for exact and coordinate heuristic lookups, root address resolution, and batch insert operations. Skip addresses with low-quality coordinates from DaData provider

* refactor(address-service): DOMA-12746 add possibleDuplicateOf field to Address GQL queries and regenerate TypeScript schema types

* refactor(address-service): DOMA-12746 filter coordinate heuristics to only include exact geo quality (qc_geo=0) from DaData

Add hasExactGeoQuality helper function and update extractHeuristics to skip coordinate heuristics for non-exact geo quality. Set fixed reliability of 90 for exact coordinates instead of variable reliability by qc_geo level. Add unit tests covering exact, non-exact, and fallback heuristic scenarios.

* refactor(address-service): DOMA-12746 add unit tests for heuristic-based address matching in createOrUpdateAddressWithSource

* test(address-service): DOMA-12746 add unit tests for ResolveAddressDuplicateService

Add comprehensive test coverage for resolveAddressDuplicate mutation including access control (anonymous, user, admin, support), validation (non-existent address, missing possibleDuplicateOf, invalid action/winnerId), and logic tests (dismiss clears possibleDuplicateOf, merge soft-deletes loser and keeps winner).

* refactor(address-service): DOMA-12746 add race condition handling for concurrent heuristic creation in upsertHeuristics

Extract findExistingHeuristicsForConflict and isAddressHeuristicUniqueViolation helper functions. Wrap AddressHeuristicServerUtils.create in try-catch to detect unique constraint violations from concurrent requests. On unique violation, re-check for existing heuristic and set possibleDuplicateOf if conflict detected. Add unit tests covering race condition handling and non-unique error

* test(address-service): DOMA-12746 add unit tests for SuggestionKeystoneApp

Add test coverage for /suggest endpoint including validation (missing query parameter), provider proxying (DaData suggestion transformation), and bypass mode (raw provider payload passthrough).

* test(address-service): DOMA-12746 add unit tests for SearchKeystoneApp

Add test coverage for /search endpoint including validation (missing query parameter), provider search flow (DaData suggestion transformation and address/source/heuristic creation), idempotency (no duplicate records on repeated requests), and not-found handling (404 when provider returns empty results).

* docs(address-service): DOMA-12746 update heuristics migration guide with troubleshooting section and clarifications

Remove outdated breaking changes (FIAS_PROVIDERS constant, generateAddressKeyFromFiasId removal). Add troubleshooting section covering individual bad heuristic handling (disable/edit/delete via enabled flag) and Google coordinate false-positive mitigation. Add example for non-Russian fallback key migration. Clarify generateAddressKey() replacement details.

* refactor(address-service): DOMA-12746 remove fias_id key migration logic from migrate-address-keys-to-heuristics script

Remove fias:<uuid> → fias_id:<uuid> migration logic as it's no longer needed. Update script to only handle fallback key prefix migration. Remove fiasToMigrateCount tracking, fias_id SQL update query, and related statistics. Update documentation to reflect that only fallback key migration is performed.

* refactor(address-service): DOMA-12746 remove fias_id key parsing from create-address-heuristics script

* refactor(address-service): DOMA-12746 fix skip increment in merge-duplicate-addresses script for dry-run mode

* refactor(address-service): DOMA-12746 improve error handling in ResolveAddressDuplicateService and admin UI

Replace generic Error throws with structured GQLError in ResolveAddressDuplicateService. Add validation for soft-deleted target addresses. Restrict merge action to only allow possibleDuplicateOf as winnerId (remove bidirectional merge support). Add extractErrorMessage helper in admin UI to display user-friendly error messages from GraphQL errors. Update all validation tests to use expectToThrowGQLErrorToResult

* refactor(address-service): DOMA-12746 increase coordinate precision in AddressHeuristicHistoryRecord from 4 to 8 decimal places

* refactor(address-service): DOMA-12746 add latitude/longitude to coordinate candidate query in create-address-heuristics script

* refactor(address-service): DOMA-12746 fix possibleDuplicateOf links count in create-address-heuristics script dry-run mode

* test(condo): add createTestBillingIntegration calls to acquiring and marketplace tests

Add createTestBillingIntegration setup in PaymentsFile, MarketPriceScope, and RegisterResidentInvoiceService tests.

* refactor(address-service): DOMA-12746 add heuristics extraction and upsert to ActualizeAddressesService

Add DadataSearchProvider to extract heuristics from DaData search results in ActualizeAddressesService. Call upsertHeuristics after address update to persist heuristics alongside actualized address data. Add comment explaining dual provider usage (SuggestionProvider for fresh data, SearchProvider for heuristic extraction).

* refactor(address-service): DOMA-12746 fix coordinate validation in DadataSearchProvider to handle zero values

Replace truthy check with explicit null check for geoLat/geoLon to correctly handle zero coordinates (e.g., locations near equator/prime meridian). Previous implementation would skip valid coordinates with zero values.

* refactor(address-service): DOMA-12746 defer possibleDuplicateOf update in upsertHeuristics to end of function

Move possibleDuplicateOf update from first pass to end of function to ensure only one update occurs. Compare conflicts from both passes and select highest reliability match overall.

* refactor(address-service): DOMA-12746 add soft-delete check for address in ResolveAddressDuplicateService

* refactor(address-service): DOMA-12746 wrap logger metadata in data object for structured logging

Wrap all logger call parameters (addressId, winnerId, loserId, etc.) in a data object to follow structured logging conventions. Update logger calls in ResolveAddressDuplicateService, mergeAddresses, and heuristicMatcher.

* refactor(address-service): DOMA-12746 add input validation and trim whitespace in parseCoordinates helper

Add null/type check for coordString parameter and trim whitespace before parsing to handle malformed coordinate strings. Prevents errors when coordString is null, undefined, or non-string.

* refactor(address-service): DOMA-12746 fix coordinate validation in GoogleSearchProvider to handle zero values

* refactor(address-service): DOMA-12746 skip merge when duplicate address is referenced in condo Properties

Skip merge operation when current address (duplicate) is referenced in Properties, as the mutation requires the target to be the winner. Previously attempted to swap winner/loser which would violate merge constraints.

* refactor(address-service): DOMA-12746 add fias_id key parsing support to create-address-heuristics script

* refactor(address-service): DOMA-12746 fix coordinate validation in create-address-heuristics to handle zero values

* refactor(address-service): DOMA-12746 add self-link check before updating possibleDuplicateOf in upsertHeuristics

Add validation to prevent setting possibleDuplicateOf to the same address (self-link). Log warning when rootAddressId equals addressId and skip the update operation to avoid creating invalid duplicate relationships.

* refactor(address-service): DOMA-12746 optimize merge-duplicate-addresses script with batch processing and progress tracking

Add batch property reference checking to reduce database queries, implement progress bar visualization, and improve logging format. Replace per-address isAddressReferenced calls with single batch getReferencedAddressIds query. Add total record count display and page-based progress indicators.
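
The commit message mentions parseCoordinates and coordinatesMatch with a tolerance of about 1.1 m, an explicit null check so that zero coordinates stay valid, and DB range queries. The following is a minimal sketch of how these pieces might fit together, not the code from this commit: the "lat,lon" string format, the 0.00001 degree constant (roughly 1.1 m of latitude), the function bodies, and the coordinateRange helper are all assumptions made for illustration.

```javascript
// Assumed tolerance: 0.00001 degrees is roughly 1.1 m of latitude.
const COORDINATE_TOLERANCE = 0.00001

// Parse a "lat,lon" string (assumed format).
// Returns null for null, undefined, non-string, or malformed input.
function parseCoordinates (coordString) {
    if (typeof coordString !== 'string') return null
    const parts = coordString.trim().split(',').map((part) => part.trim())
    if (parts.length !== 2 || parts.some((part) => part === '')) return null
    const [latitude, longitude] = parts.map(Number)
    // Explicit finite check instead of a truthy check, so 0 is accepted.
    if (!Number.isFinite(latitude) || !Number.isFinite(longitude)) return null
    return { latitude, longitude }
}

// Two points match when both axes differ by no more than the tolerance.
function coordinatesMatch (a, b, tolerance = COORDINATE_TOLERANCE) {
    if (!a || !b) return false
    return Math.abs(a.latitude - b.latitude) <= tolerance &&
        Math.abs(a.longitude - b.longitude) <= tolerance
}

// Hypothetical helper: bounds that a DB range query could filter on.
function coordinateRange (point, tolerance = COORDINATE_TOLERANCE) {
    return {
        latitudeMin: point.latitude - tolerance,
        latitudeMax: point.latitude + tolerance,
        longitudeMin: point.longitude - tolerance,
        longitudeMax: point.longitude + tolerance,
    }
}

const stored = parseCoordinates(' 55.75 , 37.61 ')
const incoming = parseCoordinates('55.750005,37.610005')
console.log(coordinatesMatch(stored, incoming)) // true
console.log(parseCoordinates('0,0')) // zero values are kept, not skipped
```

A box comparison like this is cheap to express as a pair of range filters on latitude and longitude columns, at the cost of treating longitude degrees as if they were as long as latitude degrees.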
1 parent feab453 commit a0e6dec

51 files changed, +6208 -2113 lines changed
Lines changed: 123 additions & 15 deletions
@@ -1,36 +1,72 @@
-import { useMutation, gql } from '@apollo/client'
+import { useMutation, useQuery, gql } from '@apollo/client'
 import Logo from '@app/address-service/admin-ui/logo'
 import { ItemId, AddNewItem } from '@open-keystone/app-admin-ui/components'
 import React, { useCallback } from 'react'
 import { useLocation } from 'react-router-dom'
 
-import { Download } from '@open-condo/icons'
+import { Download, Link } from '@open-condo/icons'
 import { getClientSideSenderInfo } from '@open-condo/miniapp-utils/helpers/sender'
+import { Button, Space } from '@open-condo/ui'
+import '@open-condo/ui/dist/styles.min.css'
 
 const TARGET_URL_PART = 'addresses'
 
-const ICON_STYLE = {
-    cursor: 'pointer',
-    marginLeft: '20px',
-}
+const GET_ADDRESS_QUERY = gql`
+    query getAddress ($id: ID!) {
+        address: Address(where: { id: $id }) { id possibleDuplicateOf { id address } }
+    }
+`
+
+const RESOLVE_ADDRESS_DUPLICATE_MUTATION = gql`
+    mutation resolveAddressDuplicate ($data: ResolveAddressDuplicateInput!) {
+        result: resolveAddressDuplicate(data: $data) { status }
+    }
+`
 
 const ACTUALIZE_ADDRESSES_MUTATION = gql`
     mutation actualizeAddresses ($data: ActualizeAddressesInput!) {
         result: actualizeAddresses(data: $data) { successIds failures { addressId errorMessage } }
     }
 `
 
+function extractErrorMessage (error, fallbackMessage) {
+    return error?.graphQLErrors?.[0]?.extensions?.message ||
+        error?.graphQLErrors?.[0]?.message ||
+        error?.networkError?.message ||
+        error?.message ||
+        fallbackMessage
+}
+
 const UpdateAddress = (props) => {
     const location = useLocation()
     const [actualizeAddress] = useMutation(ACTUALIZE_ADDRESSES_MUTATION)
     const onClick = useCallback(() => {
+        // eslint-disable-next-line no-restricted-globals
+        const confirmed = confirm(
+            'Actualize this address?\n\n' +
+            'This will re-fetch the address data from the suggestion provider, ' +
+            'update the address key, meta, and heuristics, and reload the page.'
+        )
+        if (!confirmed) return
+
         const sender = getClientSideSenderInfo()
         const path = location.pathname.split('/').splice(2, 2)
         const addressId = (path[0] === TARGET_URL_PART && path[1]) ? path[1] : null
+        if (!addressId) {
+            alert('Cannot detect address id from current URL')
+            return
+        }
+
         const data = { dv: 1, sender, addresses: [{ id: addressId }] }
         actualizeAddress({ variables: { data } })
             .then(({ data }) => {
-                const { result: { successIds, failures } } = data
+                const result = data?.result
+                if (!result) {
+                    alert('Actualize failed: empty response')
+                    return
+                }
+
+                const { successIds, failures } = result
                 if (successIds.includes(addressId)) {
                     console.log('✅ Address actualized')
                     window.location.reload()
@@ -46,13 +82,80 @@ const UpdateAddress = (props) => {
             })
             .catch((error) => {
                 console.error('Failed to actualize address', error)
+                alert(extractErrorMessage(error, 'Failed to actualize address'))
             })
     }, [location, actualizeAddress])
 
     return location.pathname.indexOf(`${TARGET_URL_PART}/`) !== -1 && (
-        <span style={ICON_STYLE} onClick={onClick}>
-            <Download/>
-        </span>
+        <Button size='small' onClick={onClick} title='Actualize address'>
+            <Download size='small' />
+        </Button>
+    )
+}
+
+const ResolveAddressDuplicate = () => {
+    const location = useLocation()
+    const path = location.pathname.split('/').splice(2, 2)
+    const addressId = (path[0] === TARGET_URL_PART && path[1]) ? path[1] : null
+
+    const { data } = useQuery(GET_ADDRESS_QUERY, {
+        variables: { id: addressId },
+        skip: !addressId,
+    })
+
+    const [resolveDuplicate] = useMutation(RESOLVE_ADDRESS_DUPLICATE_MUTATION)
+
+    const possibleDuplicate = data?.address?.possibleDuplicateOf
+
+    const onClick = useCallback(() => {
+        if (!possibleDuplicate) return
+
+        // eslint-disable-next-line no-restricted-globals
+        const choice = prompt(
+            `This address is a possible duplicate of:\n${possibleDuplicate.address} (${possibleDuplicate.id})\n\n` +
+            'Type "merge" to MERGE (this address will be removed, all sources moved to the existing one)\n' +
+            'Type "dismiss" to DISMISS (mark as not a duplicate, possibleDuplicateOf will be cleared)\n\n' +
+            'Leave empty or press Cancel to abort.'
+        )
+
+        if (!choice) return
+
+        const action = choice.trim().toLowerCase()
+        if (action !== 'merge' && action !== 'dismiss') {
+            alert(`Unknown action: "${choice}". Please type "merge" or "dismiss".`)
+            return
+        }
+
+        const sender = getClientSideSenderInfo()
+        const mutationData = {
+            dv: 1,
+            sender,
+            addressId,
+            action,
+            ...(action === 'merge' ? { winnerId: possibleDuplicate.id } : {}),
+        }
+
+        resolveDuplicate({ variables: { data: mutationData } })
+            .then(({ data: result }) => {
+                console.log(`Duplicate ${result.result.status}`)
+                if (action === 'merge') {
+                    window.location.href = location.pathname.replace(addressId, possibleDuplicate.id)
+                } else {
+                    window.location.reload()
+                }
+            })
+            .catch((error) => {
+                console.error('Failed to resolve duplicate', error)
+                alert(extractErrorMessage(error, 'Failed to resolve duplicate'))
+            })
+    }, [addressId, location.pathname, possibleDuplicate, resolveDuplicate])
+
+    if (!possibleDuplicate) return null
+
+    return (
+        <Button size='small' onClick={onClick} title='Resolve duplicate'>
+            <Link size='small' />
+        </Button>
     )
 }
 
@@ -64,11 +167,16 @@ export default {
     },
     itemHeaderActions: () => {
         return (
-            <div>
-                <ItemId/>
-                <AddNewItem/>
-                <UpdateAddress/>
-            </div>
+            <Space direction='vertical' align='end'>
+                <Space>
+                    <ItemId />
+                    <AddNewItem />
+                </Space>
+                <Space>
+                    <UpdateAddress />
+                    <ResolveAddressDuplicate />
+                </Space>
+            </Space>
         )
     },
 }
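
The extractErrorMessage helper added in this diff falls back through several sources in a fixed order. The snippet below copies the helper from the diff and runs it against a few error objects to show which message wins in each case. The sample objects are hypothetical, shaped only loosely like Apollo Client errors, and the message strings are invented for the example.

```javascript
// Copied from the diff above: returns the first non-empty message it finds.
function extractErrorMessage (error, fallbackMessage) {
    return error?.graphQLErrors?.[0]?.extensions?.message ||
        error?.graphQLErrors?.[0]?.message ||
        error?.networkError?.message ||
        error?.message ||
        fallbackMessage
}

// Hypothetical sample errors, one per fallback level.
const withExtensions = {
    graphQLErrors: [{ message: 'Raw GraphQL error', extensions: { message: 'Address not found' } }],
    message: 'Wrapper message',
}
const withoutExtensions = {
    graphQLErrors: [{ message: 'Raw GraphQL error' }],
    message: 'Wrapper message',
}
const networkOnly = {
    graphQLErrors: [],
    networkError: { message: 'Failed to fetch' },
    message: 'Wrapper message',
}
const plainError = new Error('Something broke')

console.log(extractErrorMessage(withExtensions, 'Fallback')) // Address not found
console.log(extractErrorMessage(withoutExtensions, 'Fallback')) // Raw GraphQL error
console.log(extractErrorMessage(networkOnly, 'Fallback')) // Failed to fetch
console.log(extractErrorMessage(plainError, 'Fallback')) // Something broke
console.log(extractErrorMessage(undefined, 'Fallback')) // Fallback
```

Because the chain uses `||` rather than `??`, an empty-string message at any level is skipped in favor of the next source.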
