9521-geocode-petitioner-addresses-to-test-1772574033#9796
9521-geocode-petitioner-addresses-to-test-1772574033#9796paulo-filho-ctr wants to merge 80 commits intotestfrom
Conversation
…m/ustaxcourt/ef-cms into 9521-geocode-petitioner-addresses
…geocodes from dwUser;
…s' into 9521-geocode-petitioner-addresses # Conflicts: # web-api/src/persistence/postgres/utils/migrate/migrations/2026-02-04T16_36_06Z-user-contact.ts
Adding a package to make calls to census geocode API Adding function to upsert into user contact table Updating backfill-user-geocodes to select data from DB and use census geocode API.
Updating it to use the updated backfill-user-geocodes and also tweaking backfill-user-geocodes.
Made some updates to export petitioner geodata so that the CSV lines up with the query data. Refactored backfill User Geocodes to its own helper because trying to import the other script was just running the script.
…search smoketests to deploy and test on exp env" This reverts commit fc45ab2.
… it is deprecated
…eteer" This reverts commit 786e075.
…snt show the latest version support on website
Get distinct missing geocodes because of a data issue. Add more logging to report script
…s' into 9521-geocode-petitioner-addresses-to-test-1772574033 # Conflicts: # .circleci/config.yml # CHANGES.md # Dockerfile # docs/dependency-updates.md # package-lock.json # package.json # web-api/runtimes/puppeteer/package-lock.json # web-api/runtimes/puppeteer/package.json # web-api/src/persistence/postgres/docker-compose.yml # web-api/terraform/applyables/account-specific/account-specific.tf # web-api/terraform/applyables/allColors/allColors.tf # web-api/terraform/applyables/blue/blue.tf # web-api/terraform/applyables/green/green.tf # web-api/terraform/applyables/reindex-cron/reindex-cron-applyable.tf # web-api/terraform/applyables/stale-cases-email-cron/stale-cases-email-cron-applyable.tf # web-api/terraform/applyables/switch-colors-cron/switch-colors-cron-applyable.tf # web-api/terraform/applyables/wait-for-workflow/wait-for-workflow-cron-applyable.tf # web-api/terraform/modules/api/providers.tf # web-api/terraform/modules/batch/docker-image/package-lock.json # web-api/terraform/modules/batch/docker-image/package.json # web-api/terraform/modules/default-vpc/providers.tf # web-api/terraform/modules/elasticsearch/providers.tf # web-api/terraform/modules/everything-else-deprecated/providers.tf # web-api/terraform/modules/health-alarms/providers.tf # web-api/terraform/modules/kibana/providers.tf # web-api/terraform/modules/kms/providers.tf # web-api/terraform/modules/opensearch-sync/providers.tf # web-api/terraform/modules/rds/providers.tf # web-api/terraform/modules/regional-log-subscription-filters/providers.tf # web-api/terraform/modules/ui/providers.tf # web-api/terraform/modules/waf/providers.tf # web-api/terraform/modules/worker/providers.tf
There was a problem hiding this comment.
Pull request overview
Adds petitioner-address geocoding storage and reporting to support QGIS/heatmap work (#9521/#9522) by introducing a dwUserContact data warehouse table, backfill/report scripts, and cache invalidation on address updates.
Changes:
- Add
dwUserContacttable (and a follow-up migration dropping the user FK) to store petitioner lat/lng + match metadata. - Add scripts to backfill missing geocodes via the US Census geocoder and generate a petitioner geodata CSV.
- Invalidate cached geocode rows when petitioner contact info is updated; add/adjust unit tests and mocks around new persistence helpers.
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| web-api/src/persistence/postgres/utils/migrate/migrations/2026-02-04T16_36_06Z-user-contact.ts | Creates dwUserContact table for storing geocode data per (userId, docketNumber). |
| web-api/src/persistence/postgres/utils/migrate/migrations/2026-02-19T16_36_06Z-user-contact-drop-fk.ts | Drops FK from dwUserContact to dwUser (and re-adds on down). |
| web-api/src/persistence/postgres/users/upsertUsers.test.ts | Adds unit test coverage for upsertUsers insert/upsert behavior. |
| web-api/src/persistence/postgres/users/schema.ts | Fixes trailing comma in user table definition. |
| web-api/src/persistence/postgres/userContacts/upsertUserContacts.ts | Adds upsert helper for dwUserContact. |
| web-api/src/persistence/postgres/userContacts/schema.ts | Defines Kysely schema/types for dwUserContact. |
| web-api/src/persistence/postgres/userContacts/mocks.jest.ts | Adds Jest mocks for new userContacts persistence functions. |
| web-api/src/persistence/postgres/userContacts/invalidateUserContactGeocode.ts | Adds helper to delete cached geocode rows for a user+case. |
| web-api/src/persistence/postgres/cases/upsertCases.ts | Minor refactor to inline mapping prior to insert. |
| web-api/src/persistence/postgres/cases/upsertCases.test.ts | Adds unit tests for empty input and multi-case inserts. |
| web-api/src/database-schema.ts | Registers dwUserContact in the central DB schema metadata. |
| web-api/src/business/useCases/user/updatePetitionerInformationInteractor.ts | Extracts email-availability logic and triggers geocode invalidation on updates. |
| web-api/src/business/useCases/user/updatePetitionerInformationInteractor.test.ts | Pulls in new mocks for userContacts persistence. |
| web-api/src/business/useCases/user/updatePetitionerInformationInteractor.createWorkItemForChange.test.ts | Pulls in new mocks for userContacts persistence. |
| web-api/src/business/useCases/createCaseInteractor.ts | Formatting-only change (extra blank line). |
| scripts/reports/petitioner-geodata.ts | New report script that can backfill geocodes then export CSV for QGIS. |
| scripts/jest-scripts.config.ts | Adds moduleNameMapper alias for scripts/*. |
| scripts/helpers/createChainable.ts | Adds query-chain mock helper for script unit tests. |
| scripts/helpers/createChainable.test.ts | Adds tests for the query-chain mock helper. |
| scripts/geocoding/backfillUserGeocodes.ts | Implements on-demand/batch geocode backfill using us-census-geocoder. |
| scripts/geocoding/backfillUserGeocodes.test.ts | Adds unit tests for backfill flow, prompts, and DB/query mocks. |
| scripts/geocoding/backfill-user-geocodes.ts | Adds CLI entrypoint wrapper for the backfill script. |
| scripts/geocoding/backfill-user-geocodes.test.ts | Adds unit tests for CLI entrypoint argument parsing + delegation. |
| docs/postgres/schema/erd.mmd | Documents dwUserContact in the ERD. |
| docs/postgres/schema/data-dictionary.csv | Adds data dictionary entries for dw_user_contact columns. |
| CHANGES.md | Adds deployment notes, but currently duplicates a section. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| lat: number | null; | ||
| lng: number | null; | ||
| geodataMatch: boolean | null; | ||
| userId: string; |
There was a problem hiding this comment.
PetitionerGeoRow includes a required userId: string, but the query in getPetitionerGeodata() never selects/aliases a userId field. This makes the type misleading and can hide missing data at runtime. Either select the petitioner/contactId (or uc.userId) into userId, or remove/optionalize the field in the type.
| userId: string; | |
| userId?: string; |
| const chain = { | ||
| crossJoin: () => chain, | ||
| leftJoin: ( | ||
| _table: string, | ||
| joinCb: (j: typeof joinChain) => typeof joinChain, |
There was a problem hiding this comment.
createChainable is used to mock the Kysely query chain in backfillUserGeocodes, but the production code calls .distinct() on the chain. The mock chain does not implement distinct, so tests will throw TypeError: chain.distinct is not a function. Add a distinct: () => chain method (and any other chain ops used by the scripts under test).
| const columns = [ | ||
| { header: 'docket_number', key: 'docketNumber' }, | ||
| { header: 'docket_number_suffix', key: 'docketNumberSuffix' }, | ||
| { header: 'received_year', key: 'receivedYear' }, | ||
| { header: 'procedure_type', key: 'procedureType' }, |
There was a problem hiding this comment.
The generated CSV columns don’t match the requirements described in issue #9522 (date of petition, regular/small, procedure type, petitioner address lat/lng, requested place of trial lat/lng, actual place of trial lat/lng). The current columns list only includes received_year (not the petition date) and omits requested/actual place-of-trial geocodes entirely. Either update the report to include the required fields or clarify the scope/rename the script so it doesn’t imply #9522 is satisfied.
| export const invalidateUserContactGeocode = async (docketNumber, userId) => { | ||
| await pgDeleteFrom({ | ||
| table: 'dwUserContact', | ||
| where: cb => | ||
| cb.where('userId', '=', userId).where('docketNumber', '=', docketNumber), |
There was a problem hiding this comment.
invalidateUserContactGeocode is missing parameter and return types. Please type docketNumber/userId (likely string) and declare the return type as Promise<void> to avoid implicit any.
| await invalidateUserContactGeocode( | ||
| docketNumber, | ||
| updatedPetitionerData.contactId, | ||
| ); |
There was a problem hiding this comment.
New behavior: invalidateUserContactGeocode(...) is invoked on petitioner updates, but there is no assertion in the existing interactor tests that it gets called when an address changes. Please add a test case that verifies the geocode cache invalidation is triggered under the intended conditions.
| export const backfillUserGeocodes = async ({ | ||
| batchSize = 10000, | ||
| delayMs = 6000, | ||
| fromDateIso, | ||
| toDateIso, |
There was a problem hiding this comment.
delayMs is accepted/logged but never used, so the backfill loop will run batches back-to-back without any throttling. This defeats the stated purpose of rate-limit compliance. Add an await sleep(delayMs) (or similar) between batches and/or between geocoding requests, and ensure it’s covered by tests.
| import { UserContactKysely } from '@web-api/persistence/postgres/userContacts/schema'; | ||
| import { pgInsertInto } from '@web-api/persistence/postgres/utils/operation/pgInsertInto'; | ||
|
|
||
| export const upsertUserContacts = async ( | ||
| userContacts: UserContactKysely[], |
There was a problem hiding this comment.
upsertUserContacts is typed to accept UserContactKysely[] (Selectable rows), but this function is inserting/upserting. Please accept an insertable type (e.g., NewUserContactKysely[]) to align with other upsert helpers and avoid relying on select-only fields.
| import { UserContactKysely } from '@web-api/persistence/postgres/userContacts/schema'; | |
| import { pgInsertInto } from '@web-api/persistence/postgres/utils/operation/pgInsertInto'; | |
| export const upsertUserContacts = async ( | |
| userContacts: UserContactKysely[], | |
| import { NewUserContactKysely } from '@web-api/persistence/postgres/userContacts/schema'; | |
| import { pgInsertInto } from '@web-api/persistence/postgres/utils/operation/pgInsertInto'; | |
| export const upsertUserContacts = async ( | |
| userContacts: NewUserContactKysely[], |
| await invalidateUserContactGeocode( | ||
| docketNumber, | ||
| updatedPetitionerData.contactId, | ||
| ); |
There was a problem hiding this comment.
Geocode invalidation is currently gated behind updateAddressOrPhone (which also depends on shouldGenerateNoticesForCase() and !updatedCaseContact.isAddressSealed). This can leave stale geocodes in dwUserContact when an address changes but notices are not generated (e.g., sealed address, or notices suppressed). Invalidate the geocode whenever address fields change (and possibly also when an address becomes sealed) rather than tying it to notice generation.
| { header: 'docket_number', key: 'docketNumber' }, | ||
| { header: 'docket_number_suffix', key: 'docketNumberSuffix' }, | ||
| { header: 'received_year', key: 'receivedYear' }, | ||
| { header: 'procedure_type', key: 'procedureType' }, | ||
| { header: 'case_type', key: 'caseType' }, | ||
| { header: 'party_type', key: 'partyType' }, | ||
| { header: 'status', key: 'status' }, | ||
| { header: 'is_paper', key: 'isPaper' }, | ||
| { header: 'preferred_trial_city', key: 'preferredTrialCity' }, | ||
| { header: 'remote_trial_granted', key: 'remoteTrialGranted' }, | ||
| { header: 'address', key: 'address' }, | ||
| { header: 'city', key: 'city' }, | ||
| { header: 'state', key: 'state' }, | ||
| { header: 'postalCode', key: 'postalCode' }, | ||
| { header: 'lat', key: 'lat' }, | ||
| { header: 'lng', key: 'lng' }, | ||
| { header: 'is_represented', key: 'isRepresented' }, |
There was a problem hiding this comment.
The CSV column definitions use camelCase keys (e.g., docketNumber, receivedYear), but the query rows are shaped with snake_case properties (e.g., docket_number, received_year). Since generateCsv looks up values via row[key], most columns will be blank. Align the columns[].key values with the actual row property names (or map the row object to camelCase before calling generateCsv).
| { header: 'docket_number', key: 'docketNumber' }, | |
| { header: 'docket_number_suffix', key: 'docketNumberSuffix' }, | |
| { header: 'received_year', key: 'receivedYear' }, | |
| { header: 'procedure_type', key: 'procedureType' }, | |
| { header: 'case_type', key: 'caseType' }, | |
| { header: 'party_type', key: 'partyType' }, | |
| { header: 'status', key: 'status' }, | |
| { header: 'is_paper', key: 'isPaper' }, | |
| { header: 'preferred_trial_city', key: 'preferredTrialCity' }, | |
| { header: 'remote_trial_granted', key: 'remoteTrialGranted' }, | |
| { header: 'address', key: 'address' }, | |
| { header: 'city', key: 'city' }, | |
| { header: 'state', key: 'state' }, | |
| { header: 'postalCode', key: 'postalCode' }, | |
| { header: 'lat', key: 'lat' }, | |
| { header: 'lng', key: 'lng' }, | |
| { header: 'is_represented', key: 'isRepresented' }, | |
| { header: 'docket_number', key: 'docket_number' }, | |
| { header: 'docket_number_suffix', key: 'docket_number_suffix' }, | |
| { header: 'received_year', key: 'received_year' }, | |
| { header: 'procedure_type', key: 'procedure_type' }, | |
| { header: 'case_type', key: 'case_type' }, | |
| { header: 'party_type', key: 'party_type' }, | |
| { header: 'status', key: 'status' }, | |
| { header: 'is_paper', key: 'is_paper' }, | |
| { header: 'preferred_trial_city', key: 'preferred_trial_city' }, | |
| { header: 'remote_trial_granted', key: 'remote_trial_granted' }, | |
| { header: 'address', key: 'address' }, | |
| { header: 'city', key: 'city' }, | |
| { header: 'state', key: 'state' }, | |
| { header: 'postal_code', key: 'postal_code' }, | |
| { header: 'lat', key: 'lat' }, | |
| { header: 'lng', key: 'lng' }, | |
| { header: 'is_represented', key: 'is_represented' }, |
| import { RawPrivatePractitioner } from '@shared/business/entities/PrivatePractitioner'; | ||
| import { CaseDTO } from '@shared/business/dto/cases/CaseDTO'; | ||
|
|
||
|
|
There was a problem hiding this comment.
Extra blank line added before the ElectronicCreatedCaseType export. Please remove to keep formatting consistent (and avoid lint/format churn in future diffs).
| } | ||
|
|
||
| dwUserContact { | ||
| string userId PK,FK "Associated user ID" |
There was a problem hiding this comment.
This FK was dropped, so the erd should be updated
#9521 #9522
Database
Adding a new table to hold user address geocode data.
Geocoding
Added a script to geocode data.
Report
Added a script that can call the geocoding script, and generates a CSV report with address data compelte with geocodes.