Skip to content

9521-geocode-petitioner-addresses-to-test-1772574033#9796

Open
paulo-filho-ctr wants to merge 80 commits intotestfrom
9521-geocode-petitioner-addresses-to-test-1772574033
Open

9521-geocode-petitioner-addresses-to-test-1772574033#9796
paulo-filho-ctr wants to merge 80 commits intotestfrom
9521-geocode-petitioner-addresses-to-test-1772574033

Conversation

@paulo-filho-ctr
Copy link
Contributor

#9521 #9522

Database

Adding a new table to hold user address geocode data.

  • Table named dw_user_contact.
  • Currently, it only holds PK and geocode data
  • Each record corresponds to a petitioner in the dw_case petitioners column. So our PK is a docket number and a user ID.

Geocoding

Added a script to geocode data.

  • We ended up taking an on demand approach for geocoding user addresses as we need them, then storing them in our DB for future uses. If any addresses are updated they'll be removed from the table. Like invalidating records in a cache.
  • We used an API provided by the US Census Beureu, and an NPM package to streamline the calls to this API.
      • If we want to avoid using this package then the alternative is to create CSV payloads and directly call the API ourselves.
  • During manual testing it would take between 1 to 3 minutes to geocode 10,000 records. To geocode all of the addresses since DAWSONs release it will take around 18 calls to the geocoding API.
  • Accepts parameters to geocode cases that were recieved within a date range.

Report

Added a script that can call the geocoding script, and generates a CSV report with address data compelte with geocodes.

paulo-filho-ctr and others added 30 commits January 16, 2026 11:01
…s' into 9521-geocode-petitioner-addresses

# Conflicts:
#	web-api/src/persistence/postgres/utils/migrate/migrations/2026-02-04T16_36_06Z-user-contact.ts
Adding a package to make calls to census geocode API
Adding function to upsert into user contact table
Updating backfill-user-geocodes to select data from DB and use census geocode API.
Updating it to use the updated backfill-user-geocodes and also tweaking backfill-user-geocodes.
Made some updates to export petitioner geodata so that the CSV lines up with the query data.
Refactored backfill User Geocodes to its own helper because trying to import the other script was just running the script.
alexdziarn and others added 25 commits February 26, 2026 10:23
…search smoketests to deploy and test on exp env"

This reverts commit fc45ab2.
…snt show the latest version support on website
Get distinct missing geocodes because of a data issue.
Add more logging to report script
…s' into 9521-geocode-petitioner-addresses-to-test-1772574033

# Conflicts:
#	.circleci/config.yml
#	CHANGES.md
#	Dockerfile
#	docs/dependency-updates.md
#	package-lock.json
#	package.json
#	web-api/runtimes/puppeteer/package-lock.json
#	web-api/runtimes/puppeteer/package.json
#	web-api/src/persistence/postgres/docker-compose.yml
#	web-api/terraform/applyables/account-specific/account-specific.tf
#	web-api/terraform/applyables/allColors/allColors.tf
#	web-api/terraform/applyables/blue/blue.tf
#	web-api/terraform/applyables/green/green.tf
#	web-api/terraform/applyables/reindex-cron/reindex-cron-applyable.tf
#	web-api/terraform/applyables/stale-cases-email-cron/stale-cases-email-cron-applyable.tf
#	web-api/terraform/applyables/switch-colors-cron/switch-colors-cron-applyable.tf
#	web-api/terraform/applyables/wait-for-workflow/wait-for-workflow-cron-applyable.tf
#	web-api/terraform/modules/api/providers.tf
#	web-api/terraform/modules/batch/docker-image/package-lock.json
#	web-api/terraform/modules/batch/docker-image/package.json
#	web-api/terraform/modules/default-vpc/providers.tf
#	web-api/terraform/modules/elasticsearch/providers.tf
#	web-api/terraform/modules/everything-else-deprecated/providers.tf
#	web-api/terraform/modules/health-alarms/providers.tf
#	web-api/terraform/modules/kibana/providers.tf
#	web-api/terraform/modules/kms/providers.tf
#	web-api/terraform/modules/opensearch-sync/providers.tf
#	web-api/terraform/modules/rds/providers.tf
#	web-api/terraform/modules/regional-log-subscription-filters/providers.tf
#	web-api/terraform/modules/ui/providers.tf
#	web-api/terraform/modules/waf/providers.tf
#	web-api/terraform/modules/worker/providers.tf
Copilot AI review requested due to automatic review settings March 3, 2026 21:46
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds petitioner-address geocoding storage and reporting to support QGIS/heatmap work (#9521/#9522) by introducing a dwUserContact data warehouse table, backfill/report scripts, and cache invalidation on address updates.

Changes:

  • Add dwUserContact table (and a follow-up migration dropping the user FK) to store petitioner lat/lng + match metadata.
  • Add scripts to backfill missing geocodes via the US Census geocoder and generate a petitioner geodata CSV.
  • Invalidate cached geocode rows when petitioner contact info is updated; add/adjust unit tests and mocks around new persistence helpers.

Reviewed changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
web-api/src/persistence/postgres/utils/migrate/migrations/2026-02-04T16_36_06Z-user-contact.ts Creates dwUserContact table for storing geocode data per (userId, docketNumber).
web-api/src/persistence/postgres/utils/migrate/migrations/2026-02-19T16_36_06Z-user-contact-drop-fk.ts Drops FK from dwUserContact to dwUser (and re-adds on down).
web-api/src/persistence/postgres/users/upsertUsers.test.ts Adds unit test coverage for upsertUsers insert/upsert behavior.
web-api/src/persistence/postgres/users/schema.ts Fixes trailing comma in user table definition.
web-api/src/persistence/postgres/userContacts/upsertUserContacts.ts Adds upsert helper for dwUserContact.
web-api/src/persistence/postgres/userContacts/schema.ts Defines Kysely schema/types for dwUserContact.
web-api/src/persistence/postgres/userContacts/mocks.jest.ts Adds Jest mocks for new userContacts persistence functions.
web-api/src/persistence/postgres/userContacts/invalidateUserContactGeocode.ts Adds helper to delete cached geocode rows for a user+case.
web-api/src/persistence/postgres/cases/upsertCases.ts Minor refactor to inline mapping prior to insert.
web-api/src/persistence/postgres/cases/upsertCases.test.ts Adds unit tests for empty input and multi-case inserts.
web-api/src/database-schema.ts Registers dwUserContact in the central DB schema metadata.
web-api/src/business/useCases/user/updatePetitionerInformationInteractor.ts Extracts email-availability logic and triggers geocode invalidation on updates.
web-api/src/business/useCases/user/updatePetitionerInformationInteractor.test.ts Pulls in new mocks for userContacts persistence.
web-api/src/business/useCases/user/updatePetitionerInformationInteractor.createWorkItemForChange.test.ts Pulls in new mocks for userContacts persistence.
web-api/src/business/useCases/createCaseInteractor.ts Formatting-only change (extra blank line).
scripts/reports/petitioner-geodata.ts New report script that can backfill geocodes then export CSV for QGIS.
scripts/jest-scripts.config.ts Adds moduleNameMapper alias for scripts/*.
scripts/helpers/createChainable.ts Adds query-chain mock helper for script unit tests.
scripts/helpers/createChainable.test.ts Adds tests for the query-chain mock helper.
scripts/geocoding/backfillUserGeocodes.ts Implements on-demand/batch geocode backfill using us-census-geocoder.
scripts/geocoding/backfillUserGeocodes.test.ts Adds unit tests for backfill flow, prompts, and DB/query mocks.
scripts/geocoding/backfill-user-geocodes.ts Adds CLI entrypoint wrapper for the backfill script.
scripts/geocoding/backfill-user-geocodes.test.ts Adds unit tests for CLI entrypoint argument parsing + delegation.
docs/postgres/schema/erd.mmd Documents dwUserContact in the ERD.
docs/postgres/schema/data-dictionary.csv Adds data dictionary entries for dw_user_contact columns.
CHANGES.md Adds deployment notes, but currently duplicates a section.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

lat: number | null;
lng: number | null;
geodataMatch: boolean | null;
userId: string;
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PetitionerGeoRow includes a required userId: string, but the query in getPetitionerGeodata() never selects/aliases a userId field. This makes the type misleading and can hide missing data at runtime. Either select the petitioner/contactId (or uc.userId) into userId, or remove/optionalize the field in the type.

Suggested change
userId: string;
userId?: string;

Copilot uses AI. Check for mistakes.
Comment on lines +8 to +12
const chain = {
crossJoin: () => chain,
leftJoin: (
_table: string,
joinCb: (j: typeof joinChain) => typeof joinChain,
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

createChainable is used to mock the Kysely query chain in backfillUserGeocodes, but the production code calls .distinct() on the chain. The mock chain does not implement distinct, so tests will throw TypeError: chain.distinct is not a function. Add a distinct: () => chain method (and any other chain ops used by the scripts under test).

Copilot uses AI. Check for mistakes.
Comment on lines +177 to +181
const columns = [
{ header: 'docket_number', key: 'docketNumber' },
{ header: 'docket_number_suffix', key: 'docketNumberSuffix' },
{ header: 'received_year', key: 'receivedYear' },
{ header: 'procedure_type', key: 'procedureType' },
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generated CSV columns don’t match the requirements described in issue #9522 (date of petition, regular/small, procedure type, petitioner address lat/lng, requested place of trial lat/lng, actual place of trial lat/lng). The current columns list only includes received_year (not the petition date) and omits requested/actual place-of-trial geocodes entirely. Either update the report to include the required fields or clarify the scope/rename the script so it doesn’t imply #9522 is satisfied.

Copilot uses AI. Check for mistakes.
Comment on lines +3 to +7
export const invalidateUserContactGeocode = async (docketNumber, userId) => {
await pgDeleteFrom({
table: 'dwUserContact',
where: cb =>
cb.where('userId', '=', userId).where('docketNumber', '=', docketNumber),
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

invalidateUserContactGeocode is missing parameter and return types. Please type docketNumber/userId (likely string) and declare the return type as Promise<void> to avoid implicit any.

Copilot uses AI. Check for mistakes.
Comment on lines +311 to +314
await invalidateUserContactGeocode(
docketNumber,
updatedPetitionerData.contactId,
);
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New behavior: invalidateUserContactGeocode(...) is invoked on petitioner updates, but there is no assertion in the existing interactor tests that it gets called when an address changes. Please add a test case that verifies the geocode cache invalidation is triggered under the intended conditions.

Copilot uses AI. Check for mistakes.
Comment on lines +79 to +83
export const backfillUserGeocodes = async ({
batchSize = 10000,
delayMs = 6000,
fromDateIso,
toDateIso,
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delayMs is accepted/logged but never used, so the backfill loop will run batches back-to-back without any throttling. This defeats the stated purpose of rate-limit compliance. Add an await sleep(delayMs) (or similar) between batches and/or between geocoding requests, and ensure it’s covered by tests.

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +5
import { UserContactKysely } from '@web-api/persistence/postgres/userContacts/schema';
import { pgInsertInto } from '@web-api/persistence/postgres/utils/operation/pgInsertInto';

export const upsertUserContacts = async (
userContacts: UserContactKysely[],
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

upsertUserContacts is typed to accept UserContactKysely[] (Selectable rows), but this function is inserting/upserting. Please accept an insertable type (e.g., NewUserContactKysely[]) to align with other upsert helpers and avoid relying on select-only fields.

Suggested change
import { UserContactKysely } from '@web-api/persistence/postgres/userContacts/schema';
import { pgInsertInto } from '@web-api/persistence/postgres/utils/operation/pgInsertInto';
export const upsertUserContacts = async (
userContacts: UserContactKysely[],
import { NewUserContactKysely } from '@web-api/persistence/postgres/userContacts/schema';
import { pgInsertInto } from '@web-api/persistence/postgres/utils/operation/pgInsertInto';
export const upsertUserContacts = async (
userContacts: NewUserContactKysely[],

Copilot uses AI. Check for mistakes.
Comment on lines +311 to +314
await invalidateUserContactGeocode(
docketNumber,
updatedPetitionerData.contactId,
);
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Geocode invalidation is currently gated behind updateAddressOrPhone (which also depends on shouldGenerateNoticesForCase() and !updatedCaseContact.isAddressSealed). This can leave stale geocodes in dwUserContact when an address changes but notices are not generated (e.g., sealed address, or notices suppressed). Invalidate the geocode whenever address fields change (and possibly also when an address becomes sealed) rather than tying it to notice generation.

Copilot uses AI. Check for mistakes.
Comment on lines +178 to +194
{ header: 'docket_number', key: 'docketNumber' },
{ header: 'docket_number_suffix', key: 'docketNumberSuffix' },
{ header: 'received_year', key: 'receivedYear' },
{ header: 'procedure_type', key: 'procedureType' },
{ header: 'case_type', key: 'caseType' },
{ header: 'party_type', key: 'partyType' },
{ header: 'status', key: 'status' },
{ header: 'is_paper', key: 'isPaper' },
{ header: 'preferred_trial_city', key: 'preferredTrialCity' },
{ header: 'remote_trial_granted', key: 'remoteTrialGranted' },
{ header: 'address', key: 'address' },
{ header: 'city', key: 'city' },
{ header: 'state', key: 'state' },
{ header: 'postalCode', key: 'postalCode' },
{ header: 'lat', key: 'lat' },
{ header: 'lng', key: 'lng' },
{ header: 'is_represented', key: 'isRepresented' },
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CSV column definitions use camelCase keys (e.g., docketNumber, receivedYear), but the query rows are shaped with snake_case properties (e.g., docket_number, received_year). Since generateCsv looks up values via row[key], most columns will be blank. Align the columns[].key values with the actual row property names (or map the row object to camelCase before calling generateCsv).

Suggested change
{ header: 'docket_number', key: 'docketNumber' },
{ header: 'docket_number_suffix', key: 'docketNumberSuffix' },
{ header: 'received_year', key: 'receivedYear' },
{ header: 'procedure_type', key: 'procedureType' },
{ header: 'case_type', key: 'caseType' },
{ header: 'party_type', key: 'partyType' },
{ header: 'status', key: 'status' },
{ header: 'is_paper', key: 'isPaper' },
{ header: 'preferred_trial_city', key: 'preferredTrialCity' },
{ header: 'remote_trial_granted', key: 'remoteTrialGranted' },
{ header: 'address', key: 'address' },
{ header: 'city', key: 'city' },
{ header: 'state', key: 'state' },
{ header: 'postalCode', key: 'postalCode' },
{ header: 'lat', key: 'lat' },
{ header: 'lng', key: 'lng' },
{ header: 'is_represented', key: 'isRepresented' },
{ header: 'docket_number', key: 'docket_number' },
{ header: 'docket_number_suffix', key: 'docket_number_suffix' },
{ header: 'received_year', key: 'received_year' },
{ header: 'procedure_type', key: 'procedure_type' },
{ header: 'case_type', key: 'case_type' },
{ header: 'party_type', key: 'party_type' },
{ header: 'status', key: 'status' },
{ header: 'is_paper', key: 'is_paper' },
{ header: 'preferred_trial_city', key: 'preferred_trial_city' },
{ header: 'remote_trial_granted', key: 'remote_trial_granted' },
{ header: 'address', key: 'address' },
{ header: 'city', key: 'city' },
{ header: 'state', key: 'state' },
{ header: 'postal_code', key: 'postal_code' },
{ header: 'lat', key: 'lat' },
{ header: 'lng', key: 'lng' },
{ header: 'is_represented', key: 'is_represented' },

Copilot uses AI. Check for mistakes.
import { RawPrivatePractitioner } from '@shared/business/entities/PrivatePractitioner';
import { CaseDTO } from '@shared/business/dto/cases/CaseDTO';


Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra blank line added before the ElectronicCreatedCaseType export. Please remove to keep formatting consistent (and avoid lint/format churn in future diffs).

Suggested change

Copilot uses AI. Check for mistakes.
}

dwUserContact {
string userId PK,FK "Associated user ID"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This FK was dropped, so the erd should be updated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants