7281 mdm bulk export expansion #7283


Open

TipzCM wants to merge 13 commits into master from 7281-mdm-bulk-export-expansion

Collaborator

TipzCM commented Oct 1, 2025

closes #7281

leifstawnyczy added 3 commits

October 1, 2025 11:02


          basic workin code

2462ec9


          added workingcode

56d44c0


          spotless

9ebe21b

TipzCM requested review from michaelabuckley, jamesagnew, tadgh, fil512 and AD1306 as code owners

October 1, 2025 20:23

Contributor

robogary commented Oct 1, 2025 •

This Pull Request has failed the formatting check

Please run mvn spotless:apply or mvn clean install -DskipTests to fix the formatting issues.

You can automate this auto-formatting process to execute on the git pre-push hook, by installing pre-commit and then calling pre-commit install --hook-type pre-push. This will cause formatting to run automatically whenever you push.

TipzCM commented

hapi-fhir-jpaserver-base/src/main/java/ca/uhn/fhir/jpa/dao/BaseHapiFhirResourceDao.java

TipzCM commented

...ge-batch2-jobs/src/main/java/ca/uhn/fhir/batch2/jobs/export/svc/BulkExportIdFetchingSvc.java

leifstawnyczy added 6 commits

October 2, 2025 09:54


          updating tests

c11a7b7


          fixing a bug

c7b3aff


          spotless

e9cbe01


          changelog

c0b657e


          Merge branch 'master' into 7281-mdm-bulk-export-expansion

95a899c


          spotless

fd73b19

YalingPeiS reviewed

hapi-fhir-jpaserver-test-r4/src/test/java/ca/uhn/fhir/jpa/bulk/BulkExportUseCaseTest.java (outdated)

leifstawnyczy added 3 commits

October 6, 2025 13:27


          fixing merge conflict

dc54459


          Merge branch 'master' into 7281-mdm-bulk-export-expansion

4c13083


          merging in master

a9e4dd9

tadgh requested changes

Collaborator

tadgh left a comment

I only got about halfway through, but with what I've seen, I question the approach, due to:

New version of the bulk export job.
Unclear expansion semantics
Change in behaviour of group export (I'm pretty sure, at least).

Probably easier to have a working session to discuss this. I may be wrong about certain aspects, but I've got enough concerns about the approach that it warrants a call.

...hir-jpaserver-base/src/main/java/ca/uhn/fhir/jpa/bulk/export/svc/JpaBulkExportProcessor.java (outdated)

...hir-jpaserver-base/src/main/java/ca/uhn/fhir/jpa/bulk/export/svc/JpaBulkExportProcessor.java

    
    // of fetching mdm linked patients as well as converting all of them to
    // JpaPid
    Set<JpaPid> resolvedAndMdmExpanded = myMdmExpandersHolder
            .getBulkExportMDMResourceExpanderInstance()

Collaborator

tadgh Oct 8, 2025

question: how is bulk export MDM pid expansion different than normal mdm pid expansion?

Collaborator Author

TipzCM Oct 9, 2025

This is related to the issue Luis found.

Depending on whose code merges first, it'll likely be updated there.

Internally, it's doing some check it shouldn't do. But this code here was just refactored from another place, so I didn't delve into what it was doing internally.

...hir-jpaserver-base/src/main/java/ca/uhn/fhir/jpa/bulk/export/svc/JpaBulkExportProcessor.java

    
    // use those maps to get the patient ids we care about
    List<JpaPid> pids =
            getPatientPidsUsingSearchMaps(maps, theParams.getGroupId(), null, theParams.getRequestPartitionId());

Collaborator

tadgh Oct 8, 2025

thought: I'm confused here. A group export exports for all patients in the group. Why would not all patients in the group be MDM-expanded? Don't we care about all the patient IDs?

...hir-jpaserver-base/src/main/java/ca/uhn/fhir/jpa/bulk/export/svc/JpaBulkExportProcessor.java (outdated)

...hir-jpaserver-base/src/main/java/ca/uhn/fhir/jpa/bulk/export/svc/JpaBulkExportProcessor.java

    
            .map(pid -> (JpaPid) pid)
            .toList();
    return new LinkedHashSet<>(existingMembers);
}

Collaborator

tadgh Oct 8, 2025

thought: I don't understand the purpose of this method. The first thing it does is check if it's already expanded? Why do it at all? The caller should know if expansion has occurred or not. Are we conflating mdm expansion with group expansion? May be worthwhile to once-over the var names to disambiguate the word.

Collaborator Author

TipzCM Oct 9, 2025

This is kinda where we get into trouble with having shared logic.

The "expanded patient ids" are for v3.

V2 will get here and not have this, so it will continue to do the expansion just as before (per resource).

This is because "FetchIds" happens in different places.

The only other solution would be to manually put a parameter in that states what version it is and use that, or just copy-paste the code all over the place.

I'm ok with either, if you'd prefer.

Contributor

michaelabuckley Oct 9, 2025 •

If there are different versions of the job, we should fork the code. It is unsafe for us to think we can be backwards compatible by being tricky like this.

...e-batch2-jobs/src/main/java/ca/uhn/fhir/batch2/jobs/export/models/MdmExpandedPatientIds.java

    
    public void addExpandedPatientId(PatientIdAndPidJson theId) {
        getExpandedPatientIds().add(theId);
    }
}

Collaborator

tadgh Oct 8, 2025

I don't think this needs to exist. Why not move the responsibility of expansion way out of the job itself, have it expand before job initiation, and let the job remain how it is?

...age-batch2-jobs/src/main/java/ca/uhn/fhir/batch2/jobs/export/models/PatientIdAndPidJson.java

    
        pid.setAssociatedResourceId(theFhirContext.getVersion().newIdType(getResourceId()));
        return pid;
    }
}

Collaborator

tadgh Oct 8, 2025

question: Have you checked for existence of a similar class? This is a whole data class that just represents a tuple.

Collaborator Author

TipzCM Oct 9, 2025

It's actually just because I needed the real resource id as well as what's in the pid.

We store these objects in the db in batch jobs, and I need the real resource id to be serializable.

I could've added the value to TypedPidJson. But that object is used in other places, and putting this value on the base class would (imo) be confusing since users in other areas might actually expect it.

The other option is to make a completely new object that is like TypedPidJson but has the actual resource id, but.... this feels 'wrong' somehow since it's duplicating a lot of what TypedPidJson is doing already.

And TypedPidJson is already being used in over 100 places.

(Another down side of all the shared logic between BulkExport, Reindex, and BulkModify jobs)
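
A minimal sketch of the trade-off described in this thread, using hypothetical stand-in classes (`SharedPidSketch` and `PidWithResourceIdSketch` are illustrative names, not HAPI FHIR types). It assumes the goal is to carry the real resource id next to the pid without touching the widely shared type:

```java
// Hypothetical stand-in for a shared pid type that many jobs already use.
class SharedPidSketch {
    private final String myResourceType;
    private final long myPid;

    SharedPidSketch(String theResourceType, long thePid) {
        myResourceType = theResourceType;
        myPid = thePid;
    }

    String getResourceType() {
        return myResourceType;
    }

    long getPid() {
        return myPid;
    }
}

// Job-specific subclass: only code that needs the real resource id sees it,
// so callers of the shared type never encounter an unpopulated field.
final class PidWithResourceIdSketch extends SharedPidSketch {
    private final String myResourceId;

    PidWithResourceIdSketch(String theResourceType, long thePid, String theResourceId) {
        super(theResourceType, thePid);
        myResourceId = theResourceId;
    }

    String getResourceId() {
        return myResourceId;
    }

    // Builds a "Type/id" style reference from the stored parts.
    String toReference() {
        return getResourceType() + "/" + myResourceId;
    }
}
```

Whether a subclass, a wrapper, or an existing class is the better fit depends on what already exists in the codebase, which is the question raised above.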

...-storage-batch2-jobs/src/main/java/ca/uhn/fhir/batch2/jobs/export/models/ResourceIdList.java

...ge-batch2-jobs/src/main/java/ca/uhn/fhir/batch2/jobs/export/svc/BulkExportIdFetchingSvc.java

    
        return submissionCount;
    }
}

Collaborator

tadgh Oct 8, 2025

issue (blocking): this class seems to have extracted functionality just to support reuse between v2 and v3 of this job. Also, fetchIds does way more than fetch IDs. The ownership of consumption has moved out of the step and into this service, which receives a consumer.

Collaborator Author

TipzCM Oct 9, 2025

I did just copy-paste this code into a service because I basically wanted "the exact same logic".

But you're right - it might make sense to actually copy-paste the code (and leave it in the old step) so that the job itself has unique code unshared with previous versions
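
A rough sketch of the separation raised in this thread, under assumed names (`IdChunkingSketch`, `fetchIdChunks`, and `runStep` are hypothetical and do not come from the PR). The fetching code only groups ids into chunks, while the step itself hands each chunk to its sink and counts submissions:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; not the PR's BulkExportIdFetchingSvc.
final class IdChunkingSketch {

    // Fetching concern: group ids into chunks and return them. No consumer involved.
    // For simplicity this collects all chunks in memory; a real job would likely stream.
    static List<List<String>> fetchIdChunks(Iterator<String> theIds, int theChunkSize) {
        if (theChunkSize <= 0) {
            throw new IllegalArgumentException("chunk size must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        while (theIds.hasNext()) {
            current.add(theIds.next());
            if (current.size() == theChunkSize) {
                chunks.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            chunks.add(current); // trailing partial chunk
        }
        return chunks;
    }

    // Step concern: the step owns submission to the sink and reports the count.
    static int runStep(Iterator<String> theIds, int theChunkSize, Consumer<List<String>> theSink) {
        int submissionCount = 0;
        for (List<String> chunk : fetchIdChunks(theIds, theChunkSize)) {
            theSink.accept(chunk);
            submissionCount++;
        }
        return submissionCount;
    }
}
```

With this split, each job version could keep its own `runStep` equivalent and share at most the pure chunking helper.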

...-storage-batch2-jobs/src/main/java/ca/uhn/fhir/batch2/jobs/export/v3/BulkExportV3Config.java

    
        "Expand out patient ids if necessary",
        MdmExpandedPatientIds.class,
        mdmExpansionStep())
// load in (all) ids and create id chunks of 1000 each

Collaborator

tadgh Oct 8, 2025

question (repeated): why not just perform an expansion before the job starts, and run a patient/group export with the pre-expanded list of IDs, on the existing job def?
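
One way to picture the suggestion above is the following minimal sketch. All names are hypothetical (`PreExpansionSketch`, `expand`), and the lookup function stands in for whatever MDM link lookup the server actually provides. The ids are expanded once, before any job is submitted, and the resulting set would then be passed to the existing job definition:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical illustration only; none of these names exist in the PR.
final class PreExpansionSketch {

    // Expands each requested patient id into itself plus its linked ids,
    // keeping first-seen order and dropping duplicates.
    static Set<String> expand(List<String> theRequestedIds, Function<String, Set<String>> theLinkLookup) {
        Set<String> expanded = new LinkedHashSet<>();
        for (String id : theRequestedIds) {
            expanded.add(id);
            expanded.addAll(theLinkLookup.apply(id));
        }
        return expanded;
    }
}
```

This only expands one level of links per requested id. Whether that matches the intended MDM semantics is part of what the review asks to be clarified.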


          review points round 1

6a42ecb


Reviewers

YalingPeiS left review comments

tadgh requested changes

michaelabuckley (code owner): awaiting requested review

jamesagnew (code owner): awaiting requested review

fil512 (code owner): awaiting requested review

AD1306 (code owner): awaiting requested review

Requested changes must be addressed to merge this pull request.

Labels

None yet