-
Notifications
You must be signed in to change notification settings - Fork 1.4k
7281 mdm bulk export expansion #7283
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This Pull Request has failed the formatting check Please run You can automate this auto-formatting process to execute on the git pre-push hook, by installing pre-commit and then calling |
hapi-fhir-jpaserver-base/src/main/java/ca/uhn/fhir/jpa/dao/BaseHapiFhirResourceDao.java
Show resolved
Hide resolved
...ge-batch2-jobs/src/main/java/ca/uhn/fhir/batch2/jobs/export/svc/BulkExportIdFetchingSvc.java
Show resolved
Hide resolved
hapi-fhir-jpaserver-test-r4/src/test/java/ca/uhn/fhir/jpa/bulk/BulkExportUseCaseTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only got about halfway through, but with what I've seen, I question the approach, due to:
- New version of the bulk export job.
- Unclear expansion semantics
- Change in behaviour of group export (I'm pretty sure, at least).
Probably easier to have a working session to discuss this. I may be wrong about certain aspects, but I've got enough concerns about the approach that it warrants a call.
...hir-jpaserver-base/src/main/java/ca/uhn/fhir/jpa/bulk/export/svc/JpaBulkExportProcessor.java
Outdated
Show resolved
Hide resolved
// of fetching mdm linked patients as well as converting all of them to | ||
// JpaPid | ||
Set<JpaPid> resolvedAndMdmExpanded = myMdmExpandersHolder | ||
.getBulkExportMDMResourceExpanderInstance() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: how is bulk export MDM pid expansion different than normal mdm pid expansion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is related to the issue Luis found.
Depending on whose code merges first, it'll likely be updated there.
Internally, it's doing some check it shouldn't do. But this code here was just refactored from another place so i didn't delve into what it was doing internally.
|
||
// use those maps to get the patient ids we care about | ||
List<JpaPid> pids = | ||
getPatientPidsUsingSearchMaps(maps, theParams.getGroupId(), null, theParams.getRequestPartitionId()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: I'm confused here. A group export exports for all patients in the group. Why would not all patients in the group be MDM-expanded? Don't we care about all the patient IDs?
...hir-jpaserver-base/src/main/java/ca/uhn/fhir/jpa/bulk/export/svc/JpaBulkExportProcessor.java
Outdated
Show resolved
Hide resolved
.map(pid -> (JpaPid) pid) | ||
.toList(); | ||
return new LinkedHashSet<>(existingMembers); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: I don't understand the purpose of this method. THe first thing it does is check if its already expanded? Why do it at all? The caller should know if expansion has occurred or not. Are we conflating mdm expansion with group expansion? May be worthwhile to once-over the var names to disambiguate the word.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kinda where we get into trouble with having shared logic.
the "expanded patient ids" are for v3
V2 will get here and not have this. so they will continue to do the expansion just as before (per resource).
This is because "FetchIds" happens in differnet places.
The only other solution would be to manually put a parameter in that states what version it is and use that or just copy paste the code all over htep lace.
I'm ok with eitehr ,if you'd prefer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are different versions of the job, we should fork the code. It is unsafe for us to think we can be backwards compatible by being tricky like this.
public void addExpandedPatientId(PatientIdAndPidJson theId) { | ||
getExpandedPatientIds().add(theId); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think this needs to exist. Why not move the responsibility of expansion way out of the job itself, and have it expand before job initiation, and the job can just remain how it is?
pid.setAssociatedResourceId(theFhirContext.getVersion().newIdType(getResourceId())); | ||
return pid; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question; Have you checked for existence of a similar class? This is a whole data class that just represents a tuple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually just because i needed the real resourceid as well as what's in the pid.
We store these objects into the db in batch jobs. and i need the real resource id to be serializable.
I could've added the value to TypedPidJson. But that object is used in other places and putting this value on the base class would (imo) be confusing since users in other areas might actually expect it.
The other option is to make a completely new object that is like TypePidJson but has the actual resource id, but.... this feels 'wrong' somehow since it's duplicating a lot of what TypePidJson is doing already.
And TypePidJson is already being used in over 100 places.
(Another down side of all the shared logic between BulkExport, Reindex, and BulkModify jobs)
...-storage-batch2-jobs/src/main/java/ca/uhn/fhir/batch2/jobs/export/models/ResourceIdList.java
Show resolved
Hide resolved
|
||
return submissionCount; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (blocking) : this class seems to have extracted functionality just to support reuse between v2 and v3 of this job. Also, fetchIds
does wayyyyyy more than fetch IDs. the ownership of consumption has moved out of the step, and into this Service, which receives a consumer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did just copy-paste this code into a service because i basically wanted "the exact same logic"
But you're right - it might make sense to actually copy-paste the code (and leave it in the old step) so that the job itself has unique code unshared with previous versions
"Expand out patient ids if necessary", | ||
MdmExpandedPatientIds.class, | ||
mdmExpansionStep()) | ||
// load in (all) ids and create id chunks of 1000 each |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question (repeated): why not just perform an expansion before the job starts, and run a patient/group export with the pre-expanded list of IDs, on the existing job def?
closes #7281