FIX: XML Entity file handling#701
FIX: XML Entity file handling#701carlwilson wants to merge 2 commits intoopenpreserve:integrationfrom
Conversation
- moved the `Jhove` and `JhoveView` classes from default packageas Java 11 tooling complained; - updated `jhove-apps` POM to add new `Jhove` class package to manifest line; - updated windows batch and linux shell start scripts to add new `Jhove` class package to manifest line; and - updated Maven dependencies for `izpack -> 5.1.3` and `jacoco -> 0.8.7`for Java 11 compatibility.
- fixed misfiring entity processor; and - fixed package warnings.
|
Observation, if you're going to use an automated code formatter, it would make sense to run it on the entire project to create a baseline. This would allow for cleaner PRs and ease the burden of resolving spurious merge conflicts. |
| ent = new InputSource(conn.getInputStream()); | ||
| } | ||
|
|
||
| ent = new InputSource(obj.openStream()); |
There was a problem hiding this comment.
This change here means that the code will no longer follow cross-protocol redirects (eg http->https). By default, HttpUrlConnection does follow redirects. However, it does not follow redirects across protocols.
|
I've spent a fair bit of time looking at this function the past couple of days. Here is the modified implementation that I arrived at: https://github.com/UW-Madison-Library/jhove/blob/48d083b2394fc8b9426d9567744eff519284b830/jhove-modules/xml-hul/src/main/java/edu/harvard/hul/ois/jhove/module/xml/XmlModuleHandler.java#L332 Things to note:
|
|
Hi Peter, thanks for your super helpful review. This all sounds a little familiar, since I did a review of the XML module code a year or two ago, but can barely remember the details now. I fixed a few bugs, and did a lot of house keeping. The PR (#634) is still awaiting integration, but you might find it of interest, if only to prevent duplicating efforts in places. A fix for this issue which still retains the old redirection logic was included in commit d223a28 of that PR. I'm not sure if it addresses all of your concerns, since it's been too long for me to remember, but let me know if it doesn't, and maybe I can amend that older PR. |
|
@david-russo Yeah, it looks like your old PR does address the same issue as is addressed in this PR. It does still have the bug when resolving relative entities but that's a different problem. I haven't looked at your PR in detail, but it looks like there's some good stuff in there, including a few changes that I have also made. It would be really nice if PRs didn't languish here for years, forcing us to duplicate each other's work and creating an ever growing merge problem for PR authors if/when PRs ever start to be merged again. |
|
Cheers. Does this comment relate to the relative entity resolution issue you're describing? If so, from what I remember/wrote, it sounds like there's a hacky solution in the module at the moment, but a proper fix may require a change to what information JHOVE passes its modules. |
|
Yes, but it can be addressed in some (perhaps not all) cases, simply by setting the system id on the returned InputSource. See here I'll create a ticket for it with a demonstration of the problem later today. [Edit] That is for entities that are relative a parent. This fix has no affect on relative entities without parents. |
Fixes #700 output is now a more informative: