Fix hdfs-related IT tests for java24 #122044
Conversation
```diff
@@ -41,18 +41,17 @@ public static void main(String[] args) throws Exception {
     try (JarFile jarFile = new JarFile(new File(jarPath))) {
         for (var patcher : patchers.entrySet()) {
             JarEntry jarEntry = jarFile.getJarEntry(patcher.getKey());
             if (jarEntry == null) {
                 throw new IllegalArgumentException("path [" + patcher.getKey() + "] not found in [" + jarPath + "]");
```
This had to go, as some classes are not present in both jars; it's unfortunate, but I think acceptable. If it's not, should we pass the classes to patch via the command line, so it is configurable in each gradle JavaExec task?
We should pass the files to patch down; it should be part of the task's configuration. We can keep how to actually patch the classes here, but the list of classes to patch should be passed in.
However, why would the same class be present in two different jars? That's jarhell, which we should be rejecting already?
I should have said both projects; the hdfs-fixture project needs a subset of the patches, as it pulls in a different (smaller) set of dependencies.
I'm still missing something. Why do dependencies matter? We should only be patching a given class once.
The classes we need to patch are in the two projects' dependencies: for hadoop-client-api, they are in both hadoop-common and hadoop-auth; hdfs-fixture, however, depends only on hadoop-common (via hadoop-minicluster), so trying to patch classes in hadoop-auth results in an error (the class is not there).
But like you said, this can be solved by making the patcher take a list of the class names.
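For illustration, a minimal sketch of what a parameterized patcher entry point could look like; the class name and argument layout here are hypothetical, not the PR's actual interface:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: first argument is the jar to patch, the rest are the
// class-file entries that must be present and patched in that jar.
public class JarPatcherMain {
    public static void main(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("usage: JarPatcherMain <jarPath> <classEntry>...");
        }
        String jarPath = args[0];
        List<String> classesToPatch = Arrays.asList(args).subList(1, args.length);
        // Each Gradle task (or transform) supplies only the entries that exist in
        // its own dependency set, so the "entry not found" check can be kept.
        for (String entry : classesToPatch) {
            System.out.println("would patch [" + entry + "] in [" + jarPath + "]");
        }
    }
}
```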
rjernst left a comment:
I'll let @mark-vieira comment on the Gradle side, but I suspect we may want this to be a Gradle plugin so we don't, e.g., have to duplicate the patchClasses task and setup.
That would be ideal, I think. Maybe we can work on this together, so I learn a bit about Gradle plugins!
@ldematte This is more or less the perfect use case for Gradle Artefact Transforms. The goal of those artefact transforms is to tweak third-party dependencies before using them in your build, and IMO the benefits are considerable.
I pushed a draft of how to port this PR to use artefact transforms: https://github.com/breskeby/elasticsearch/tree/fix-hdfs-tests-java24-artifact-transforms. Happy to talk about this synchronously and help you bring it over the line. Parts of what lives in the build script in my example can likely be moved to a plugin.
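For readers unfamiliar with the feature, here is a minimal, hedged sketch of what a Gradle artifact transform looks like in Java; everything except the Gradle API types (class name, output naming, the `patchClasses` helper) is hypothetical, not the code in the draft branch:

```java
import org.gradle.api.artifacts.transform.InputArtifact;
import org.gradle.api.artifacts.transform.TransformAction;
import org.gradle.api.artifacts.transform.TransformOutputs;
import org.gradle.api.artifacts.transform.TransformParameters;
import org.gradle.api.file.FileSystemLocation;
import org.gradle.api.provider.Provider;

import java.io.File;

// Once registered on a dependency attribute, Gradle runs this once per
// matching artifact and caches the result.
public abstract class PatchHadoopJar implements TransformAction<TransformParameters.None> {

    @InputArtifact
    public abstract Provider<FileSystemLocation> getInputArtifact();

    @Override
    public void transform(TransformOutputs outputs) {
        File inputJar = getInputArtifact().get().getAsFile();
        // Derive the output name; the draft branch appends "-patched".
        File outputJar = outputs.file(inputJar.getName().replace(".jar", "-patched.jar"));
        patchClasses(inputJar, outputJar); // hypothetical helper doing the ASM rewrite
    }

    private static void patchClasses(File in, File out) {
        // ... copy entries, rewriting the affected class files with ASM ...
    }
}
```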
breskeby left a comment:
I strongly prefer using Gradle artefact transforms for this kind of dependency patching. Details at #122044 (comment)
Thanks Rene, I'll update it as we discussed and make it parametric, so we can bring back the check that all the expected classes are patched.
@breskeby What does the resulting artifact naming look like? One nice thing about the way we currently do this kind of patching is that we can make it clear it's Elastic-modified (not that we change the name in this case currently, but in other cases, e.g. with log4j, we have tweaked the name).
We can choose the naming in our implementation. In the example branch I just added "patched" to the input jar. So it would be "hadoop-common-2.8.7.jar" -> "hadoop-common-2.8.7-patched.jar"
…arch into fix-hdfs-tests-java24
Pinging @elastic/es-delivery (Team:Delivery)
I moved the patcher implementation to a transformer as @breskeby suggested, and added the ability to specify (via a parameter) which set of classes to patch for which jar/artifact.
breskeby left a comment:
This is great, and getting rid of the hadoop-client-api subproject is a very nice side effect. I added a few nitpicks which it would be great to address before merging.
Otherwise LGTM.
```java
// Add the entry to the new JAR file
jos.putNextEntry(new JarEntry(entryName));

Function<ClassWriter, ClassVisitor> classPatcher = jarPatchers.get(entryName);
```
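To make the excerpt concrete, the patching step presumably looks something like the standard ASM boilerplate below; this is a sketch assuming `bytes` holds the original entry's contents and `jos` is the output stream from the excerpt, not the PR's exact code:

```java
import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassWriter;

// Sketch: run the entry's bytes through the registered ClassVisitor, if any.
byte[] patched;
if (classPatcher != null) {
    ClassReader reader = new ClassReader(bytes);
    ClassWriter writer = new ClassWriter(reader, ClassWriter.COMPUTE_MAXS);
    reader.accept(classPatcher.apply(writer), 0);
    patched = writer.toByteArray();
} else {
    patched = bytes; // unpatched entries are copied through verbatim
}
jos.write(patched);
```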
I wonder if we should add a check here to verify that all expected patches are applied, and fail early if not, to avoid undetected changes sneaking in again, e.g. when updating hadoop versions.
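A sketch of one way such a check could work, assuming the transform knows its expected entry names up front (the class and method names here are hypothetical):

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Track which expected entries were actually seen while copying the jar,
// and fail the build if any patch was never applied.
public final class PatchTracker {
    private final Set<String> remaining;

    public PatchTracker(Map<String, ?> jarPatchers) {
        this.remaining = new HashSet<>(jarPatchers.keySet());
    }

    public void markPatched(String entryName) {
        remaining.remove(entryName);
    }

    public void verifyAllApplied(String jarName) {
        if (remaining.isEmpty() == false) {
            throw new IllegalStateException("expected patches not applied to " + jarName + ": " + remaining);
        }
    }
}
```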
I tried to do that, and it's a great idea; however, there is one big blocker: the hadoop-common artifact has one additional "classifier" named tests, which contains only tests, of course, but is brought in anyway.
I tried to exclude it, but apparently it's not possible to do that in Gradle (there is limited support for classifiers).
The thing is that the hadoop-common patcher is applied to both hadoop-common jars (regular and "tests"), and it will fail on the second, because none of the expected patches will be applied there.
TL;DR: if you have a good way to either exclude hadoop-common:tests or to recognize it and avoid trying to patch it I'm all ears!
We could make the file-name matching stricter: match the expected name plus a version regex, instead of a simple string.contains based check, so it doesn't match -tests artifacts.
I thought about that; I discarded it because "2.8.5-tests" is a valid "version", isn't it? Like 1.0.0-beta.
I can allow only the numeric part; it's not super clean, but...
I would have preferred excluding the jar completely, since it's only tests (there's no purpose in having it), but it seems this is very hard to do in Gradle.
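For example, restricting the version to its numeric part could look roughly like this (a hedged sketch; the PR ultimately went with a negative lookahead instead, as shown further below):

```java
import java.util.regex.Pattern;

public class RegexCheck {
    public static void main(String[] args) {
        // Matches the plain artifact but not the -tests classifier,
        // at the cost of rejecting any non-numeric version suffix.
        Pattern strict = Pattern.compile("^hadoop-common-\\d+(\\.\\d+)*\\.jar$");
        System.out.println(strict.matcher("hadoop-common-2.8.5.jar").matches());       // true
        System.out.println(strict.matcher("hadoop-common-2.8.5-tests.jar").matches()); // false
    }
}
```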
Let me try the regex and see.
I had to deal with that tests-classifier stuff with hadoop before and suffered myself. Okay, let's just live with that for now; we can revisit later if we feel the need. I will use parts of this stuff later on to fix other places in our codebase.
…he matched artifacts
…arch into fix-hdfs-tests-java24
```java
static final List<JarPatchers> allPatchers = List.of(
    new JarPatchers(
        "hadoop-common",
        Pattern.compile("hadoop-common-(?!.*tests)"),
```
@breskeby I added regex matching to bring back the "have we applied all patches" check; let me know if you like it or not
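The negative lookahead in the excerpt above rejects any file name where the text right after the literal `hadoop-common-` still contains `tests`; a quick check:

```java
import java.util.regex.Pattern;

public class LookaheadCheck {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("hadoop-common-(?!.*tests)");
        System.out.println(p.matcher("hadoop-common-2.8.5.jar").find());       // true
        System.out.println(p.matcher("hadoop-common-2.8.5-tests.jar").find()); // false
    }
}
```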
lgtm
💔 Backport failed
You can use sqren/backport to backport manually.
HDFS IT tests started failing with Java 24. The problem is that some hadoop libraries (mainly `hadoop-common`) contain a call to a SecurityManager-related method that has been removed in JDK 24.

A short-term solution is to patch `hdfs-fixture` as we do for `hadoop-client-api` (same as we did in #119779).

Fixes #121967
Fixes #122377
Fixes #122378
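For context, a hedged illustration of the kind of Security-Manager-era call that breaks on modern JDKs; whether this is the exact method patched in this PR is an assumption on my part:

```java
import javax.security.auth.Subject;

public class SubjectExample {
    public static void main(String[] args) {
        // Assumption, for illustration only; not necessarily the call patched here.
        // The legacy pattern below throws UnsupportedOperationException on JDK 24:
        //   Subject legacy = Subject.getSubject(java.security.AccessController.getContext());
        // The JDK 18+ replacement:
        Subject current = Subject.current();
        System.out.println(current);
    }
}
```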