
Conversation

@ldematte (Contributor) commented Feb 7, 2025

HDFS IT tests started failing with Java 24.

The problem is that some Hadoop libraries (mainly hadoop-common) contain a call to a SecurityManager-related method that has been removed in JDK 24.
A short-term solution is to patch hdfs-fixture as we do for hadoop-client-api (the same approach as in #119779).

Fixes #121967
Fixes #122377
Fixes #122378
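For context, the jar patching described above works roughly like this — a self-contained sketch with invented names (the real patcher rewrites class bytecode with ASM visitors rather than the trivial byte-replacement shown here):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

public class JarPatcherSketch {

    // Copy the jar entry by entry, applying a byte-level patch function to
    // selected entries. In the real PR the patch function is an ASM
    // ClassVisitor that removes the SecurityManager call.
    public static void patchJar(Path in, Path out,
                                Map<String, UnaryOperator<byte[]>> patchers) throws IOException {
        try (JarFile jarFile = new JarFile(in.toFile());
             JarOutputStream jos = new JarOutputStream(Files.newOutputStream(out))) {
            for (Enumeration<JarEntry> e = jarFile.entries(); e.hasMoreElements();) {
                JarEntry entry = e.nextElement();
                byte[] bytes;
                try (InputStream is = jarFile.getInputStream(entry)) {
                    bytes = is.readAllBytes();
                }
                UnaryOperator<byte[]> patcher = patchers.get(entry.getName());
                if (patcher != null) {
                    bytes = patcher.apply(bytes);
                }
                jos.putNextEntry(new JarEntry(entry.getName()));
                jos.write(bytes);
                jos.closeEntry();
            }
        }
    }

    // Demo helper: build a tiny jar, patch one entry, return its new content.
    static String roundTrip() throws Exception {
        Path in = Files.createTempFile("demo", ".jar");
        Path out = Files.createTempFile("demo-patched", ".jar");
        try (JarOutputStream jos = new JarOutputStream(Files.newOutputStream(in))) {
            jos.putNextEntry(new JarEntry("a.txt"));
            jos.write("insecure".getBytes());
            jos.closeEntry();
        }
        patchJar(in, out, Map.of("a.txt", b -> "patched".getBytes()));
        try (JarFile jf = new JarFile(out.toFile())) {
            return new String(jf.getInputStream(jf.getJarEntry("a.txt")).readAllBytes());
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip()); // prints "patched"
    }
}
```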

@ldematte ldematte added >test Issues or PRs that are addressing/adding tests :Delivery/Build Build or test infrastructure :Core/Infra/Core Core issues without another label auto-backport Automatically create backport pull requests when merged v9.0.1 v9.1.0 labels Feb 7, 2025
@ldematte ldematte requested a review from breskeby February 7, 2025 15:16
@@ -41,18 +41,17 @@ public static void main(String[] args) throws Exception {
try (JarFile jarFile = new JarFile(new File(jarPath))) {
for (var patcher : patchers.entrySet()) {
JarEntry jarEntry = jarFile.getJarEntry(patcher.getKey());
if (jarEntry == null) {
throw new IllegalArgumentException("path [" + patcher.getKey() + "] not found in [" + jarPath + "]");
Contributor Author:

This had to go, as some classes are not present in both jars; it's unfortunate, but I think acceptable. If it's not, should we pass the classes to patch via the command line, so it is configurable in each Gradle JavaExec task?

Member:

We should pass the files to patch down; it should be part of the task's configuration. We can keep the logic for how to actually patch the classes here, but the list of classes to patch should be passed in.

However, why would the same class be present in two different jars? That's jarhell, which we should be rejecting already?

Contributor Author:

I should have said both projects; the hdfs-fixture project needs a subset of the patches, as it pulls in a different (smaller) set of dependencies.

Member:

I'm still missing something. Why do dependencies matter? We should only be patching a given class once.

Contributor Author:

The classes we need to patch are in the two projects' dependencies: for hadoop-client-api, they are in both hadoop-common and hadoop-auth; hdfs-fixture, however, depends only on hadoop-common (via hadoop-minicluster), so trying to patch classes in hadoop-auth results in an error (the class is not there).
But as you said, this can be solved by passing the patcher a list of the class names.
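One way to make the patcher parametric, as discussed — a hypothetical CLI shape (names invented, not the actual code): the first argument is the jar, the remaining arguments are the entries to patch, so each Gradle JavaExec task passes only the subset its project actually pulls in:

```java
import java.util.Arrays;
import java.util.List;

public class PatcherCli {

    // Everything after the jar path is treated as a class entry to patch.
    static List<String> entriesToPatch(String[] args) {
        return Arrays.asList(args).subList(1, args.length);
    }

    public static void main(String[] args) {
        String jarPath = args[0];
        System.out.println("patching " + jarPath + ": " + entriesToPatch(args));
    }
}
```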

@rjernst (Member) left a comment:

I'll let @mark-vieira comment on the Gradle side, but I suspect we may want this to be a Gradle plugin so we don't, e.g., have to duplicate the patchClasses task and setup.

@ldematte (Contributor Author) commented Feb 7, 2025

"I'll let @mark-vieira comment on the gradle, but I suspect we may want this to be a gradle plugin so we don't eg have to duplicate the patchClasses task and setup."

That would be ideal I think. Maybe we can work on this together, so I learn a bit about gradle plugins!

@breskeby (Contributor) commented Feb 11, 2025

@ldematte This is more or less the perfect use case for Gradle Artifact Transforms. The goal of artifact transforms is to tweak third-party dependencies before using them in your build.

The benefits in general are IMO:

  • the transformation is done transparently to the usage, as part of dependency resolution
  • we have built-in caching support
  • less clutter and custom build logic in our Gradle files
  • we can probably broaden its usage and reusability in the future as we 'patch' other dependencies as well (e.g. jwt)

I pushed a draft of how to port this PR to use artifact transforms to https://github.com/breskeby/elasticsearch/tree/fix-hdfs-tests-java24-artifact-transforms

Happy to talk about this synchronously and help you bring it over the line. Parts of what lives in the build script in my example can likely be moved to a plugin.
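For readers unfamiliar with the mechanism: an artifact transform is a class registered against the Gradle API that Gradle invokes during dependency resolution, with the result cached. A minimal sketch of what such a transform can look like (class name and details are assumptions, not the actual code from the linked branch; it needs the Gradle API on the classpath):

```java
import org.gradle.api.artifacts.transform.InputArtifact;
import org.gradle.api.artifacts.transform.TransformAction;
import org.gradle.api.artifacts.transform.TransformOutputs;
import org.gradle.api.artifacts.transform.TransformParameters;
import org.gradle.api.file.FileSystemLocation;
import org.gradle.api.provider.Provider;
import java.io.File;

// Hypothetical transform: Gradle calls transform() once per input artifact
// and substitutes the output file for the original jar in the resolution result.
public abstract class PatchJarTransform implements TransformAction<TransformParameters.None> {

    @InputArtifact
    public abstract Provider<FileSystemLocation> getInputArtifact();

    @Override
    public void transform(TransformOutputs outputs) {
        File input = getInputArtifact().get().getAsFile();
        // A "-patched" suffix keeps the Elastic-modified artifact recognizable.
        File output = outputs.file(input.getName().replace(".jar", "-patched.jar"));
        // ... open the input as a JarFile, rewrite the configured class
        // entries (e.g. with ASM), and write the result to output ...
    }
}
```

The transform is then registered on the relevant configuration in the build script, keyed on an attribute, so resolution picks up the patched variant transparently.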

@breskeby (Contributor) left a comment:

I strongly prefer using Gradle artifact transforms for this kind of dependency patching. Details at #122044 (comment)

@ldematte (Contributor Author):

Thanks Rene, I'll update it as we discussed and make it parametric, so we can bring back the check that the expected classes are all patched.

@rjernst (Member) commented Feb 11, 2025

@breskeby What does the resulting artifact naming look like? One nice thing about the way we currently do this kind of patching is that we can make clear it's Elastic-modified (not that we change the name in this case currently, but in other cases, e.g. with log4j, we have tweaked the name).

@breskeby (Contributor):

We can choose the naming in our implementation. In the example branch I just added a "-patched" suffix to the input jar name, so "hadoop-common-2.8.7.jar" becomes "hadoop-common-2.8.7-patched.jar".

@elasticsearchmachine (Collaborator):

Pinging @elastic/es-delivery (Team:Delivery)

@ldematte (Contributor Author):

I moved the patcher implementation to a transformer as @breskeby suggested, and added the ability to specify (via a parameter) which set of classes to patch for which jar/artifact.

@ldematte ldematte requested a review from breskeby February 19, 2025 10:26
@breskeby (Contributor) left a comment:

This is great, and getting rid of the hadoop-client-api subproject is a very nice side effect. I added a few nitpicks which would be great to address before merging.

Otherwise LGTM.

// Add the entry to the new JAR file
jos.putNextEntry(new JarEntry(entryName));

Function<ClassWriter, ClassVisitor> classPatcher = jarPatchers.get(entryName);
Contributor:

I wonder if we should add a check here to verify all expected patches are applied, and fail early if not, to avoid undetected changes sneaking into this again, e.g. by updating Hadoop versions.
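A minimal sketch of such a check (names invented, not the PR's actual code): after walking the jar, any configured patch that never matched an entry likely means a Hadoop upgrade moved or renamed the class, and the build should fail early:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class PatchCoverageCheck {

    // Compare the configured patch targets with the entries actually seen
    // while copying the jar; anything left over was never patched.
    static List<String> unapplied(Set<String> configured, Set<String> entriesPatched) {
        List<String> missing = new ArrayList<>(configured);
        missing.removeAll(entriesPatched);
        return missing;
    }

    public static void main(String[] args) {
        Set<String> configured = Set.of("org/apache/hadoop/A.class", "org/apache/hadoop/B.class");
        Set<String> patched = Set.of("org/apache/hadoop/A.class");
        List<String> missing = unapplied(configured, patched);
        if (!missing.isEmpty()) {
            System.out.println("expected patches not applied: " + missing);
        }
    }
}
```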

Contributor Author:

I tried to do that, and it's a great idea; however, there is one big blocker: the hadoop-common artifact has an additional classifier named "tests", which of course contains only tests but is brought in anyway.
I tried to exclude it, but apparently that's not possible in Gradle (there is limited support for classifiers).
The problem is that the transform is applied to both hadoop-common jars (regular and "tests"), and it will fail on the second, because none of the expected patches apply there.

Contributor Author:

TL;DR: if you have a good way to either exclude hadoop-common:tests or to recognize it and avoid trying to patch it, I'm all ears!

Contributor:

We could tweak the regex that matches the file name to be stricter — expected name plus a version regex, instead of a simple string.contains-based check — so it does not match -tests artifacts.

Contributor Author:

I thought about that; I discarded it because "2.8.5-tests" is a valid "version" (isn't it? like 1.0.0-beta).
I can allow only the numeric part; it's not super clean, but...
I would have preferred excluding the jar completely, since it contains only tests (there's no purpose in having it), but it seems this is very hard to do in Gradle.
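The negative-lookahead regex that ended up in the PR (see the snippet below) can be exercised in isolation; a small sketch, with class and method names invented for illustration:

```java
import java.util.regex.Pattern;

public class JarNameMatcher {

    // Negative lookahead: match the artifact name only when "tests" does not
    // appear after the "hadoop-common-" prefix, so the -tests classifier jar
    // is skipped while the regular jar is still patched.
    static final Pattern HADOOP_COMMON = Pattern.compile("hadoop-common-(?!.*tests)");

    static boolean shouldPatch(String jarFileName) {
        return HADOOP_COMMON.matcher(jarFileName).find();
    }

    public static void main(String[] args) {
        System.out.println(shouldPatch("hadoop-common-2.8.5.jar"));       // prints true
        System.out.println(shouldPatch("hadoop-common-2.8.5-tests.jar")); // prints false
    }
}
```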

Contributor Author:

Let me try the regex and see.

Contributor:

I had to deal with that test-classifier stuff with Hadoop before and suffered myself. Okay, let's just live with that for now; we can revisit later if we feel the need. I will use parts of this later on to fix other places in our codebase.

static final List<JarPatchers> allPatchers = List.of(
new JarPatchers(
"hadoop-common",
Pattern.compile("hadoop-common-(?!.*tests)"),
Contributor Author:

@breskeby I added regex matching to bring back the "have we applied all patches" check; let me know if you like it or not

Contributor:

lgtm

@ldematte ldematte enabled auto-merge (squash) February 19, 2025 16:17
@ldematte ldematte merged commit 340a2ce into elastic:main Feb 19, 2025
17 checks passed
@ldematte ldematte deleted the fix-hdfs-tests-java24 branch February 19, 2025 17:27
@elasticsearchmachine (Collaborator):

💔 Backport failed

Branch  Result
9.0     Commit could not be cherry-picked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 122044

ldematte added a commit that referenced this pull request Feb 20, 2025
ldematte added a commit to ldematte/elasticsearch that referenced this pull request Feb 20, 2025
elasticsearchmachine pushed a commit that referenced this pull request Feb 20, 2025
#122044 solved (temporarily) the issue with HDFS libraries not being compatible with JDK 24. This PR unmutes a test with the same issues.

Solves #122024
elasticsearchmachine pushed a commit that referenced this pull request Feb 20, 2025
* Fix hdfs-related IT tests for java24 (#122044)

* Have ASM recompute frames on patched classes

Labels

auto-backport Automatically create backport pull requests when merged :Core/Infra/Core Core issues without another label :Delivery/Build Build or test infrastructure Team:Core/Infra Meta label for core/infra team Team:Delivery Meta label for Delivery team >test Issues or PRs that are addressing/adding tests v9.0.1 v9.1.0


4 participants