Skip to content

indexing-service no longer depends on indexing-hadoop#18247

Merged
gianm merged 10 commits intoapache:masterfrom
clintropolis:indexing-hadoop-depends-on-indexing-service
Jul 15, 2025
Merged

indexing-service no longer depends on indexing-hadoop#18247
gianm merged 10 commits intoapache:masterfrom
clintropolis:indexing-hadoop-depends-on-indexing-service

Conversation

@clintropolis
Copy link
Member

Description

This PR makes it so that the indexing-hadoop module is no longer a dependency of indexing-service. Instead, indexing-hadoop depends on indexing-service, and the only references to indexing-hadoop is the services module.

To validate the changes I ran ITHadoopIndexTest, which .. failed because I think this thing hasn't been updated in some time (i tried but gave up), but all of the hadoop ingestion tasks it ran as part of the tests ran and succeeded, just the queries after failed. I saw the same failures without the changes in this PR too (with the dockerfile and json file fixes from this PR), so I think everything is still ok.

The next step to explore after this PR is moving the indexing-hadoop module into a contrib extension.

private final IndexSpec indexSpec = IndexSpec.DEFAULT;

@Rule
public ExpectedException thrown = ExpectedException.none();

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note test

Invoking
ExpectedException.none
should be avoided because it has been deprecated.
Comment on lines +65 to +74
DataSchema.builder()
.withDataSource("foo")
.withGranularity(
new UniformGranularitySpec(
Granularities.DAY,
null,
ImmutableList.of(Intervals.of("2010-01-01/P1D"))
)
)
.withObjectMapper(jsonMapper)

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note test

Invoking
Builder.withObjectMapper
should be avoided because it has been deprecated.
Comment on lines +112 to +120
DataSchema.builder()
.withDataSource("foo")
.withGranularity(
new UniformGranularitySpec(
Granularities.DAY,
null, ImmutableList.of(Intervals.of("2010-01-01/P1D"))
)
)
.withObjectMapper(jsonMapper)

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note test

Invoking
Builder.withObjectMapper
should be avoided because it has been deprecated.
rm -rf /var/cache/yum zulu-repo_${ZULU_REPO_VER}.noarch.rpm

ENV JAVA_HOME=/usr/lib/jvm/zulu17
ENV JAVA_HOME=/usr/lib/jvm/zulu11
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the downgrade?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the hadoop dockerfile, it doesn't support java 17

@@ -162,7 +157,6 @@ private TaskConfig(
String baseDir,
File baseTaskDir,
String hadoopWorkingPath,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should defaultHadoopCoordinates and hadoopWorkingPath be removed too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, yes

final SegmentMetadataPublisher maybeHandler;
if (config.isUpdaterJobSpecSet()) {
maybeHandler = new SegmentMetadataPublisher(INJECTOR.getInstance(IndexerMetadataStorageCoordinator.class));
maybeHandler = new SegmentMetadataPublisher(HadoopTask.INJECTOR.getInstance(IndexerMetadataStorageCoordinator.class));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just saw that HadoopTask has its own static INJECTOR and EXTENSION_LOADER. 😅
Didn't go through the whole thing but seems like a really hacky approach to load a bunch of classes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, this isn't new to this PR 😅. i suppose this line change isn't really needed and was just a side-effect of moving stuff around, but will leave it unless i need to push more commits to this branch

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of questions:

  • Does the injector and classpath dark magic still work properly with this change? I recall there is some gnarly stuff where we're inspecting the classloader, pulling the jars out of it, and using that to populate the Hadoop classpath. (If the Hadoop ITs pass, probably the answer is yes this still works)
  • With this change, if someone removes the druid-hadoop-indexing jar, does Druid still work properly? If so that is great.

@clintropolis
Copy link
Member Author

Does the injector and classpath dark magic still work properly with this change? I recall there is some gnarly stuff where we're inspecting the classloader, pulling the jars out of it, and using that to populate the Hadoop classpath. (If the Hadoop ITs pass, probably the answer is yes this still works)

Yes, as far as I can tell everything is still working correctly. The Hadoop ITs don't actually pass for me because the query expectations it is doing afterwards are failing, but the ingestion tasks succeed (and i see the same query expectation failures before this change too, so I think it ok).

With this change, if someone removes the druid-hadoop-indexing jar, does Druid still work properly? If so that is great.

I haven't actually tried this yet, but I think it should be possible after #18246, since the only remaining references are the HadoopIndexTaskModule references in CliOverlord, CliMiddleManager, CliPeon, and CliIndexer it should be pretty easy to unhook and leave out of the build. I'm actually going to see if I can just go ahead and get it to work as an actual extension after this gets merged, so might be able to do even better so that it can be left out by default.

@gianm gianm merged commit 5b6a2a2 into apache:master Jul 15, 2025
141 of 143 checks passed
@clintropolis clintropolis deleted the indexing-hadoop-depends-on-indexing-service branch July 16, 2025 00:07
ashibhardwaj pushed a commit to ashibhardwaj/druid that referenced this pull request Jul 23, 2025
* indexing-service no longer depends on indexing-hadoop, instead indexing-hadoop depends on indexing-service

* fixes

* cleanup

* fix it

* fix declared dependencies

* suppress warning

* try again

* suppress check
@cecemei cecemei added this to the 35.0.0 milestone Oct 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants