Conversation

@kfaraz (Contributor) commented Jul 4, 2025

Summary

  • This patch migrates several indexing and compaction integration tests (several of which have been flaky recently) to the embedded-tests framework.
  • Overall, this migration will save about 1 hr 30 mins of GitHub runner time.
  • The MiddleManager variants of these integration tests have not been ported yet, since they launch child processes and are therefore slower and less suitable for embedded tests.
  • MM-based embedded tests can easily be enabled at any point in the future.

Changes

  • The new tests use SQL queries to verify results instead of native queries, since SQL is much more concise and makes for more readable tests. Native queries can be added later, or in future tests migrated to this framework. (See the sketch after this list.)
  • Add a TaskBuilder utility to create Task objects with a fluent syntax.
  • Add JSON data resource files to embedded-tests.
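
For illustration, here is a rough sketch of what a migrated test body might look like, stitched together from the builder and helper calls quoted in the review threads below (TaskBuilder chaining, runTask, verifyScanResult). The class scaffolding, the JUnit 5 annotation, the static builder entry point, the dataSource field, and the expected values are assumptions, not the actual code in this PR.

// Illustrative sketch only: builder/helper calls are taken from snippets quoted in the
// review threads below; the surrounding scaffolding and data values are assumptions.
// (Imports for the PR's own classes are omitted.)
import org.junit.jupiter.api.Test;

public class IndexParallelTaskTestSketch extends CompactionTestBase
{
  @Test
  public void test_runIndexParallelTask_andVerifyWithSql()
  {
    // Build an 'index_parallel' task with the fluent builder instead of a JSON spec.
    final TaskBuilder<?, ?, ?> task = TaskBuilder
        .ofTypeIndexParallel()                                       // assumed static entry point
        .jsonInputFormat()
        .inlineInputSourceWithData(Resources.InlineData.JSON_2_ROWS)
        .isoTimestampColumn("timestamp");

    // Submit the task to the embedded cluster and wait for it to succeed.
    runTask(task, dataSource);

    // Verify results with a SQL scan rather than a native query.
    // Expected values are newline-separated; the numbers here are placeholders.
    verifyScanResult("added", "57\n143");
  }
}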

Test migrations

Old test                               New test
ITPerfectRollupParallelIndexTest       IndexParallelTaskTest
ITBestEffortRollupParallelIndexTest    IndexParallelTaskTest, added as a new test parameter which uses dynamic partitioning
ITAutoCompactionTest                   AutoCompactionTest
ITAutoCompactionUpgradeTest            AutoCompactionUpgradeTest
ITAutoCompactionLockContentionTest     KafkaClusterMetricsTest, method test_ingestClusterMetrics_compactionSkipsLockedIntervals()
ITCompactionTaskTest                   CompactionTaskTest
ITCompactionSparseColumnTest           CompactionSparseColumnTest
ITOverlordResourceTest                 Already verified in OverlordClientTest
ITOverlordResourceNotFoundTest         Already verified in OverlordClientTest

New nested tests

  • EmbeddedCentralizedSchemaPublishFailureTest for the group cds-task-schema-publish-disabled
  • EmbeddedCentralizedSchemaMetadataQueryDisabledTest for the group cds-coordinator-metadata-query-disabled

Test run times

Test                                                Actual test time    Total time including setup
IndexParallelTaskTest (Indexer)                     38 s                38 s
AutoCompactionTest (Indexer)                        1 min 20 s          1 min 20 s
CompactionSparseColumnTest (Indexer)                12 s                12 s
Standard ITPerfectRollupParallelIndexTest           10 min              15 min
Standard ITPerfectRollupParallelIndexTest           10 min              15 min
  (Indexer, shuffle deep store test, only 1 config changed)
Standard ITPerfectRollupParallelIndexTest           10 min              15 min
  (MM, shuffle deep store test, only 1 config changed)
Standard ITBestEffortRollupParallelIndexTest        2 min               NA (setup includes other tests too)
Revised ITBestEffortRollupParallelIndexTest         4 min 10 s          NA (setup includes other tests too)
Standard ITAutoCompactionTest (MiddleManager)       25 min              NA (setup includes other tests too)
Standard ITAutoCompactionTest (Indexer)             15 min              NA (setup includes other tests too)
Standard ITCompactionSparseColumnTest (Indexer)     2 min 10 s          NA (setup includes other tests too)

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@kfaraz changed the title from "Add EmbeddedIndexParallelTaskTest to migrate ITPerfectRollupParallelI…" to "Migrate ITPerfectRollupParallelIndexTest and ITBestEffortRollupParallelIndexTest to embedded-tests" on Jul 4, 2025
@github-actions bot added the GHA label on Jul 4, 2025
@kfaraz requested a review from gianm on Jul 4, 2025 15:53
@kfaraz changed the title from "Migrate ITPerfectRollupParallelIndexTest and ITBestEffortRollupParallelIndexTest to embedded-tests" to "Migrate several indexing and compaction integration tests to embedded-tests" on Jul 8, 2025
kfaraz added a commit that referenced this pull request Jul 8, 2025
Bug:
Concurrent append uses a lock of type APPEND, which always uses a lock version equal to the epoch (1970-01-01).

This can cause data loss in the following flow:
- Ingest data into an empty interval using an APPEND task
- Mark all the segments as unused
- Re-run the APPEND task
- The data is not visible, since the old segment IDs (now unused) are allocated again

Fix:
In segment allocation, do not reuse an old segment ID, whether used or unused.
This fix was already done for some cases back in #16380.
An embedded test for this has been included in #18207.
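
For context, a rough sketch of the shape such an embedded regression test could take. The helper methods below are hypothetical placeholders, and the datasource, interval, and row counts are made up; this is not the actual test added in #18207.

// Illustrative sketch only: helper methods are hypothetical placeholders, not real APIs.
public class ConcurrentAppendSegmentReuseSketch
{
  public void test_reingestAfterMarkingUnused_doesNotReuseSegmentIds()
  {
    // 1. Ingest data into an empty interval using an APPEND task.
    submitAppendTask("wiki", "2013-08-31/2013-09-01");
    verifyRowCount("wiki", 3);

    // 2. Mark all segments of the datasource as unused; the data is no longer queryable.
    markAllSegmentsUnused("wiki");
    verifyRowCount("wiki", 0);

    // 3. Re-run the same APPEND task. Before the fix, segment allocation could hand back
    //    the old (now unused) segment IDs, leaving the newly ingested data invisible.
    submitAppendTask("wiki", "2013-08-31/2013-09-01");
    verifyRowCount("wiki", 3);
  }

  // Hypothetical stand-ins for the embedded cluster APIs.
  private void submitAppendTask(String dataSource, String interval) { /* submit task and await success */ }
  private void markAllSegmentsUnused(String dataSource) { /* call the segment-management API */ }
  private void verifyRowCount(String dataSource, int expectedRows) { /* run a SQL count query */ }
}
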
@@ -0,0 +1,3 @@
{"timestamp": "2013-08-31T01:02:33Z", "page": "Gypsy Danger", "language" : "en", "tags": ["t1", "t2"], "user" : "nuclear", "unpatrolled" : "true", "newPage" : "true", "robot": "false", "anonymous": "false", "namespace":"article", "continent":"North America", "country":"United States", "region":"Bay Area", "city":"San Francisco", "added": 57, "deleted": 200, "delta": -143}
Member:

Drive-by comment/nit: I know this isn't new, but IMO we should fix the problem of referring to this dataset as 'wikipedia', because it is confusing with the quickstart wikipedia data, which is also going to be used in some tests, and this data only has a vaguely similar schema. Maybe tiny-wikipedia or something, to indicate that it's a very small dataset, would help clear things up?

Contributor Author:

Thanks for the suggestion! I will rename these datasets accordingly.

@Akshat-Jain reopened this on Jul 23, 2025
@Akshat-Jain reopened this on Jul 23, 2025
@Akshat-Jain reopened this on Jul 25, 2025
@clintropolis (Member) left a comment:

This LGTM. Thanks for the builder stuff; it feels to me like this should be a lot more ergonomic and less error-prone than either the old IT templates or trying to hand-craft JSON specs.

Seems fine to merge after fixing up the TODO comment one way or another.

One thing I was thinking about while reviewing: I believe we are perhaps losing some minor coverage of auth stuff during this transition, since I think the base ITs had basic-auth setup, though AFAICT not much in the way of roles and stuff in most tests (so I think only authentication would have been tested, except for those that extend AbstractAuthConfigurationTest). That is probably fine as long as we migrate the auth tests over to run on this framework, though they probably don't cast quite as wide a net in terms of APIs being called, since those tests are more focused on authorization. I am not suggesting we just bake basic auth into random tests or anything, and really it's a bit of a negative of the old frameworks that you have to hunt across several files to determine what configuration is actually active for a given test, but it is maybe something we should watch out for as we move tests over.

import java.util.TreeSet;
import java.util.stream.Collectors;

public abstract class CompactionTestBase extends EmbeddedClusterTestBase
Member:

Other than maybe the cluster config, most of the utility methods on this class don't seem too related to compaction; should they live somewhere more common? It's fine to change this later, I just want to avoid copy-pasting these methods down the line, or awkward extension of the compaction test base for things that aren't doing any compaction.

Contributor Author:

Fair enough, I will move the common utility methods to a utility class similar to EmbeddedMsqApis, as suggested by @gianm here.

);

// Wait for scheduler to pick up the compaction job
// TODO: make this latch-based
Member:

We should either do this or turn the TODO into a comment about how someone could improve this in the future.

Contributor:

Yes, please don't commit todo comments.

Contributor Author:

Sorry, must have missed this in the cleanup.

.ofTypeIndexParallel()
.jsonInputFormat()
.inlineInputSourceWithData(Resources.InlineData.JSON_2_ROWS)
.isoTimestampColumn("timestamp")
Contributor:

I wonder if it would be nicer to have .timestampColumn("timestamp", "iso"). Makes it easier to sub in "auto" or whatever other format.

Contributor Author:

I did have that initially, but realized that most tests just use an ISO timestamp, so I used this syntactic sugar instead.

For a fully custom timestamp, there is also the option to use:

.dataSchema(d -> d.withTimestamp(new TimestampSpec(...)))

Please let me know which one you prefer.

Contributor:

I'm ok with either one.

* Verifies the result of a SELECT query
*
* @param field Field to select
* @param result CSV result with special strings {@code ||} to represent
Contributor:

why do this rather than accept the CSV as-is?

@kfaraz (Contributor Author) commented Jul 29, 2025:

Yeah, I didn't want to do this initially, but it made for more readable tests as it avoided all the escaping of empty strings and newlines.

verifyScanResult("added", "...||31||...||62");

vs

verifyScanResult("added", "\"\"\n31\"\"\n62");

Please let me know if this seems hacky and if you feel that it is cleaner to just use the original.

Contributor:

I see. To me the first form is hard to get used to. I keep wanting to read the || as field separators rather than newlines. The second is also weird looking though. I can't think of a good solution immediately. I'm ok with what you think is best.

Contributor Author:

I tried a bunch of different symbols, but nothing really conveyed a "newline" well enough.
The empty strings are probably the worse of the two, though.
Using an ellipsis for now, but keeping the newline \n as-is.

So, we would have something like this:

verifyScanResult("added", "...\n31\n...\n62");

*
* @return ID of the task.
*/
protected String runTask(TaskBuilder<?, ?, ?> taskBuilder, String dataSource)
Contributor:

IMO, protected methods on a base class aren't the best way to provide utility APIs. Other test cases might want some of these utility methods without having to be "compaction tests". Also, some tests might want to extend multiple base classes, and this approach makes that impossible.

With the MSQ tests we have EmbeddedMSQApis, which collects utility APIs together without using a base-class approach. Something similar might work here?

Contributor Author:

Makes sense, thanks for the suggestion!

Contributor Author:

Moved most methods to EmbeddedClusterApis itself so that all tests can benefit from them.
I have still kept some of the protected methods in CompactionTestBase, but these mostly just act as syntactic sugar over the methods in EmbeddedClusterApis, to keep the diff in the compaction test classes small.
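
As a rough illustration of the resulting shape (only the class names EmbeddedClusterApis and CompactionTestBase come from the discussion; the method names and bodies below are hypothetical): shared helpers live on a utility object any test can use, while the test base keeps thin delegating wrappers.

// Illustrative sketch of the pattern discussed above; names and bodies are hypothetical.
class EmbeddedClusterApisSketch
{
  /** Shared helper usable by any embedded test, not just compaction tests. */
  String runTask(Object taskPayload, String dataSource)
  {
    // submit the task, wait for it to finish, return the task ID
    return "task-id";
  }
}

abstract class CompactionTestBaseSketch
{
  protected final EmbeddedClusterApisSketch cluster = new EmbeddedClusterApisSketch();
  protected String dataSource = "wiki";

  /** Thin syntactic-sugar wrapper so existing compaction tests need minimal diff. */
  protected String runTask(Object taskPayload)
  {
    return cluster.runTask(taskPayload, dataSource);
  }
}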

/**
* Constants and utility methods used in embedded cluster tests.
*/
public class Resources
Contributor:

Moving this to embedded-tests means that it's now tougher for extensions to write their own tests that pull in Resources. Can you split this up? Such as having some Resources in services, and MoreResources in embedded-tests.

Contributor Author:

Updated.

@kfaraz (Contributor Author) commented Jul 29, 2025:

One thing I was thinking about while reviewing: I believe we are perhaps losing some minor coverage of auth stuff during this transition, since I think the base ITs had basic-auth setup, though AFAICT not much in the way of roles and stuff in most tests (so I think only authentication would have been tested, except for those that extend AbstractAuthConfigurationTest). That is probably fine as long as we migrate the auth tests over to run on this framework, though they probably don't cast quite as wide a net in terms of APIs being called, since those tests are more focused on authorization. I am not suggesting we just bake basic auth into random tests or anything, and really it's a bit of a negative of the old frameworks that you have to hunt across several files to determine what configuration is actually active for a given test, but it is maybe something we should watch out for as we move tests over.

Thanks for calling this out, @clintropolis!

I do plan to migrate the auth tests as well to the embedded framework.
For now, we will migrate the non-auth tests without basic auth enabled.
Once all tests are migrated, we can add test variants which have auth enabled to increase API coverage.

@kfaraz (Contributor Author) commented Jul 29, 2025:

@gianm , @clintropolis , thanks for the feedback!
I have updated the PR based on your suggestions.
Please let me know if further changes are required.

@gianm (Contributor) commented Jul 29, 2025:

@gianm , @clintropolis , thanks for the feedback! I have updated the PR based on your suggestions. Please let me know if further changes are required.

Approved, since the only remaining comment (about verifyScanResult) is nonblocking.

@Akshat-Jain reopened this on Jul 29, 2025
@kfaraz merged commit 3b7dd53 into apache:master on Jul 29, 2025
72 checks passed
@kfaraz deleted the add_embedded_perfect_rollup_test branch on July 29, 2025 10:52
@cecemei added this to the 35.0.0 milestone on Oct 21, 2025