A random-random test for time-series data #132556

pabloem · 2025-08-07T23:58:36Z

Follow-up items after this PR:

- Test the rate function and counters in general
- Randomize window size
- Increase test size
- Hit some more corner cases (e.g. zero out some parameters)
- Add a 'fixed schema' test case (as opposed to the current dynamic one)

elasticsearchmachine · 2025-08-11T18:08:31Z

Pinging @elastic/es-storage-engine (Team:StorageEngine)

not-napoleon

I'd like to get some confirmation from @kkrik-es that this is doing what he wants, but I think it's pretty good. I left some feedback, none of which is critical but I'd like to get it addressed.

test/framework/src/main/java/org/elasticsearch/datageneration/FieldType.java

test/framework/src/main/java/org/elasticsearch/datageneration/MappingGenerator.java

...c/main/java/org/elasticsearch/datageneration/datasource/DefaultMappingParametersHandler.java

not-napoleon · 2025-08-11T18:43:28Z

...in/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/GenerativeTSIT.java

+    private List<XContentBuilder> documents = null;
+    private DataGenerationHelper dataGenerationHelper;
+
+    private static final class DataGenerationHelper {


I wonder if this should be a top-level class. Seems like we'll want to build multiple test classes using this framework.

I've moved this class to its own file! TY.

not-napoleon · 2025-08-11T18:50:17Z

...in/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/GenerativeTSIT.java

+        private static Object randomDimensionValue(String dimensionName) {
+            // We use dimensionName to determine the type of the value.
+            var isNumeric = dimensionName.hashCode() % 5 == 0;
+            if (isNumeric) {


What about IP dimensions?

added 20% of dimensions as IP-like.

As a follow-up, I'll add dynamic mapping to parse them as IP. Thoughts?

Why not rely on the existing one:
https://github.com/elastic/elasticsearch/blob/main/test/framework/src/main/java/org/elasticsearch/datageneration/fields/leaf/IpFieldDataGenerator.java
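
For readers following the thread above: the quoted hunk picks a dimension's value type from the hash of its name, so a given dimension always gets the same type. Below is a minimal sketch of that idea with an IP-like bucket added. The class name, the bucket layout, and the value ranges are assumptions for illustration, not the PR's actual code. It uses `Math.floorMod` so that negative hash codes still land in buckets 0 to 4.

```java
import java.util.Random;

public class DimensionValueSketch {
    // Hypothetical value kinds; the real test may distinguish more types.
    enum Kind { NUMERIC, IP, KEYWORD }

    // Assumed bucket layout: 1 of 5 buckets numeric, 1 of 5 IP-like, the rest keyword.
    static Kind kindFor(String dimensionName) {
        // floorMod keeps the bucket in [0, 4] even when hashCode() is negative.
        int bucket = Math.floorMod(dimensionName.hashCode(), 5);
        if (bucket == 0) {
            return Kind.NUMERIC;
        }
        if (bucket == 1) {
            return Kind.IP;
        }
        return Kind.KEYWORD;
    }

    // The type depends only on the name; the value itself is random.
    static Object randomDimensionValue(String dimensionName, Random random) {
        switch (kindFor(dimensionName)) {
            case NUMERIC:
                return random.nextInt(1000);
            case IP:
                return "10." + random.nextInt(256) + "." + random.nextInt(256) + "." + random.nextInt(256);
            default:
                return "value-" + random.nextInt(1000);
        }
    }
}
```

Because the kind is a pure function of the name, every document that carries the same dimension produces values of one consistent type, which matters if the mapping for that dimension is fixed after the first document.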

...in/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/GenerativeTSIT.java

pabloem

TY @not-napoleon - ptal!

test/framework/src/main/java/org/elasticsearch/datageneration/MappingGenerator.java

...c/main/java/org/elasticsearch/datageneration/datasource/DefaultMappingParametersHandler.java

...in/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/GenerativeTSIT.java

test/framework/src/main/java/org/elasticsearch/datageneration/FieldType.java

pabloem · 2025-08-12T00:04:23Z

...in/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/GenerativeTSIT.java

+    private List<XContentBuilder> documents = null;
+    private DataGenerationHelper dataGenerationHelper;
+
+    private static final class DataGenerationHelper {


I've moved this class to its own file! TY.

pabloem · 2025-08-12T00:06:49Z

...in/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/GenerativeTSIT.java

+        private static Object randomDimensionValue(String dimensionName) {
+            // We use dimensionName to determine the type of the value.
+            var isNumeric = dimensionName.hashCode() % 5 == 0;
+            if (isNumeric) {


added 20% of dimensions as IP-like.

pabloem · 2025-08-12T00:17:43Z

...in/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/GenerativeTSIT.java

+        private static Object randomDimensionValue(String dimensionName) {
+            // We use dimensionName to determine the type of the value.
+            var isNumeric = dimensionName.hashCode() % 5 == 0;
+            if (isNumeric) {


As a follow-up, I'll add dynamic mapping to parse them as IP. Thoughts?

...rc/main/java/org/elasticsearch/datageneration/datasource/DefaultObjectGenerationHandler.java

...in/esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/GenerativeTSIT.java

...src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/TSDataGenerationHelper.java

kkrik-es

Thanks Pablo, this is a good step. It's nice that you tried to include the pass-through field on the first take, though that complicates things somewhat. I'd start with statically defined dimension and metric fields to get the validation logic in place first, then introduce dynamic fields on top of that.

Let's try to refactor the logic slightly so that it can be further extended in follow-up PRs.

...src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/RandomizedTimeSeriesIT.java

kkrik-es

Looks good, thanks for addressing the comments. Let's keep iterating.

romseygeek · 2025-08-21T08:16:52Z

The changes here to FieldType#tryParse() are causing test failures:

./gradlew ":x-pack:plugin:logsdb:javaRestTest" --tests "org.elasticsearch.xpack.logsdb.qa.LogsDbVersusReindexedLogsDbChallengeRestIT.testRandomQueries" -Dtests.seed=F4B510DA34A404A2 -Dtests.locale=ko-KP -Dtests.timezone=EST5EDT -Druntime.java=24

LogsDbVersusReindexedLogsDbChallengeRestIT > testRandomQueries FAILED
    java.lang.IllegalArgumentException: Unknown field type: geo_shape
        at __randomizedtesting.SeedInfo.seed([F4B510DA34A404A2:AA9EA0360C7E293C]:0)
        at org.elasticsearch.datageneration.FieldType.tryParse(FieldType.java:111)
        at org.elasticsearch.datageneration.queries.LeafQueryGenerator.buildForType(LeafQueryGenerator.java:31)

Specifically, the LeafQueryGenerator expects to get null back from unrecognized field types, instead of an Exception.
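
To illustrate the contract described above, here is a minimal sketch of a lenient lookup that returns null for unrecognized type names, next to a strict variant that throws. `SketchFieldType`, its three constants, and `parseOrThrow` are hypothetical stand-ins, not the real `FieldType` enum or its API; only the "return null instead of throwing" behaviour comes from the report.

```java
import java.util.HashMap;
import java.util.Map;

public class FieldTypeSketch {
    // Hypothetical, trimmed-down stand-in for a field-type enum.
    enum SketchFieldType {
        KEYWORD("keyword"),
        LONG("long"),
        IP("ip");

        private static final Map<String, SketchFieldType> BY_NAME = new HashMap<>();

        static {
            for (SketchFieldType type : values()) {
                BY_NAME.put(type.typeName, type);
            }
        }

        private final String typeName;

        SketchFieldType(String typeName) {
            this.typeName = typeName;
        }

        // Lenient lookup: an unknown name yields null, so callers can skip the field.
        static SketchFieldType tryParse(String name) {
            return BY_NAME.get(name);
        }

        // Strict lookup for callers that treat an unknown name as a bug.
        static SketchFieldType parseOrThrow(String name) {
            SketchFieldType type = tryParse(name);
            if (type == null) {
                throw new IllegalArgumentException("Unknown field type: " + name);
            }
            return type;
        }
    }
}
```

Keeping the two entry points separate lets a caller such as a query generator ignore types it does not know, while code that builds mappings can still fail fast.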

First test case in prototype messy test file

613bda8

elasticsearchmachine added the v9.2.0 label Aug 7, 2025

elasticsearchmachine and others added 6 commits August 8, 2025 00:04

[CI] Auto commit changes from spotless

2d70cef

First two randomized test cases

19e2f8a

smol cleanup

14b4c17

Merge branch 'main' into pem-randomrandom-testing

836d49a

[CI] Auto commit changes from spotless

64624be

cleanup and ready for first check

77a368b

pabloem marked this pull request as ready for review August 11, 2025 18:06

Merge branch 'main' into pem-randomrandom-testing

f53f904

pabloem changed the title ~~[wip][do not review] A random-random test for time-series data~~ A random-random test for time-series data Aug 11, 2025

elasticsearchmachine added the needs:triage label Aug 11, 2025

pabloem added the >test, :StorageEngine/TSDB, and :StorageEngine/ES|QL labels and removed the needs:triage label Aug 11, 2025

elasticsearchmachine added the Team:StorageEngine label Aug 11, 2025

not-napoleon reviewed Aug 11, 2025

Address comments

afdac85

pabloem commented Aug 12, 2025