Redesigning randomized tests for TS functions #129971

pabloem · 2025-06-24T23:57:42Z

Fixes #129567

This is a redesign of a couple of test cases in TimeSeriesIT.

Originally, the test did the following:

1. Generate fully random data.
2. Make Elasticsearch calculate rate aggregate statistics on the input data.
3. Calculate rate aggregate statistics on this random data in the test (i.e. replicate the ES algorithm).
4. Check that the results from 2 and 3 match.

Replicating the same algo in test and db seemed off, so I re-designed the test to do the following:

1. Generate random rates.
2. Generate data based on these random rates.
3. Make Elasticsearch calculate rate aggregate statistics from the input data.
4. Make sure the results from 3 match the parameters from 1.
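
The redesigned flow can be sketched roughly as follows. This is an illustration, not the code in this PR: the names `KnownRateData`, `generate` and `observedRate` are hypothetical, and the naive rate used here (counter increase over elapsed time, with no resets and no extrapolation) only stands in for the Elasticsearch calculation.

```java
import java.util.Random;

class KnownRateData {
    /**
     * Steps 1 and 2: given a chosen rate, emit counter samples at random intervals.
     * Each row is {timestampMillis, counterValue}. Hypothetical helper, not from the PR.
     */
    static long[][] generate(double ratePerSecond, int numSamples, long seed) {
        Random random = new Random(seed);
        long[][] samples = new long[numSamples][];
        long timestampMillis = 0;
        for (int i = 0; i < numSamples; i++) {
            // Advance time by 1000 to 4999 ms between samples.
            timestampMillis += 1_000 + random.nextInt(4_000);
            // The counter grows linearly with time at the chosen rate.
            long counter = Math.round(ratePerSecond * timestampMillis / 1000.0);
            samples[i] = new long[] { timestampMillis, counter };
        }
        return samples;
    }

    /** Stand-in for step 3: counter increase divided by elapsed seconds. */
    static double observedRate(long[][] samples) {
        long[] first = samples[0];
        long[] last = samples[samples.length - 1];
        double elapsedSeconds = (last[0] - first[0]) / 1000.0;
        return (last[1] - first[1]) / elapsedSeconds;
    }

    public static void main(String[] args) {
        double generatedRate = 20.0;
        long[][] samples = generate(generatedRate, 100, 42L);
        // Step 4: compare the observed rate with the rate used to generate the data.
        System.out.println("generated=" + generatedRate + " observed=" + observedRate(samples));
    }
}
```

Because the expected value is the input parameter itself, the test no longer needs its own copy of the rate algorithm.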

Some assumptions from the test:

In testing, the actual result was about 10% lower than the original generated rate. The assumption is that this comes from the use or disuse of extrapolation algorithms in the ES rate calculation. The test accounts for this by reducing the expected result by 10%.
In testing, the actual results varied by up to 15% from the 10% lower estimate. The test also allows that difference.
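
As a worked example of the two margins, here is a small sketch. The factors 0.90 and 0.15 are assumptions read off the description above, and the class `RateTolerance` is hypothetical. The first failure report further down this thread is consistent with this reading: the prod hosts have rates 19, 41 and 36 (sum 96), and the assertion expects roughly 86.4 within 14.4, which matches 0.9 × 96 and 0.15 × 96.

```java
class RateTolerance {
    // Assumed from the PR description, not copied from the test source.
    static final double EXPECTED_FACTOR = 0.90; // expected result is 10% below the generated rate
    static final double DELTA_FACTOR = 0.15;    // allowed deviation, as a fraction of the generated rate

    static double expected(double generatedRate) {
        return generatedRate * EXPECTED_FACTOR;
    }

    static double delta(double generatedRate) {
        return generatedRate * DELTA_FACTOR;
    }

    /** True when the actual value lies within delta of the reduced expectation. */
    static boolean withinTolerance(double generatedRate, double actual) {
        return Math.abs(actual - expected(generatedRate)) <= delta(generatedRate);
    }

    public static void main(String[] args) {
        double generatedRate = 96.0; // 19 + 41 + 36, the prod hosts in the first failure report
        System.out.println(withinTolerance(generatedRate, 83.52570348596274)); // a passing bucket
        System.out.println(withinTolerance(generatedRate, 68.79083801637314)); // the failing bucket
    }
}
```

Under these assumptions the accepted range runs from 75% to 105% of the generated rate, which is the width under discussion.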

Happy to discuss if the margin of error is just too wide.

PTAL @dnhatn

elasticsearchmachine · 2025-07-02T04:30:23Z

Pinging @elastic/es-storage-engine (Team:StorageEngine)

dnhatn · 2025-07-02T04:34:18Z

@pabloem The approach looks good to me. I ran a hundred iterations and got some failures. Could you take a look? Thank you!

REPRODUCE WITH: ./gradlew ":x-pack:plugin:esql:internalClusterTest" --tests "org.elasticsearch.xpack.esql.action.TimeSeriesRateIT.testRateWithTimeBucketAndClusterMultipleMetricsByMin {seed=[271EF4A9835B6701:65D1E5F9A1347FC9]}" -Dtests.seed=271EF4A9835B6701

Values:
[80.17859426288307, 56.0, 2024-04-15T00:00:00.000Z, prod]
[20.49118102778968, 92.0, 2024-04-15T00:00:00.000Z, qa]
[83.52570348596274, 56.0, 2024-04-15T00:01:00.000Z, prod]
[19.896018843547324, 92.0, 2024-04-15T00:01:00.000Z, qa]
[68.79083801637314, 56.0, 2024-04-15T00:02:00.000Z, prod]

 Hosts:
p0 -> qa, rate=16, cpu=36, numDocs=169
p1 -> prod, rate=19, cpu=19, numDocs=164
p2 -> prod, rate=41, cpu=0, numDocs=162
p3 -> qa, rate=7, cpu=92, numDocs=168
p4 -> prod, rate=36, cpu=56, numDocs=167
Total rate: 119
Average rate: 23.8
Total CPU: 203
Average CPU: 40.6
Count of docs: 830
Docs in each minute:
Minute 320: 150 docs
Minute 321: 140 docs
Minute 322: 140 docs
Minute 323: 135 docs
Minute 324: 121 docs
Minute 325: 144 docs

java.lang.AssertionError: Values:
[80.17859426288307, 56.0, 2024-04-15T00:00:00.000Z, prod]
[20.49118102778968, 92.0, 2024-04-15T00:00:00.000Z, qa]
[83.52570348596274, 56.0, 2024-04-15T00:01:00.000Z, prod]
[19.896018843547324, 92.0, 2024-04-15T00:01:00.000Z, qa]
[68.79083801637314, 56.0, 2024-04-15T00:02:00.000Z, prod]

 Hosts:
p0 -> qa, rate=16, cpu=36, numDocs=169
p1 -> prod, rate=19, cpu=19, numDocs=164
p2 -> prod, rate=41, cpu=0, numDocs=162
p3 -> qa, rate=7, cpu=92, numDocs=168
p4 -> prod, rate=36, cpu=56, numDocs=167
Total rate: 119
Average rate: 23.8
Total CPU: 203
Average CPU: 40.6
Count of docs: 830
Docs in each minute:
Minute 320: 150 docs
Minute 321: 140 docs
Minute 322: 140 docs
Minute 323: 135 docs
Minute 324: 121 docs
Minute 325: 144 docs

Expected: a numeric value within <14.40000057220459> of <86.39999771118164>
     but: <68.79083801637314> differed by <3.209159122603907> more than delta <14.40000057220459>

kkrik-es · 2025-07-03T12:17:10Z

.../esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/TimeSeriesRateIT.java

+                var requestCount = requestCounts.compute(host, (k, curr) -> {
+                    // 15% chance of reset
+                    if (randomInt(100) <= 15) {
+                        return Math.toIntExact(Math.round(hostToRate.get(k) * tsChange));


Don't we want to add some randomization here too? Otherwise, it's just linear?

The test keeps a randomly generated, constant linear rate per host, with random resets. This is how we avoid re-calculating rates.

Sounds good, let's document this. I think we can revisit this separately, e.g. store samples in an array per time-series and get the expected rate per interval.
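
One possible reading of the follow-up idea above, as a sketch only: keep the generated samples per time series and derive the expected rate for an interval from them. The class `ExpectedRateFromSamples` and its methods are hypothetical. Treating a drop in the counter as a reset to zero is an assumption and is not taken from the Elasticsearch implementation.

```java
import java.util.List;

class ExpectedRateFromSamples {
    /** One counter reading: seconds since the start of the series and the counter value. */
    static final class Sample {
        final double seconds;
        final long counter;

        Sample(double seconds, long counter) {
            this.seconds = seconds;
            this.counter = counter;
        }
    }

    /**
     * Expected rate over the samples with start <= seconds < end.
     * A lower value than the previous one is treated as a reset to zero,
     * so the new value itself counts as the increase (assumption).
     * A reset that lands on a value not lower than the previous one cannot be detected here.
     */
    static double expectedRate(List<Sample> samples, double start, double end) {
        double increase = 0;
        Sample first = null;
        Sample prev = null;
        for (Sample s : samples) {
            if (s.seconds < start || s.seconds >= end) {
                continue;
            }
            if (first == null) {
                first = s;
            } else {
                increase += s.counter >= prev.counter ? s.counter - prev.counter : s.counter;
            }
            prev = s;
        }
        if (first == null || prev == first) {
            return Double.NaN; // fewer than two samples in the interval
        }
        return increase / (prev.seconds - first.seconds);
    }
}
```

With the generator in this diff, a reset returns `hostToRate.get(k) * tsChange`, so a detected reset contributes the same increase as a normal step and the rate derived this way stays constant.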

kkrik-es · 2025-07-03T12:20:23Z

.../esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/TimeSeriesRateIT.java

+                        return Math.toIntExact(Math.round((curr == null ? 0 : curr) + hostToRate.get(k) * tsChange));
+                    }
+                });
+                if (hosts.contains(host)) {


Nit: should we skip first, before initializing requestCount?

No, because requestCount follows a linear rate: we always need to calculate it for every point in time to keep the same rate. LMK if that makes sense. I can try to rephrase.

dnhatn

I like the idea. LGTM, thanks @pabloem

dnhatn · 2025-07-07T21:03:55Z

REPRODUCE WITH: ./gradlew ":x-pack:plugin:esql:internalClusterTest" --tests "org.elasticsearch.xpack.esql.action.TimeSeriesRateIT.testRateWithTimeBucketAndClusterMultipleStatsByMin {seed=[C03F3760789CCADD:E7E9047924F5E816]}" -Dtests.seed=C03F3760789CCADD -Dtests.locale=ru-KZ -Dtests.timezone=Africa/Cairo -Druntime.java=24

Values:
[27.19455030545576, 27.19455030545576, 27.19455030545576, 2024-04-15T00:00:00.000Z, prod]
[26.238585183367313, 36.124824330654754, 26.238585183367313, 2024-04-15T00:00:00.000Z, qa]
[21.26328096539162, 21.26328096539162, 21.26328096539162, 2024-04-15T00:01:00.000Z, prod]
[30.34564093259548, 49.98639208926789, 30.34564093259548, 2024-04-15T00:01:00.000Z, qa]
[24.22888264353169, 24.22888264353169, 24.22888264353169, 2024-04-15T00:02:00.000Z, prod]

 Hosts:
p0 -> qa, rate=39, cpu=42, numDocs=77
p1 -> qa, rate=26, cpu=11, numDocs=78
p2 -> prod, rate=29, cpu=82, numDocs=77
p3 -> qa, rate=14, cpu=11, numDocs=76
p4 -> qa, rate=50, cpu=91, numDocs=73
Total rate: 158
Average rate: 31.6
Total CPU: 237
Average CPU: 47.4
Count of docs: 381
Docs in each minute:
Minute 320: 59 docs
Minute 321: 63 docs
Minute 322: 64 docs
Minute 323: 62 docs
Minute 324: 57 docs
Minute 325: 65 docs
Minute 326: 11 docs

java.lang.AssertionError: Values:
[27.19455030545576, 27.19455030545576, 27.19455030545576, 2024-04-15T00:00:00.000Z, prod]
[26.238585183367313, 36.124824330654754, 26.238585183367313, 2024-04-15T00:00:00.000Z, qa]
[21.26328096539162, 21.26328096539162, 21.26328096539162, 2024-04-15T00:01:00.000Z, prod]
[30.34564093259548, 49.98639208926789, 30.34564093259548, 2024-04-15T00:01:00.000Z, qa]
[24.22888264353169, 24.22888264353169, 24.22888264353169, 2024-04-15T00:02:00.000Z, prod]

 Hosts:
p0 -> qa, rate=39, cpu=42, numDocs=77
p1 -> qa, rate=26, cpu=11, numDocs=78
p2 -> prod, rate=29, cpu=82, numDocs=77
p3 -> qa, rate=14, cpu=11, numDocs=76
p4 -> qa, rate=50, cpu=91, numDocs=73
Total rate: 158
Average rate: 31.6
Total CPU: 237
Average CPU: 47.4
Count of docs: 381
Docs in each minute:
Minute 320: 59 docs
Minute 321: 63 docs
Minute 322: 64 docs
Minute 323: 62 docs
Minute 324: 57 docs
Minute 325: 65 docs
Minute 326: 11 docs

Caused by: java.lang.AssertionError: 
Expected: a numeric value within <7.500000298023224> of <43.99999976158142>
     but: <36.124824330654754> differed by <0.3751751329034434> more than delta <7.500000298023224>

pabloem · 2025-07-08T19:06:13Z

@dnhatn the failure rate is now around 5% - still pretty high. It comes from outliers in the sampling rate. WDYT?

A couple of ideas:

I could make the tests retriable, so that a failed test runs again (it has to call populateIndex again). This would reduce the failure rate significantly.
I could widen the margin of error, but I don't really like this.
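
A minimal sketch of the retry idea, assuming a plain helper rather than any test-framework feature. `RetryingAssertion` and `runWithRetries` are hypothetical names. The `setup` argument stands in for repopulating the index and `check` for the rate assertions.

```java
class RetryingAssertion {
    /**
     * Runs setup and then check, up to maxAttempts times.
     * Returns on the first passing attempt and rethrows the last AssertionError otherwise.
     */
    static void runWithRetries(int maxAttempts, Runnable setup, Runnable check) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        AssertionError lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            setup.run(); // regenerate the data before every attempt
            try {
                check.run();
                return; // passed, no further attempts needed
            } catch (AssertionError e) {
                lastFailure = e; // remember the failure and try again with fresh data
            }
        }
        throw lastFailure;
    }
}
```

If runs are independent and each fails about 5% of the time, two attempts would both fail about 0.25% of the time (0.05 × 0.05) and three attempts about 0.0125%. The trade-off is that a real regression that only fails sometimes also becomes harder to notice.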

dnhatn · 2025-07-08T19:56:06Z

Either way works for me.

pabloem added 2 commits June 24, 2025 11:25

Working on rate integration test

699e755

Sketch of new rate tests. Unsure of the source of variation >10pct

8fae5ab

elasticsearchmachine added v9.1.0 v9.2.0 and removed v9.1.0 labels Jun 24, 2025

pabloem and others added 4 commits June 27, 2025 18:29

Tuning and improving auto-test for rate-based aggregations

091490b

Merge branch 'main' into pabloem-rateit

3f2bd20

fixup

697ef59

Merge branch 'main' into pabloem-rateit

fd89c97

pabloem marked this pull request as ready for review July 1, 2025 00:05

elasticsearchmachine added the needs:triage label Jul 1, 2025

dnhatn self-requested a review July 1, 2025 19:35

dnhatn added the :StorageEngine/TSDB and >test labels and removed the needs:triage label Jul 2, 2025

elasticsearchmachine added the Team:StorageEngine label Jul 2, 2025