@gmarouli (Contributor) commented Sep 5, 2025

In this PR we introduce telemetry for time series, aiming to answer the following questions:

  • How many time series data streams (tsds) does a cluster have?
  • How many time series indices do they have?
  • How many tsds are downsampled by ILM?
  • How many downsampling rounds are being used with ILM?
  • How many tsds are downsampled by DLM?
  • How many downsampling rounds are being used with DLM?
  • How do the numbers differ between serverless and stateful?
  • Which ILM phase is most commonly used for downsampling?

Fixes: #133953

@gmarouli added the >enhancement and :StorageEngine/Downsampling labels Sep 5, 2025
@elasticsearchmachine (Collaborator)

Hi @gmarouli, I've created a changelog YAML for you.

@gmarouli marked this pull request as ready for review September 8, 2025 06:09
@gmarouli requested a review from kkrik-es September 8, 2025 06:09
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-storage-engine (Team:StorageEngine)

/*
* We now add a number of simulated data streams to the cluster state. We mix different combinations of:
* - time series and standard data streams & backing indices
* - lifecycle with or without downsampling
Contributor

Suggested change
- * - lifecycle with or without downsampling
+ * - DLM with or without downsampling

var downsamplingConfiguredBy = randomFrom(DownsampledBy.values());
boolean isDownsampled = downsamplingConfiguredBy != DownsampledBy.NONE && isTimeSeriesDataStream;
// An index/data stream can have both ILM & DLM configured; by default, ILM "wins"
boolean hasLifecycle = usually() || (isDownsampled && downsamplingConfiguredBy == DownsampledBy.DLM);
Contributor

Nit: usually() is !rarely(), so it's almost always true. Maybe use randomDouble() < 0.8 or so, to make it more concrete?
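
A minimal sketch of what the explicit-probability suggestion above could look like. In the actual test this would presumably call the test framework's randomDouble(); the helper below uses java.util.Random and hypothetical names (ExplicitProbability, withProbability, observedRate) so that it stands alone.

```java
import java.util.Random;

// Hypothetical helper, not part of the PR: states the probability explicitly
// instead of relying on the implicit odds of usually()/rarely().
final class ExplicitProbability {
    private ExplicitProbability() {}

    /** Returns true with the given probability (between 0.0 and 1.0). */
    static boolean withProbability(Random random, double probability) {
        // nextDouble() is uniform in [0.0, 1.0), so this is true for roughly
        // `probability` of all draws.
        return random.nextDouble() < probability;
    }

    /** Fraction of true outcomes over a number of trials, for a fixed seed. */
    static double observedRate(long seed, double probability, int trials) {
        Random random = new Random(seed);
        int hits = 0;
        for (int i = 0; i < trials; i++) {
            if (withProbability(random, probability)) {
                hits++;
            }
        }
        return (double) hits / trials;
    }
}
```

With a threshold of 0.8, roughly one in five generated data streams would end up without a lifecycle, which makes that branch of the test easier to reason about than "almost always".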

tsIndexCount,
ilmStats.getDownsamplingStats(),
ilmStats.getIlmPolicyStats(),
dlmStats.getDownsamplingStats(),
Contributor

Why dlmStats?

Contributor Author

You mean why I picked this variable name?

Contributor

I thought this call was for ilmStats, so I was surprised to see dlmStats for this arg.

Contributor Author

Ah, I see. There are two different constructors: one that accepts stats for both ILM and DLM, and one that accepts only DLM stats for the serverless use case. This was syntactic sugar to make it more explicit that the ILM stats would be null.

Contributor Author

I will add a comment for clarity
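
For readers following this thread, a minimal sketch of the two-constructor pattern described above. The class and record names here (TimeSeriesUsageSketch, DownsamplingStats) are simplified, hypothetical stand-ins, not the classes in the PR.

```java
// Hypothetical, simplified stand-in for the per-feature stats discussed above.
record DownsamplingStats(long dataStreamsCount, long indexCount) {}

// Hypothetical stand-in for the usage class that carries both sets of stats.
final class TimeSeriesUsageSketch {
    private final long tsIndexCount;
    // Null when ILM is not available, for example in the serverless use case.
    private final DownsamplingStats ilmDownsamplingStats;
    private final DownsamplingStats dlmDownsamplingStats;

    /** Stateful use case: stats for both ILM and DLM. */
    TimeSeriesUsageSketch(long tsIndexCount, DownsamplingStats ilmStats, DownsamplingStats dlmStats) {
        this.tsIndexCount = tsIndexCount;
        this.ilmDownsamplingStats = ilmStats;
        this.dlmDownsamplingStats = dlmStats;
    }

    /** Serverless use case: only DLM stats; makes explicit that the ILM stats are null. */
    TimeSeriesUsageSketch(long tsIndexCount, DownsamplingStats dlmStats) {
        this(tsIndexCount, null, dlmStats);
    }

    long tsIndexCount() {
        return tsIndexCount;
    }

    boolean hasIlmStats() {
        return ilmDownsamplingStats != null;
    }

    DownsamplingStats dlmDownsamplingStats() {
        return dlmDownsamplingStats;
    }
}
```

The shorter constructor only delegates, so a call site that passes dlmStats alone reads as "no ILM here" without a bare null argument.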

builder.field("phases_in_use", phasesUsedInDownsampling);
builder.endObject();
}
builder.startObject("dlm");
Contributor

Should we also check for null dlmDownsamplingStats here?

Contributor Author

It doesn't hurt, I will add it.
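
A rough sketch of the null guard agreed on above. The PR writes through Elasticsearch's XContentBuilder; this stand-alone version uses a StringBuilder instead, and the names RoundsStats and DownsamplingJsonSketch as well as the field names data_stream_count and index_count are assumptions made for illustration.

```java
// Hypothetical stand-in for the per-feature downsampling stats.
record RoundsStats(long dataStreamsCount, long indexCount) {}

// Hypothetical renderer: emits a section only when its stats are present.
final class DownsamplingJsonSketch {
    private DownsamplingJsonSketch() {}

    static String render(RoundsStats ilmStats, RoundsStats dlmStats) {
        StringBuilder json = new StringBuilder("{");
        boolean wroteSection = false;
        if (ilmStats != null) {
            appendSection(json, "ilm", ilmStats);
            wroteSection = true;
        }
        // Same guard for DLM, as suggested in the review.
        if (dlmStats != null) {
            if (wroteSection) {
                json.append(',');
            }
            appendSection(json, "dlm", dlmStats);
        }
        return json.append('}').toString();
    }

    private static void appendSection(StringBuilder json, String name, RoundsStats stats) {
        json.append('"').append(name).append("\":{")
            .append("\"data_stream_count\":").append(stats.dataStreamsCount())
            .append(",\"index_count\":").append(stats.indexCount())
            .append('}');
    }
}
```

Guarding both sections the same way keeps the output symmetric: a missing feature simply produces no object rather than a failure at serialization time.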

}
}

public record DownsamplingFeatureStats(long dataStreamsCount, long indexCount, long minRounds, double averageRounds, long maxRounds)
Contributor

Are you planning to add stats on dataset size and reduction, later?

Contributor Author

Stats on dataset size and reduction require a lot more infrastructure that we do not have currently. If we do move ahead with these plans then yes, I think we should try to expose them here too.
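
To illustrate what the record in this thread carries today, here is a minimal sketch of deriving the min, average, and max rounds from per-data-stream round counts. The record signature is copied from the diff above; the fromRounds factory and its handling of the empty case are hypothetical and may differ from how the PR aggregates these values.

```java
import java.util.List;
import java.util.LongSummaryStatistics;

// Record signature as shown in the diff; the factory method is an illustrative assumption.
record DownsamplingFeatureStats(long dataStreamsCount, long indexCount, long minRounds, double averageRounds, long maxRounds) {

    /**
     * Builds the stats from the number of downsampling rounds configured on each
     * data stream. Reports zeros when no data stream is downsampled.
     */
    static DownsamplingFeatureStats fromRounds(List<Integer> roundsPerDataStream, long indexCount) {
        if (roundsPerDataStream.isEmpty()) {
            // LongSummaryStatistics would report Long.MAX_VALUE / Long.MIN_VALUE here.
            return new DownsamplingFeatureStats(0, indexCount, 0, 0.0, 0);
        }
        LongSummaryStatistics summary = roundsPerDataStream.stream()
            .mapToLong(Integer::longValue)
            .summaryStatistics();
        return new DownsamplingFeatureStats(
            summary.getCount(),
            indexCount,
            summary.getMin(),
            summary.getAverage(),
            summary.getMax()
        );
    }
}
```

For example, three data streams with 1, 2, and 3 rounds give a minimum of 1, an average of 2.0, and a maximum of 3.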

@kkrik-es (Contributor) left a comment

Nice and clean. Consider asking Martijn to take a look too.

@gmarouli requested a review from martijnvg September 8, 2025 11:33
@martijnvg (Member) left a comment

LGTM 👍

@gmarouli added the auto-merge-without-approval label Sep 8, 2025
@elasticsearchmachine merged commit 5374c33 into elastic:main Sep 8, 2025
34 checks passed
@gmarouli deleted the downsampling++/add-basic-telemetry branch September 8, 2025 13:55
Labels

auto-merge-without-approval, >enhancement, :StorageEngine/Downsampling, Team:StorageEngine, v9.2.0

Development

Successfully merging this pull request may close these issues.

[Downsampling] Enhance telemetry.

4 participants