@valeriy42
Contributor

@valeriy42 valeriy42 commented Sep 26, 2025

Previously, the upper bound for model memory checks was set in absolute terms, which is hard to understand and brittle. I adjusted the assertions to ensure that the memory usage does not exceed the memory limit by more than 5%. Additionally, on Linux, we now report the process size (see #131981), which includes approximately 20 MB of native code overhead. I made the handling of this overhead more explicit.

More details:

  • Removed muted tests for testManyDistinctOverFields and testTooManyByAndOverFields.
  • Introduced constants for memory limits in AutodetectMemoryLimitIT.
  • Updated assertions to check effective model size against calculated limits.

Closes #132308
Closes #132310
Closes #132611
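
A minimal sketch of the relative bound described above, written as standalone Java so the arithmetic is easy to check. The class and method names (`RelativeMemoryBound`, `upperBoundBytes`, `effectiveModelSize`, `withinBound`) are hypothetical, and subtracting the overhead only when the process size is reported is an assumption; the actual `getEffectiveModelSize` helper in the test class may differ.

```java
public class RelativeMemoryBound {
    // Assumption: roughly 20 MB of native code overhead, the figure quoted in the description.
    static final long PROCESS_OVERHEAD_BYTES = 20L * 1024 * 1024;
    // The assertions allow the model to exceed the limit by at most 5%.
    static final double TOLERANCE = 0.05;

    /** Upper bound in bytes for a limit given in MB: the limit plus 5%. */
    static long upperBoundBytes(long memoryLimitMb) {
        long limitBytes = memoryLimitMb * 1024 * 1024;
        return limitBytes + Math.round(limitBytes * TOLERANCE);
    }

    /** Hypothetical helper: remove the fixed overhead when the reported value is the process size. */
    static long effectiveModelSize(long reportedBytes, boolean reportsProcessSize) {
        return reportsProcessSize ? Math.max(0L, reportedBytes - PROCESS_OVERHEAD_BYTES) : reportedBytes;
    }

    /** True if the effective model size stays below the relative upper bound. */
    static boolean withinBound(long reportedBytes, long memoryLimitMb, boolean reportsProcessSize) {
        return effectiveModelSize(reportedBytes, reportsProcessSize) < upperBoundBytes(memoryLimitMb);
    }

    public static void main(String[] args) {
        long reported = 120L * 1024 * 1024; // illustrative value, not taken from a real run
        System.out.println(upperBoundBytes(100));
        System.out.println(withinBound(reported, 100, true));
        System.out.println(withinBound(reported, 100, false));
    }
}
```

Expressing the bound relative to `memoryLimitMb` means that changing a test's limit does not require recomputing a hard-coded byte count.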

…prove model size assertions

- Removed muted tests for `testManyDistinctOverFields` and `testTooManyByAndOverFields`.
- Introduced constants for memory limits in `AutodetectMemoryLimitIT`.
- Updated assertions to check effective model size against calculated limits.
@valeriy42 valeriy42 added the labels >test (Issues or PRs that are addressing/adding tests), :ml (Machine learning), and Team:ML (Meta label for the ML team) Sep 26, 2025
@valeriy42 valeriy42 marked this pull request as ready for review September 26, 2025 10:26
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

Contributor

@jan-elastic jan-elastic left a comment

Generally looks good. Some small comments.

- Changed variable names from `memoryLimit` to `memoryLimitMb` for clarity.
- Updated memory limit assertions to reflect the new variable naming.
- Ensured consistency in memory limit usage across multiple test cases.
*/
public class AutodetectMemoryLimitIT extends MlNativeAutodetectIntegTestCase {

private static final long PROCESS_OVERHEAD_BYTES = ByteSizeValue.ofMb(20).getBytes();
Contributor

Where is this value coming from?

Contributor Author

This value is estimated empirically. I used a small, simple anomaly detection job with a trivial model to establish the lower bound on the process memory usage.

Contributor

Would it be helpful to add a comment explaining that? If there's a risk of that value changing at some point in the future, it would be good to know what needs to be done to recalculate and update it.
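
One way such a comment could look, as a sketch only: the wrapper class `ProcessOverhead` is hypothetical, the Javadoc wording is an illustration based on the explanation given in this thread, and the value is written out in plain Java instead of using `ByteSizeValue` so the snippet compiles on its own.

```java
public class ProcessOverhead {
    /**
     * Approximate native code overhead included in the reported process size.
     *
     * Estimated empirically: run a small, simple anomaly detection job with a
     * trivial model and take the reported process memory usage as the lower bound.
     * If the native process changes noticeably, repeat that measurement and
     * update this constant. (Illustrative wording, not the committed comment.)
     */
    static final long PROCESS_OVERHEAD_BYTES = 20L * 1024 * 1024; // 20 MB

    public static void main(String[] args) {
        System.out.println(PROCESS_OVERHEAD_BYTES);
    }
}
```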

Contributor Author

On second thought, checking whether the autodetect process has a specific memory overhead makes these tests unnecessarily brittle. I removed that check and kept only the check for hard_limit. In ml-cpp we have unit tests to ensure there are no memory leaks.

assertThat(modelSizeStats.getModelBytes(), lessThan(120500000L));
assertThat(modelSizeStats.getModelBytes(), greaterThan(70000000L));
assertThat(getEffectiveModelSize(modelSizeStats.getModelBytes()), lessThan(ByteSizeValue.ofMb(memoryLimitMb).getBytes() * 1.05));
assertThat(modelSizeStats.getMemoryStatus(), equalTo(ModelSizeStats.MemoryStatus.HARD_LIMIT));
Contributor Author

Memory usage is reported differently on different platforms: on Linux and Windows we report the actual process memory usage, while on macOS this information is not available, so we report the estimated memory usage. This makes a test for a specific memory limit brittle. To improve robustness, I only check that the job is in the hard_limit state.
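
A minimal sketch of why a status-only check is platform independent. `MemoryStatus` and `Stats` here are simplified stand-ins for the real `ModelSizeStats` types, `isHardLimit` is a hypothetical helper, and the byte counts are made-up illustrative values, not measurements.

```java
public class HardLimitCheck {
    // Simplified stand-in for ModelSizeStats.MemoryStatus.
    enum MemoryStatus { OK, SOFT_LIMIT, HARD_LIMIT }

    // Simplified stand-in for ModelSizeStats: a reported size plus a status.
    static final class Stats {
        final long modelBytes;
        final MemoryStatus memoryStatus;

        Stats(long modelBytes, MemoryStatus memoryStatus) {
            this.modelBytes = modelBytes;
            this.memoryStatus = memoryStatus;
        }
    }

    /** Depends only on the status, not on how modelBytes was measured. */
    static boolean isHardLimit(Stats stats) {
        return stats.memoryStatus == MemoryStatus.HARD_LIMIT;
    }

    public static void main(String[] args) {
        // Same job, different reporting: process size versus estimated usage.
        Stats processSize = new Stats(52_000_000L, MemoryStatus.HARD_LIMIT);
        Stats estimated = new Stats(31_000_000L, MemoryStatus.HARD_LIMIT);
        System.out.println(isHardLimit(processSize) && isHardLimit(estimated));
    }
}
```

Both stand-in results pass the check even though their byte counts differ, which is the property the status-only assertion relies on.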

assertThat(modelSizeStats.getModelBytes(), lessThan(72000000L));
assertThat(modelSizeStats.getModelBytes(), greaterThan(24000000L));
assertThat(getEffectiveModelSize(modelSizeStats.getModelBytes()), lessThan(ByteSizeValue.ofMb(memoryLimitMb).getBytes() * 1.05));
assertThat(modelSizeStats.getMemoryStatus(), equalTo(ModelSizeStats.MemoryStatus.HARD_LIMIT));
Contributor Author

Only check that the job is in the hard_limit state, to increase test robustness across platforms.

assertThat(modelSizeStats.getModelBytes(), greaterThan(24000000L));

assertThat(getEffectiveModelSize(modelSizeStats.getModelBytes()), lessThan(ByteSizeValue.ofMb(memoryLimitMb).getBytes() * 1.05));
assertThat(
Contributor Author

Only check for hard_limit to increase test robustness.

assertThat(modelSizeStats.getModelBytes(), lessThan(45000000L));
assertThat(modelSizeStats.getModelBytes(), greaterThan(25000000L));
assertThat(getEffectiveModelSize(modelSizeStats.getModelBytes()), lessThan(ByteSizeValue.ofMb(memoryLimitMb).getBytes() * 1.05));
assertThat(modelSizeStats.getMemoryStatus(), equalTo(ModelSizeStats.MemoryStatus.HARD_LIMIT));
Contributor Author

Only check hard_limit for robustness.

@valeriy42 valeriy42 merged commit 0a71ff4 into elastic:main Oct 31, 2025
34 checks passed

Labels

:ml (Machine learning), Team:ML (Meta label for the ML team), >test (Issues or PRs that are addressing/adding tests), v9.3.0

4 participants