
Calculate memory_mb_max_unit based on instance's memory_mb #507

Open
leust wants to merge 1 commit into stable/2023.2-m3 from memory-mb-max-unit-improve

Conversation


@leust leust commented Aug 9, 2024

Previously, memory_mb_max_unit relied on the overallMemoryUsage reported by the ESXi host. That value was wrong, because the host could still hold a large VM with reserved memory that was not actually consuming any RAM and was therefore not reflected in the memory usage.

We now look up the instances spawned on that ESXi host and subtract their configured memory_mb from the host's total RAM capacity, resulting in a realistic memory_mb_max_unit.

Change-Id: I72821c97d80f9f675dca943d65f94e36824d81b6
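A minimal sketch of the approach described above. The function and data shapes here are illustrative stand-ins, not the driver's actual API:

```python
from collections import defaultdict

def available_memory_per_host(host_ram_mb, instances):
    """Subtract each instance's configured memory_mb from its host's total
    RAM capacity, instead of trusting the host's overallMemoryUsage.

    host_ram_mb: dict mapping host ref -> total RAM capacity in MB
    instances:   iterable of (host_ref, memory_mb) tuples
    """
    used = defaultdict(int)
    for host_ref, memory_mb in instances:
        used[host_ref] += memory_mb
    return {host: ram - used[host] for host, ram in host_ram_mb.items()}

# A big VM with reserved memory that consumes little actual RAM is still
# accounted for by its configured memory_mb:
free = available_memory_per_host(
    {"host-1": 524288, "host-2": 524288},
    [("host-1", 262144), ("host-1", 131072)])
# free["host-1"] == 131072, free["host-2"] == 524288
```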

@leust leust force-pushed the memory-mb-max-unit-improve branch from 20c85fe to ee181da Compare August 9, 2024 14:02
@leust leust force-pushed the memory-mb-max-unit-improve branch from ee181da to c5798a3 Compare August 16, 2024 14:44
@leust leust marked this pull request as ready for review August 16, 2024 14:52

@joker-at-work joker-at-work left a comment


When we report max_unit for memory_mb to be smaller than what a VM currently already has, does Placement still support live-migration off that host? I remember us patching Placement to allow switching resources over to another resource provider, but I also remember recently seeing some problems with max_unit. Can you please check whether we need to patch Placement to skip the max_unit check?

The thing I do not like about this is that we duplicate the "valid host" checking (inMaintenance, connection_state). I would prefer, maybe as a follow-up, if we could have that in only one place. Otherwise, e.g. when we add a check for hosts going into MM in addition to those already in MM, we might miss a place where this is checked.

@leust leust force-pushed the memory-mb-max-unit-improve branch from c5798a3 to 95ff72c Compare September 25, 2024 13:42
@leust leust force-pushed the memory-mb-max-unit-improve branch from 95ff72c to c46674a Compare April 3, 2025 07:10
@leust leust changed the base branch from stable/xena-m3 to stable/2023.2-m3 April 3, 2025 07:10
Comment on lines +443 to +445
memory_mb_max_unit = max(
mem for mem in
self._vmops.get_available_memory_per_host().values())


mem for mem in looks superfluous here
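The generator expression is indeed redundant: max() accepts the dict's values view directly. A tiny demonstration with a stand-in dict:

```python
# Stand-in for self._vmops.get_available_memory_per_host()
per_host = {"host-1": 131072, "host-2": 262144}

# Both forms are equivalent; the generator adds nothing:
assert max(mem for mem in per_host.values()) == max(per_host.values())
```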

Comment on lines +4569 to +4570
(host_mors, _) = vm_util.get_hosts_and_reservations_for_cluster(
self._session, self._cluster)


Why do we re-fetch this instead of using the information already gathered when we get called in _get_cluster_stats()? I guess because we do not save the morefs of our hosts anywhere and they are only available inside VCState.update_status(). Might be worth mentioning or changing.

@leust (Author)

I have refactored this function to rely on the previously fetched hosts information.

host_ref_value = vutil.get_moref_value(obj.obj)
hardware_summary = host_props.get("summary.hardware")
mem_size = getattr(hardware_summary, "memorySize", 0)
ram_per_host[host_ref_value] = mem_size // units.Mi


Should we reduce the memory by the reservations? We get the reservations above in vm_util.get_hosts_and_reservations_for_cluster(), but ignore them. Default is to reserve 5% on every host. We support percentage-based admission control. Maybe this could become relevant?
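A sketch of what accounting for a percentage-based reservation could look like, assuming the 5% per-host default mentioned above (the function name and parameters are illustrative):

```python
def host_capacity_mb(memory_size_bytes, reserved_percent=5):
    """Total host RAM in MB, reduced by a percentage-based admission
    control reservation (5% by default, per the comment above)."""
    MI = 1024 * 1024
    total_mb = memory_size_bytes // MI
    return total_mb * (100 - reserved_percent) // 100

# A 512 GiB host: 524288 MB total, 498073 MB after the 5% reservation.
usable = host_capacity_mb(512 * 1024 ** 3)
```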

@leust (Author)

It now accounts for the reserved memory as well.

@leust leust force-pushed the memory-mb-max-unit-improve branch from c46674a to 5186c82 Compare June 6, 2025 11:46
@leust leust (Author) commented Jun 6, 2025

Added support for a new trait called CUSTOM_HANA_MANUAL_SCHEDULING that is checked in the HANA filter and the bin-pack weigher.
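A rough sketch of how a scheduler filter might consume such a trait. The class, attribute names, and pass/fail semantics are assumptions for illustration only; the actual HANA filter logic is not shown in this PR:

```python
HANA_MANUAL_TRAIT = "CUSTOM_HANA_MANUAL_SCHEDULING"

class HostState:
    """Minimal stand-in for a scheduler HostState carrying
    resource-provider traits."""
    def __init__(self, traits):
        self.traits = set(traits)

def host_passes(host_state, manual_scheduling_requested):
    """Hypothetical check: hosts carrying the manual-scheduling trait are
    only eligible when the request explicitly asks for them."""
    if HANA_MANUAL_TRAIT in host_state.traits:
        return manual_scheduling_requested
    return True
```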
