
Commit 737e0c0

tests: Add reproducer for bug #1889633
With the introduction of the cpu-resources work [1], (libvirt) hosts can now report 'PCPU' inventory separate from 'VCPU' inventory, which is consumed by instances with pinned CPUs ('hw:cpu_policy=dedicated'). As part of that effort, we had to drop support for the ability to boot instances with 'hw:cpu_thread_policy=isolate' (i.e. "I don't want hyperthreads") on hosts with hyperthreading. This had previously been implemented by marking the thread siblings of the host cores used by such an instance as reserved and unusable by other instances, but such a design wasn't possible in a world where we had to track resource consumption in placement before landing on the host. Instead, the 'isolate' policy now simply means "give me a host without hyperthreads". This is enforced by hosts with hyperthreads reporting the 'HW_CPU_HYPERTHREADING' trait, and by instances with the 'isolate' policy requesting 'HW_CPU_HYPERTHREADING=forbidden'.

Or at least, that's how it should work. We also have a fallback placement query that looks for hosts with 'VCPU' inventory and doesn't care about the 'HW_CPU_HYPERTHREADING' trait. This was envisioned to ensure hosts with old-style configuration ('[DEFAULT] vcpu_pin_set') could continue to be scheduled to. We figured that this second fallback query could accidentally pick up hosts with new-style configuration, but we also track the available and used cores from those listed in '[compute] cpu_dedicated_set' as part of the host 'NUMATopology' objects (specifically, via the 'pcpuset' and 'cpu_pinning' fields of the 'NUMACell' child objects). These are validated by both the 'NUMATopologyFilter' and the virt driver itself, which means hosts with new-style configuration that got caught up in this second query would be rejected by that filter or by a late failure on the host. (Hint: there's much more detail on this in the spec.)

Unfortunately we didn't think about hyperthreading. If a host gets picked up in the second request, it might well have enough 'PCPU' inventory but simply have been rejected in the first query because it has hyperthreads. In this case, because it has enough free cores available for pinning, neither the filter nor the virt driver will reject the request, resulting in a situation whereby the instance ends up falling back to the old code paths and consuming $flavor.vcpu host cores, plus the thread siblings for each of these cores. Despite this, it will be marked as consuming $flavor.vcpu 'VCPU' (not 'PCPU') inventory in placement.

This patch proves this to be the case, allowing us to resolve the issue later.

[1] https://specs.openstack.org/openstack/nova-specs/specs/train/approved/cpu-resources.html

Change-Id: I87cd4d14192b1a40cbdca6e3af0f818f2cab613e
Signed-off-by: Stephen Finucane <[email protected]>
Related-Bug: #1889633
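To make the two placement queries described in the commit message concrete, here is a minimal sketch (not nova's actual code): the primary query asks for dedicated 'PCPU' resources and, under the 'isolate' policy, forbids the 'HW_CPU_HYPERTHREADING' trait, while the fallback query asks for 'VCPU' and applies no trait constraint at all. The helper name and request layout below are invented for illustration; only the resource classes and the trait come from the commit message.

# Illustrative sketch only: the function name and request format are made up.
# Only the resource classes ('PCPU'/'VCPU') and the 'HW_CPU_HYPERTHREADING'
# trait correspond to the behaviour described in the commit message.

def build_candidate_queries(vcpus, isolate=False):
    """Return (primary, fallback) allocation-candidate requests."""
    primary = {'resources': {'PCPU': vcpus}}
    if isolate:
        # 'isolate' now means "give me a host without hyperthreads"
        primary['forbidden_traits'] = {'HW_CPU_HYPERTHREADING'}

    # Fallback for hosts still using '[DEFAULT] vcpu_pin_set': it asks for
    # VCPU and ignores the trait, which is how a hyperthreaded, new-style
    # host can slip back in.
    fallback = {'resources': {'VCPU': vcpus}}
    return primary, fallback


primary, fallback = build_candidate_queries(vcpus=2, isolate=True)
# primary  == {'resources': {'PCPU': 2},
#              'forbidden_traits': {'HW_CPU_HYPERTHREADING'}}
# fallback == {'resources': {'VCPU': 2}}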

File tree

1 file changed: +71 -0 lines changed


nova/tests/functional/libvirt/test_numa_servers.py

Lines changed: 71 additions & 0 deletions
@@ -257,6 +257,46 @@ def test_create_server_with_legacy_pinning_policy_old_configuration(self):
 
         self._run_build_test(flavor_id, expected_usage=expected_usage)
 
+    def test_create_server_with_isolate_thread_policy_old_configuration(self):
+        """Create a server with the legacy 'hw:cpu_thread_policy=isolate'
+        extra spec and configuration.
+
+        This should pass and result in an instance consuming $flavor.vcpu host
+        cores plus the thread sibling(s) of each of these cores. We'll also be
+        consuming VCPUs since we're on legacy configuration here, though that
+        would in theory be fixed during a later reshape.
+        """
+        self.flags(
+            cpu_dedicated_set=None, cpu_shared_set=None, group='compute')
+        self.flags(vcpu_pin_set='0-3')
+
+        # host has hyperthreads, which means we're going to end up consuming
+        # $flavor.vcpu host cores plus the thread sibling(s) for each core
+        host_info = fakelibvirt.HostInfo(
+            cpu_nodes=1, cpu_sockets=1, cpu_cores=2, cpu_threads=2,
+            kB_mem=(1024 * 1024 * 16),  # 16 GB
+        )
+        fake_connection = self._get_connection(host_info=host_info)
+        self.mock_conn.return_value = fake_connection
+
+        extra_spec = {
+            'hw:cpu_policy': 'dedicated',
+            'hw:cpu_thread_policy': 'isolate',
+        }
+        flavor_id = self._create_flavor(vcpu=2, extra_spec=extra_spec)
+
+        expected_usage = {'DISK_GB': 20, 'MEMORY_MB': 2048, 'VCPU': 2}
+        self._run_build_test(flavor_id, expected_usage=expected_usage)
+
+        # verify that we have consumed two cores plus the thread sibling of
+        # each core, totalling four cores since the HostInfo indicates each
+        # core should have two threads
+        ctxt = nova_context.get_admin_context()
+        host_numa = objects.NUMATopology.obj_from_db_obj(
+            objects.ComputeNode.get_by_nodename(ctxt, 'compute1').numa_topology
+        )
+        self.assertEqual({0, 1, 2, 3}, host_numa.cells[0].pinned_cpus)
+
     def test_create_server_with_legacy_pinning_policy_fails(self):
         """Create a pinned instance on a host with no PCPUs.
 
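For context on the final assertion in the added test: with the fakelibvirt.HostInfo of 1 node, 1 socket, 2 cores and 2 threads, the legacy 'isolate' path pins one core per guest vCPU and also reserves that core's thread sibling, so a 2-vCPU flavor ends up consuming all four host CPUs. A minimal sketch of that arithmetic follows; the function and sibling layout are illustrative, not nova's implementation.

# Illustrative only: mirrors the legacy 'isolate' accounting the test above
# exercises, where each pinned guest CPU also reserves its thread sibling(s).

def pinned_cpus_with_isolate(flavor_vcpus, sibling_sets):
    """Pick one core per guest vCPU and reserve its whole sibling set."""
    consumed = set()
    for siblings in sibling_sets[:flavor_vcpus]:
        consumed |= siblings
    return consumed


# Two cores, each with two threads, matching the HostInfo in the test above.
siblings = [{0, 1}, {2, 3}]
assert pinned_cpus_with_isolate(2, siblings) == {0, 1, 2, 3}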
@@ -319,6 +359,37 @@ def test_create_server_with_legacy_pinning_policy_quota_fails(self):
                           self.api.post_server, post)
         self.assertEqual(403, ex.response.status_code)
 
+    def test_create_server_with_isolate_thread_policy_fails(self):
+        """Create a server with the legacy 'hw:cpu_thread_policy=isolate'
+        extra spec.
+
+        This should fail on a host with hyperthreading.
+        """
+        self.flags(
+            cpu_dedicated_set='0-3', cpu_shared_set='4-7', group='compute')
+        self.flags(vcpu_pin_set=None)
+
+        # host has hyperthreads, which means it should be rejected
+        host_info = fakelibvirt.HostInfo(
+            cpu_nodes=2, cpu_sockets=1, cpu_cores=2, cpu_threads=2,
+            kB_mem=(1024 * 1024 * 16),  # 16 GB
+        )
+        fake_connection = self._get_connection(host_info=host_info)
+        self.mock_conn.return_value = fake_connection
+
+        extra_spec = {
+            'hw:cpu_policy': 'dedicated',
+            'hw:cpu_thread_policy': 'isolate',
+        }
+        flavor_id = self._create_flavor(vcpu=2, extra_spec=extra_spec)
+
+        # FIXME(stephenfin): This should go to error status since there should
+        # not be a host available
+        expected_usage = {
+            'DISK_GB': 20, 'MEMORY_MB': 2048, 'PCPU': 0, 'VCPU': 2,
+        }
+        self._run_build_test(flavor_id, expected_usage=expected_usage)
+
     def test_create_server_with_pcpu(self):
         """Create a server using an explicit 'resources:PCPU' request.
 
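The FIXME in the test above captures the reproducer's point: the build succeeds on a host that should have been excluded, and the usage placement records diverges from what the host actually consumes. Roughly, with illustrative values matching this test (the variable names are made up):

# Illustrative values only, mirroring the expected_usage asserted above.
placement_usage = {'VCPU': 2, 'PCPU': 0}  # what placement records
host_cpus_consumed = 4                    # 2 cores + their thread siblings

# The host is configured with cpu_dedicated_set='0-3' (i.e. PCPU inventory),
# yet placement shows zero PCPU usage for the instance.
assert placement_usage['PCPU'] == 0 and host_cpus_consumed == 4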
