-
Notifications
You must be signed in to change notification settings - Fork 2
Post Migration Announce #66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: gardenlinux
Are you sure you want to change the base?
Conversation
|
Could you please rephrase your PR title, the commits, and the corresponding internals to something more descriptive than "post migration announce"? Something like "Post-migration ARP Advertisment Packets"? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! Before reviewing this further, we should discuss if the proposed design is something we want to go with. I prefer to use the existing resume() path. Bonus point: it would also work after a cold migration.
Are there any implications I'm missing here by chance?
PS: You also need to do the same for the Neighbor Discovery Protocol (ARP equivalent in IPv6).
virtio-devices/src/device.rs
Outdated
|
|
||
| /// Some devices can announce their location after a live migration to | ||
| /// speed up normal execution. | ||
| fn post_migration_announcer(&mut self) -> std::option::Option<Box<dyn PostMigrationAnnouncer>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use Option and not the fully qualified path
| fn build_rarp_announce(&self) -> [u8; 60] { | ||
| const ETH_P_RARP: u16 = 0x8035; // Ethertype RARP | ||
| const ARP_HTYPE_ETH: u16 = 0x1; // Hardware type Ethernet | ||
| const ARP_PTYPE_IP: u16 = 0x0800; // Protocol type IPv4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to do the same for the ARP equivalent of IPv6?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know this works for IPv4 and IPv6. We never put any IP addresses into the packet, only MAC addresses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But IPv6 doesn't use ARP AFAICT. ipv6 uses the Neighbor Discovery Protocol (NDP).
I am not 100% with that, but we should clarify and make sure your PR reaches your desired network discovery speedup if guest VMs only uses IPv6!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the IPv6 message would need the IPv6 Address of the Link, and we have no way of knowing this address. Also this is mainly for the switches in the network, which only need the MAC addresses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought is: When the VM has an IPv6, and someone wants to connect the VM right after migration, some network node will use NDP (not ARP!) to ask "who has $myipv6". Therefore, we also need to populate NDP caches in all switches and not just ARP - correct? (Again, I am not 100% sure, but it seems very plausible to me!).
I think: After a live migration, switches and routers may still have the VM’s old MAC-to-port mappings, causing traffic drops. For IPv4, sending "hello" ARP updates the ARP caches in the network. For IPv6, you similarly need to send NDP messages so that neighbor caches in all relevant nodes and switches learn the VM's new location.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Layer 2 switch learning RARP is enough, because the switches only care about MAC addresses.
For Layer 3 neighbor learning, we would always need an IP address. We do not have those, so we can only fix the Layer 2 switches.
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you convinced me :) seems about right.
vmm/src/lib.rs
Outdated
| // The unwrap is safe, because the state machine makes sure we called | ||
| // vm_receive_state before, which creates the VM. | ||
| let vm = self.vm.vm_mut().unwrap(); | ||
| vm.announce(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am very surprised about this fully-blown internal abstraction. I think this can be build on the existing resume() mechanism that Cloud Hypervisor already has?
Before I review this further, we should discuss this fundamental design discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can put this somewhere else, I can also put this into the resume function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't resume also used for when you just pause the VM which does not necessarily have to entail a live migration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is, but I don't see how these packets could be harmful and why we need an extra condition whether to send them or not. Also, the traffic is negligible. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah there are only a few packets per NIC, so I don't think we cause harm by adding it to resume.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have you measured the costs of sending these packets? is it worth it to do this in a dedicated thread?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that it has to happen periodic with a delay in between, I don't know how to do that if not with a separate thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes, makes sense to have a dedicated thread then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the post migration announce has to happen more than just once, this is the reason why I start a thread that sends the announcements a few more times. I think that wouldn't really work with your solution? At least I don't see how to make it work.
The fact that the method was named announce_once made me initially think that you only announce once per device. If you indeed need to do more announcements per device, then there are ways to do that with what I outlined, but it is not necessarily better than what you already implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scheduling the post migration announcers is now done as part of DeviceManager::resume()
| let _ = unsafe { | ||
| libc::write( | ||
| tap.as_raw_fd(), | ||
| buf.as_ptr() as *const libc::c_void, | ||
| buf.len(), | ||
| ) | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the function signature it seems like there is in principle nothing stopping this from being called in parallel from multiple threads. Is that safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is missing a // SAFETY comment (see cargo clippy).
It is safe, but we need a nice compact comment explaining why it is :)
good point Oliver!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the // SAFETY comment. I also removed Sync and made self &mut self, to signal that it is not intended to call this function in parallel from multiple threads.
| let mut buf = [0u8; 60]; | ||
|
|
||
| // Ethernet header | ||
| buf[0..6].copy_from_slice(&[0xff; MAC_ADDR_LEN]); // This is a broadcast |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have you used this as templat?
https://gitlab.com/qemu-project/qemu/-/blob/master/net/announce.c#L98-121
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked whether qemu does the same, but during implementation I looked at https://datatracker.ietf.org/doc/html/rfc903 and what chatgpt told me.
c0f9be6 to
92f7fbf
Compare
virtio-devices/src/device.rs
Outdated
| pub trait PostMigrationAnnouncer: Send { | ||
| // Sending the announces is done on a best-effort basis, so we ignore | ||
| // errors. | ||
| fn announce_once(&mut self); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we change the name to announce?
The trailing once makes me think that it is only intended to be called once (but cannot take self: Self for one reason or another).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I changed it.
Some devices can do announcements after a live migration, e.g. a network device announces its new location in a network. With this change a device can return an announcer to do that. On-behalf-of: SAP [email protected] Signed-off-by: Sebastian Eydam <[email protected]>
To announce the new location in a network, virtio-net devices can send reverse ARP packets. On-behalf-of: SAP [email protected] Signed-off-by: Sebastian Eydam <[email protected]>
After this change switches in a network should find the new location of a VM quicker. On-behalf-of: SAP [email protected] Signed-off-by: Sebastian Eydam <[email protected]>
This PR adds a post migration announce to Cloud Hypervisor. After a successful live migration, Cloud Hypervisor now sends reverse ARP packets on all virtio-net devices. That way switches should learn the new location of the VM faster.