-
Notifications
You must be signed in to change notification settings - Fork 2
Post Migration Announce #66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: gardenlinux
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -20,8 +20,9 @@ use log::{debug, error, info, trace}; | |
| #[cfg(not(fuzzing))] | ||
| use net_util::virtio_features_to_tap_offload; | ||
| use net_util::{ | ||
| CtrlQueue, MacAddr, NetCounters, NetQueuePair, OpenTapError, RxVirtio, Tap, TapError, TxVirtio, | ||
| VirtioNetConfig, build_net_config_space, build_net_config_space_with_mq, open_tap, | ||
| CtrlQueue, MAC_ADDR_LEN, MacAddr, NetCounters, NetQueuePair, OpenTapError, RxVirtio, Tap, | ||
| TapError, TxVirtio, VirtioNetConfig, build_net_config_space, build_net_config_space_with_mq, | ||
| open_tap, vnet_hdr_len, | ||
| }; | ||
| use seccompiler::SeccompAction; | ||
| use serde::{Deserialize, Serialize}; | ||
|
|
@@ -40,6 +41,7 @@ use super::{ | |
| EpollHelperHandler, Error as DeviceError, RateLimiterConfig, VirtioCommon, VirtioDevice, | ||
| VirtioDeviceType, VirtioInterruptType, | ||
| }; | ||
| use crate::device::PostMigrationAnnouncer; | ||
| use crate::seccomp_filters::Thread; | ||
| use crate::thread_helper::spawn_virtio_thread; | ||
| use crate::{GuestMemoryMmap, VirtioInterrupt}; | ||
|
|
@@ -655,6 +657,38 @@ impl Net { | |
| pub fn wait_for_epoll_threads(&mut self) { | ||
| self.common.wait_for_epoll_threads(); | ||
| } | ||
|
|
||
| fn build_rarp_announce(&self) -> [u8; 60] { | ||
| const ETH_P_RARP: u16 = 0x8035; // Ethertype RARP | ||
| const ARP_HTYPE_ETH: u16 = 0x1; // Hardware type Ethernet | ||
| const ARP_PTYPE_IP: u16 = 0x0800; // Protocol type IPv4 | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do we need to do the same for the ARP equivalent of IPv6?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As far as I know this works for IPv4 and IPv6. We never put any IP addresses into the packet, only MAC addresses.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But IPv6 doesn't use ARP AFAICT. ipv6 uses the Neighbor Discovery Protocol (NDP). I am not 100% with that, but we should clarify and make sure your PR reaches your desired network discovery speedup if guest VMs only uses IPv6!
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also the IPv6 message would need the IPv6 Address of the Link, and we have no way of knowing this address. Also this is mainly for the switches in the network, which only need the MAC addresses.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My thought is: When the VM has an IPv6, and someone wants to connect the VM right after migration, some network node will use NDP (not ARP!) to ask "who has I think: After a live migration, switches and routers may still have the VM’s old MAC-to-port mappings, causing traffic drops. For IPv4, sending "hello" ARP updates the ARP caches in the network. For IPv6, you similarly need to send NDP messages so that neighbor caches in all relevant nodes and switches learn the VM's new location.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For Layer 2 switch learning RARP is enough, because the switches only care about MAC addresses. For Layer 3 neighbor learning, we would always need an IP address. We do not have those, so we can only fix the Layer 2 switches.
This comment was marked as outdated.
Sorry, something went wrong.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you convinced me :) seems about right. |
||
| const ARP_OP_REQUEST_REV: u16 = 0x0003; // RARP Request opcode | ||
|
|
||
| const IPV4_ADDR_LENGTH: usize = 4; // Size of an IPv4 address | ||
|
|
||
| let mut buf = [0u8; 60]; | ||
|
|
||
| // Ethernet header | ||
| buf[0..6].copy_from_slice(&[0xff; MAC_ADDR_LEN]); // This is a broadcast | ||
| buf[6..12].copy_from_slice(&self.config.mac); // Src is this NIC | ||
| buf[12..14].copy_from_slice(Ð_P_RARP.to_be_bytes()); // This is a RARP packet | ||
|
|
||
| // ARP Header | ||
| buf[14..16].copy_from_slice(&ARP_HTYPE_ETH.to_be_bytes()); | ||
| buf[16..18].copy_from_slice(&ARP_PTYPE_IP.to_be_bytes()); | ||
| buf[18] = MAC_ADDR_LEN as u8; // Hardware address length (ethernet) | ||
| buf[19] = IPV4_ADDR_LENGTH as u8; // Protocol address length (IPv4) | ||
| // This is a "fake RARP" packet, we don't want to perform a real RARP lookup. | ||
| // Thus the content of the next fields is largely irrelevant. Setting sha = tha | ||
| // is fine according to RFC 903. | ||
| buf[20..22].copy_from_slice(&ARP_OP_REQUEST_REV.to_be_bytes()); | ||
| buf[22..28].copy_from_slice(&self.config.mac); // Source hardware address | ||
| buf[28..32].copy_from_slice(&[0x00; IPV4_ADDR_LENGTH]); // Source protocol address | ||
| buf[32..38].copy_from_slice(&self.config.mac); // Target hardware address | ||
| buf[38..42].copy_from_slice(&[0x00; IPV4_ADDR_LENGTH]); // Target protocol address | ||
|
|
||
| buf | ||
| } | ||
| } | ||
|
|
||
| impl Drop for Net { | ||
|
|
@@ -870,6 +904,13 @@ impl VirtioDevice for Net { | |
| fn set_access_platform(&mut self, access_platform: Arc<dyn AccessPlatform>) { | ||
| self.common.set_access_platform(access_platform); | ||
| } | ||
|
|
||
| fn post_migration_announcer(&mut self) -> std::option::Option<Box<dyn PostMigrationAnnouncer>> { | ||
| Some(Box::new(TapRarpAnnouncer::new( | ||
| self.build_rarp_announce(), | ||
| self.taps.clone(), | ||
| ))) | ||
| } | ||
| } | ||
|
|
||
| impl Pausable for Net { | ||
|
|
@@ -898,3 +939,32 @@ impl Snapshottable for Net { | |
| } | ||
| impl Transportable for Net {} | ||
| impl Migratable for Net {} | ||
|
|
||
| pub struct TapRarpAnnouncer { | ||
| announce: [u8; 60], | ||
| taps: Vec<Tap>, | ||
| } | ||
|
|
||
| impl TapRarpAnnouncer { | ||
| pub fn new(announce: [u8; 60], taps: Vec<Tap>) -> Self { | ||
| Self { announce, taps } | ||
| } | ||
| } | ||
|
|
||
| impl PostMigrationAnnouncer for TapRarpAnnouncer { | ||
| fn announce_once(&self) { | ||
| // We have to add a virtio-net header to the announce. | ||
| let mut buf = vec![0u8; vnet_hdr_len() + self.announce.len()]; | ||
| buf[vnet_hdr_len()..].copy_from_slice(&self.announce); | ||
|
|
||
| for tap in &self.taps { | ||
| let _ = unsafe { | ||
| libc::write( | ||
| tap.as_raw_fd(), | ||
| buf.as_ptr() as *const libc::c_void, | ||
| buf.len(), | ||
| ) | ||
| }; | ||
|
Comment on lines
+961
to
+967
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Given the function signature it seems like there is in principle nothing stopping this from being called in parallel from multiple threads. Is that safe?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it is missing a It is safe, but we need a nice compact comment explaining why it is :) good point Oliver! |
||
| } | ||
| } | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1847,6 +1847,7 @@ impl Vmm { | |
| // The unwrap is safe, because the state machine makes sure we called | ||
| // vm_receive_state before, which creates the VM. | ||
| let vm = self.vm.vm_mut().unwrap(); | ||
| vm.announce(); | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am very surprised about this fully-blown internal abstraction. I think this can be build on the existing Before I review this further, we should discuss this fundamental design discussion.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can put this somewhere else, I can also put this into the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Isn't
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is, but I don't see how these packets could be harmful and why we need an extra condition whether to send them or not. Also, the traffic is negligible. Right?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah there are only a few packets per NIC, so I don't think we cause harm by adding it to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I am not against using
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @olivereanderson you mean adding code to the |
||
| vm.resume()?; | ||
| Ok(Completed) | ||
| } | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use
Optionand not the fully qualified path