Description
I am trying to set up vHive on a single-node cluster and get it working by deploying and then invoking functions, as described in the guide here. I can follow the setup steps manually, and all the Kubernetes pods are running as expected. However, when deploying functions using this link, I ran into the errors shown in the logs below.
System Configuration
lscpu output:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 256
On-line CPU(s) list: 0-255
Thread(s) per core: 2
Core(s) per socket: 64
Socket(s): 2
NUMA node(s): 2
Vendor ID: AuthenticAMD
CPU family: 25
Model: 1
Model name: AMD Eng Sample: 100-000000314-02_30/16_N
Stepping: 0
Frequency boost: enabled
CPU MHz: 1600.000
CPU max MHz: 3000.0000
CPU min MHz: 1200.0000
BogoMIPS: 3193.90
Virtualization: AMD-V
L1d cache: 4 MiB
L1i cache: 4 MiB
L2 cache: 64 MiB
L3 cache: 512 MiB
NUMA node0 CPU(s): 0-63,128-191
NUMA node1 CPU(s): 64-127,192-255
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Full AMD retpoline, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall erms xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm
cat /etc/os-release output:
NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
Logs
vHive logs:
time="2022-05-19T12:15:03.479476427Z" level=error msg="coordinator failed to start VM" error="failed to create the microVM in firecracker-containerd: rpc error: code = Unknown desc = failed to create VM: failed to start the VM: [PUT /actions][400] createSyncActionBadRequest &{FaultMessage:Internal error while starting microVM: VcpuConfigure(CpuId(InvalidVendor))}" image="ghcr.io/ease-lab/helloworld:var_workload" vmID=1
time="2022-05-19T12:15:03.479560085Z" level=error msg="failed to start VM" error="failed to create the microVM in firecracker-containerd: rpc error: code = Unknown desc = failed to create VM: failed to start the VM: [PUT /actions][400] createSyncActionBadRequest &{FaultMessage:Internal error while starting microVM: VcpuConfigure(CpuId(InvalidVendor))}"
time="2022-05-19T12:15:03.482077842Z" level=error msg="VM config for pod d021d0b8ad35ac3cc8d9a0f8202e91dbc2c09081413cf2352a27717df00ed033 does not exist"
time="2022-05-19T12:15:03.482101657Z" level=error error="VM config for pod does not exist"
(I initially hit the same issue as #476 and applied the solution proposed in that ticket. The logs above were captured after applying that solution.)
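For anyone triaging similar logs, here is a minimal sketch for pulling the Firecracker fault out of lines like the ones above. It assumes the lines are plain `key=value` / `key="quoted value"` pairs, which is how they appear here; the helper names `parse_log_line` and `fault_message` are my own and not part of vHive.

```python
import re

# Matches key=value pairs where the value is either a double-quoted string
# (allowing backslash escapes) or a bare token without whitespace.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_log_line(line: str) -> dict:
    """Split one key=value log line into a {key: value} dict."""
    fields = {}
    for key, raw in _PAIR.findall(line):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            # Drop the surrounding quotes and unescape embedded quotes.
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields


def fault_message(line: str):
    """Return the text after 'FaultMessage:' in the error field, or None."""
    error = parse_log_line(line).get("error", "")
    match = re.search(r"FaultMessage:(.*?)\}?$", error)
    return match.group(1).strip() if match else None
```

Running `fault_message` on the second log line above isolates the `VcpuConfigure(CpuId(InvalidVendor))` part, which is the piece worth searching for in other tickets.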
Notes
There is a similar issue mentioned here. This seems to be a firecracker-containerd issue on non-Intel CPUs, which appears to have been fixed later (as per that issue). I am not sure whether the firecracker-containerd binary used in vHive is the latest one. When I clone the latest firecracker-containerd repo, build it, and replace the /vhive/bin/firecracker-containerd binary with the one I built, the vHive error log is reduced to:
time="2022-05-19T06:31:18.406741266Z" level=error msg="failed to start VM" error="failed to create the microVM in firecracker-containerd: rpc error: code = Unknown desc = failed to create VM: failed to build VM configuration: no such file or directory"
time="2022-05-19T06:31:18.409782917Z" level=error msg="VM config for pod 84c5ce4eb538a061c3f75497a2b9f8688dc4cbfa351478a81691b05e4e59ff43 does not exist"
time="2022-05-19T06:31:18.409806492Z" level=error error="VM config for pod does not exist"
time="2022-05-19T06:31:36.204002382Z" level=warning msg="Failed to Fetch k8s dns clusterIP exit status 1\nThe connection to the server localhost:8080 was refused - did you specify the right host or port?\n\n"
time="2022-05-19T06:31:36.204047106Z" level=warning msg="Using google dns 8.8.8.8\n"
time="2022-05-19T06:31:36.350628233Z" level=error msg="coordinator failed to start VM" error="failed to create the microVM in firecracker-containerd: rpc error: code = Unknown desc = failed to create VM: failed to build VM configuration: no such file or directory" image="vhiveease/rnn_serving:var_workload" vmID=263
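The "failed to build VM configuration: no such file or directory" message does not say which file is missing. One way to narrow it down is to check every path referenced by the runtime configuration. This is only a sketch: the key names in `PATH_KEYS` are my assumption about what the firecracker-containerd runtime JSON config contains, and `missing_paths` and `check_config_file` are hypothetical helpers, so adjust them to your actual config.

```python
import json
import os

# Assumption: key names as I understand the firecracker-containerd runtime
# JSON config. Edit this tuple to match the keys in your own file.
PATH_KEYS = ("firecracker_binary_path", "kernel_image_path", "root_drive")


def missing_paths(config: dict, keys=PATH_KEYS) -> list:
    """Return (key, path) pairs whose path does not exist on disk."""
    missing = []
    for key in keys:
        path = config.get(key)
        # Skip keys that are absent, empty, or not plain strings.
        if isinstance(path, str) and path and not os.path.exists(path):
            missing.append((key, path))
    return missing


def check_config_file(config_path: str) -> list:
    """Load a JSON config file and report the paths it references that are missing."""
    with open(config_path) as f:
        return missing_paths(json.load(f))
```

Calling `check_config_file` with the location of the runtime config on the node returns an empty list when every referenced file exists, so any entries it reports are candidates for the missing file.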
I have also gone through #525 and have access to /dev/kvm. I am running on a bare-metal x86_64 AMD server with Ubuntu 20.04.
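For reference, the host checks mentioned here (CPU vendor, hardware virtualization flag, and `/dev/kvm` access) can be scripted roughly as follows. This is a sketch that assumes a Linux host with the usual `/proc/cpuinfo` layout; the function names are hypothetical.

```python
import os


def cpu_vendor(cpuinfo_text: str):
    """Return the first vendor_id value found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vendor_id":
            return value.strip()
    return None


def has_virt_flag(cpuinfo_text: str) -> bool:
    """True if the first flags line lists svm (AMD-V) or vmx (Intel VT-x)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return bool({"svm", "vmx"} & set(value.split()))
    return False


def kvm_accessible(path: str = "/dev/kvm") -> bool:
    """True if the current user can read and write the KVM device node."""
    return os.access(path, os.R_OK | os.W_OK)


def host_report(cpuinfo_path: str = "/proc/cpuinfo") -> dict:
    """Collect the three checks for the current host into one dict."""
    with open(cpuinfo_path) as f:
        text = f.read()
    return {
        "vendor": cpu_vendor(text),
        "virt_flag": has_virt_flag(text),
        "kvm_rw": kvm_accessible(),
    }
```

On the machine described above I would expect `host_report()` to show `AuthenticAMD` as the vendor, since that is what `lscpu` reports.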
Expected Behavior
Functions should be deployed normally.
Steps to reproduce
Simply follow the start-up guide provided to set up a one-node cluster, then run the deployer.