
Deployment of Functions in vHive failing #539

@aditya2803

Description

I am trying to set up vHive on a single-node cluster and get it working by deploying and then invoking functions, as described in the guide here. I can follow the steps manually, and all the Kubernetes pods are running as expected. However, when deploying functions using this link, I ran into the errors below.

System Configuration

lscpu output:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   48 bits physical, 48 bits virtual
CPU(s):                          256
On-line CPU(s) list:             0-255
Thread(s) per core:              2
Core(s) per socket:              64
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       AuthenticAMD
CPU family:                      25
Model:                           1
Model name:                      AMD Eng Sample: 100-000000314-02_30/16_N
Stepping:                        0
Frequency boost:                 enabled
CPU MHz:                         1600.000
CPU max MHz:                     3000.0000
CPU min MHz:                     1200.0000
BogoMIPS:                        3193.90
Virtualization:                  AMD-V
L1d cache:                       4 MiB
L1i cache:                       4 MiB
L2 cache:                        64 MiB
L3 cache:                        512 MiB
NUMA node0 CPU(s):               0-63,128-191
NUMA node1 CPU(s):               64-127,192-255
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full AMD retpoline, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
                                 pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma
                                 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a
                                 misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3
                                 hw_pstate ssbd mba ibrs ibpb stibp vmmcall erms xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local
                                 clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists
                                 pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm

cat /etc/os-release output:

NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Logs

vHive logs:

time="2022-05-19T12:15:03.479476427Z" level=error msg="coordinator failed to start VM" error="failed to create the microVM in firecracker-containerd: rpc error: code = Unknown desc = failed to create VM: failed to start the VM: [PUT /actions][400] createSyncActionBadRequest  &{FaultMessage:Internal error while starting microVM: VcpuConfigure(CpuId(InvalidVendor))}" image="ghcr.io/ease-lab/helloworld:var_workload" vmID=1
time="2022-05-19T12:15:03.479560085Z" level=error msg="failed to start VM" error="failed to create the microVM in firecracker-containerd: rpc error: code = Unknown desc = failed to create VM: failed to start the VM: [PUT /actions][400] createSyncActionBadRequest  &{FaultMessage:Internal error while starting microVM: VcpuConfigure(CpuId(InvalidVendor))}"
time="2022-05-19T12:15:03.482077842Z" level=error msg="VM config for pod d021d0b8ad35ac3cc8d9a0f8202e91dbc2c09081413cf2352a27717df00ed033 does not exist"
time="2022-05-19T12:15:03.482101657Z" level=error error="VM config for pod does not exist"

(Initially I hit the same issue as #476. I then applied the solution proposed in that ticket; the logs above are from after applying it.)
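The fault `VcpuConfigure(CpuId(InvalidVendor))` appears to come from Firecracker's CPUID setup rejecting the host CPU vendor, and `lscpu` above reports `AuthenticAMD`. A minimal sketch for confirming the vendor string on a host, assuming a Linux x86 machine whose `/proc/cpuinfo` lists a `vendor_id` field. The helper name `cpu_vendor` is hypothetical, and it also accepts the `Vendor ID:` line from `lscpu` output.

```python
"""Report the host CPU vendor string that the InvalidVendor fault appears to concern.

Assumption: a Linux x86 host exposing /proc/cpuinfo, which lists a
`vendor_id` field per logical CPU. `cpu_vendor` is a hypothetical helper.
"""
from pathlib import Path
from typing import Optional

# Vendor strings that x86 CPUs return from CPUID leaf 0.
KNOWN_VENDORS = {
    "GenuineIntel": "Intel",
    "AuthenticAMD": "AMD",
}

# Field names used by /proc/cpuinfo and by lscpu, respectively.
VENDOR_KEYS = {"vendor_id", "Vendor ID"}


def cpu_vendor(text: str) -> Optional[str]:
    """Return the first vendor value found in /proc/cpuinfo- or lscpu-style text."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in VENDOR_KEYS:
            return value.strip()
    return None


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        vendor = cpu_vendor(cpuinfo.read_text())
        print(f"vendor_id={vendor} ({KNOWN_VENDORS.get(vendor, 'unknown')})")
    else:
        print("/proc/cpuinfo not found; this sketch assumes a Linux host")
```

Running it on the host described here should report the same vendor as `lscpu`, which is useful when comparing against reports that only reproduce on non-Intel CPUs.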

Notes
A similar issue is reported here. It appears to be a firecracker-containerd problem on non-Intel CPUs, which, according to that issue, was fixed later upstream. I am not sure whether the firecracker-containerd binary shipped with vHive includes that fix. When I clone the latest firecracker-containerd repository, build and install it, and replace /vhive/bin/firecracker-containerd with the binary I built, the vHive error log changes to:

time="2022-05-19T06:31:18.406741266Z" level=error msg="failed to start VM" error="failed to create the microVM in firecracker-containerd: rpc error: code = Unknown desc = failed to create VM: failed to build VM configuration: no such file or directory"
time="2022-05-19T06:31:18.409782917Z" level=error msg="VM config for pod 84c5ce4eb538a061c3f75497a2b9f8688dc4cbfa351478a81691b05e4e59ff43 does not exist"
time="2022-05-19T06:31:18.409806492Z" level=error error="VM config for pod does not exist"
time="2022-05-19T06:31:36.204002382Z" level=warning msg="Failed to Fetch k8s dns clusterIP exit status 1\nThe connection to the server localhost:8080 was refused - did you specify the right host or port?\n\n"
time="2022-05-19T06:31:36.204047106Z" level=warning msg="Using google dns 8.8.8.8\n"
time="2022-05-19T06:31:36.350628233Z" level=error msg="coordinator failed to start VM" error="failed to create the microVM in firecracker-containerd: rpc error: code = Unknown desc = failed to create VM: failed to build VM configuration: no such file or directory" image="vhiveease/rnn_serving:var_workload" vmID=263

I have also gone through #525 and have access to /dev/kvm. The host is a bare-metal x86_64 AMD server running Ubuntu 20.04.
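Firecracker needs read/write access to `/dev/kvm`. A minimal sketch for checking that access from the account that runs firecracker-containerd, assuming a Linux host; the helper name `kvm_access` is hypothetical and only reports what the filesystem says, not whether KVM itself is usable.

```python
"""Check that /dev/kvm exists, is a character device, and is read/write accessible.

Assumption: a Linux host. `kvm_access` is a hypothetical helper; run it as the
same user that starts firecracker-containerd.
"""
import os
import stat
from pathlib import Path


def kvm_access(path: str = "/dev/kvm") -> dict:
    """Return existence, device-type, and read/write checks for `path`."""
    p = Path(path)
    exists = p.exists()
    return {
        "exists": exists,
        # A real /dev/kvm is a character device, not a regular file.
        "is_char_device": exists and stat.S_ISCHR(p.stat().st_mode),
        # os.access checks the real UID/GID of the current process.
        "read_write": exists and os.access(path, os.R_OK | os.W_OK),
    }


if __name__ == "__main__":
    print(kvm_access())
```

All three values should be `True` on a host where Firecracker can open KVM; a `False` for `read_write` usually points at device permissions or group membership.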

Expected Behavior
Functions should be deployed normally.

Steps to reproduce
Follow the start-up guide to set up a single-node cluster, then run the deployer.
