How to deploy and restart runners
Runner node configuration: three nodes, each with 4 cores and 8 GB of RAM (4C-8G) and 100 GB of storage. Suggested system image: ubuntu-20.04-2nic
One node runs the integration tests, one runs the gvisor-cri tests, and one runs the firecracker-cri tests.
How to deploy the three nodes:
- On a machine that can access all three runner nodes, clone the repository:
git clone https://github.com/vhive-serverless/vHive.git
- Build the runner deployer
cd vHive/scripts/github_runner/
go build .
- Modify conf.json
Edit conf.json so that it follows this format:
{
"ghOrg": "<GitHub account>",
"ghPat": "<GitHub PAT>",
"hostUsername": "<username>",
"runners": {
"<hostname-1>": {
"type": "cri",
"sandbox": "firecracker"
},
"<hostname-2>": {
"type": "cri",
"sandbox": "gvisor"
},
"<hostname-3>": {
"type": "integ",
"num": 2,
"restart": false
}
}
}
Note that in conf.json, ghOrg is vhive-serverless, and ghPat should be your own account's Personal Access Token; your account must have the correct permissions for the vhive-serverless org.
<username> and <hostname-1/2/3> are the SSH username and hostnames, so if you use SCSE cloud nodes as runners, <hostname-1/2/3> should be their IP addresses.
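A stray comma or a leftover template value in conf.json is easy to miss. The following is a minimal sketch of a pre-flight check, not part of the vHive scripts: check_conf is a hypothetical helper name, and it assumes python3 is installed on the machine (it is used only as a JSON parser).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for conf.json; not part of the vHive repository.
# Assumption: python3 is available (used only to parse JSON).

# check_conf FILE: succeed only if FILE exists, parses as JSON,
# and contains no unfilled "<...>" template values.
check_conf() {
    local conf="$1"
    if [ ! -f "$conf" ]; then
        echo "missing: $conf" >&2
        return 1
    fi
    # json.tool exits non-zero on syntax errors such as a trailing comma.
    if ! python3 -m json.tool "$conf" >/dev/null 2>&1; then
        echo "invalid JSON: $conf" >&2
        return 1
    fi
    # Values or keys like "<GitHub PAT>" or "<hostname-1>" must be replaced first.
    if grep -q '"<[^"]*>"' "$conf"; then
        echo "unfilled placeholder in: $conf" >&2
        return 1
    fi
    echo "ok: $conf"
}
```

One possible use is to gate the deployment on it, for example `check_conf conf.json && ./deploy_runners`.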
After editing conf.json, deploy the runners remotely by running:
./deploy_runners
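If the deployer cannot reach a node, a quick connectivity check can narrow down which one. This is a hedged sketch rather than part of the vHive tooling: check_ssh is a hypothetical helper, and it assumes key-based (non-interactive) SSH logins to the runner nodes.

```shell
#!/usr/bin/env bash
# Hypothetical connectivity check; not part of the vHive repository.
# Assumption: the runner nodes accept key-based SSH logins for the given user.

# check_ssh USER HOST...: report which hosts accept a non-interactive SSH login.
# Returns non-zero if at least one host is unreachable.
check_ssh() {
    local user="$1"
    shift
    local failed=0
    local host
    for host in "$@"; do
        # BatchMode disables password prompts; ConnectTimeout bounds the wait per host.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$user@$host" true 2>/dev/null; then
            echo "reachable: $host"
        else
            echo "unreachable: $host"
            failed=1
        fi
    done
    return "$failed"
}
```

Call it with the hostUsername value and the hostnames or IP addresses from conf.json, for example `check_ssh <username> <hostname-1> <hostname-2> <hostname-3>`.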
To restart the runners on SCSE cloud, rebuild the three nodes and redeploy them.
If a firecracker or gvisor CRI test gets stuck at "helloworld is waiting for a Revision to be ready", the firecracker and gvisor CRI runners need to be restarted (you can also restart only one runner in that case).
If instead a firecracker or gvisor CRI test passes the Setup vHive CRI test environment step and fails in the Run vHive CRI tests step, this is typically a sporadic failure that can be resolved by re-running the tests: just click the re-run button on the GitHub web page.
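The two cases above can be told apart from a saved job log. The sketch below is illustrative only: triage_log is a hypothetical helper, it assumes the CRI job log has been downloaded to a local file, and it treats a log whose last line is still the Revision wait message as the stuck case. Any other failed log is assumed to come from the Run vHive CRI tests step.

```shell
#!/usr/bin/env bash
# Hypothetical triage helper; not part of the vHive repository.
# Assumption: the failed CRI job log was saved to a local file.

# triage_log FILE: print a suggested action for a failed CRI job log.
triage_log() {
    local log="$1"
    # A log that ends on the wait message suggests the job hung there.
    if tail -n 1 "$log" | grep -q 'waiting for a Revision to be ready'; then
        echo "restart-runner"
    else
        # Otherwise treat it as a sporadic failure and re-run the job.
        echo "re-run-job"
    fi
}
```

The output is only a hint; check which step actually failed on the GitHub page before rebuilding a node.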