RHEL9: libvirt-sock file not found error during cluster bringup #1657

@pperiyasamy

Describe the bug

OCP cluster installation fails with the following error:

failed to dial libvirt: dial unix /var/run/libvirt/libvirt-sock: connect: no such file or directory
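
To narrow down an error like this, it can help to check whether the socket exists and whether the units that normally provide it are active. The following is a minimal diagnostic sketch, not part of dev-scripts: `socket_state` is a hypothetical helper, the socket path is copied from the error message, and the `libvirtd.socket` unit name is an assumption (hosts that use libvirt's modular daemons may expose the socket through a different unit).

```shell
#!/usr/bin/env bash
# Socket path copied from the error message in this report; override via LIBVIRT_SOCK.
LIBVIRT_SOCK="${LIBVIRT_SOCK:-/var/run/libvirt/libvirt-sock}"

# Hypothetical helper: prints "present" if the path is a UNIX socket, else "missing".
socket_state() {
    if [ -S "$1" ]; then
        echo "present"
    else
        echo "missing"
    fi
}

echo "libvirt socket ${LIBVIRT_SOCK}: $(socket_state "${LIBVIRT_SOCK}")"

# Report the state of the units assumed to provide the socket (read-only query).
if command -v systemctl >/dev/null 2>&1; then
    for unit in libvirtd.service libvirtd.socket; do
        echo "${unit}: $(systemctl is-active "${unit}" 2>/dev/null || true)"
    done
fi
```

The script only reads state, so it should be safe to run before and after the host configuration step to see whether the socket appears.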

To Reproduce

Bring up an OCP cluster (4.15 nightly) with the following steps:

$ git clone https://github.com/openshift-metal3/dev-scripts && \
cd dev-scripts/
# make
$ diff config_example.sh config_peri.sh
12c12
< export CI_TOKEN=''
---
> export CI_TOKEN='xxxxxxxx'
36,37c36,37
< #
< #export OPENSHIFT_RELEASE_STREAM=4.15
---
> 
> export OPENSHIFT_RELEASE_STREAM=4.15
227c227
< #export IP_STACK=v4
---
> export IP_STACK=v4
294c294
< #export NETWORK_TYPE="OVNKubernetes"
---
> export NETWORK_TYPE="OVNKubernetes"
$ cat /etc/os-release 
NAME="Red Hat Enterprise Linux"
VERSION="9.4 (Plow)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="9.4"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux 9.4 (Plow)"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_BUGZILLA_PRODUCT_VERSION=9.4
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.

Expected/observed behavior

level=debug msg=[INFO] running Terraform command: /home/peri/dev-scripts/ocp/ostest/terraform/bin/terraform init -no-color -input=false -backend=true -get=true -upgrade=false -plugin-dir=/home/peri/dev-scripts/ocp/ostest/terraform/plugins
level=debug
level=debug msg=Initializing the backend...
level=debug
level=debug msg=Initializing provider plugins...
level=debug msg=- Finding latest version of openshift/local/ironic...
level=debug msg=- Finding latest version of openshift/local/libvirt...
level=debug msg=- Installing openshift/local/ironic v1.0.0...
level=debug msg=- Installed openshift/local/ironic v1.0.0 (unauthenticated)
level=debug msg=- Installing openshift/local/libvirt v1.0.0...
level=debug msg=- Installed openshift/local/libvirt v1.0.0 (unauthenticated)
level=debug
level=debug msg=Terraform has created a lock file .terraform.lock.hcl to record the provider
level=debug msg=selections it made above. Include this file in your version control repository
level=debug msg=so that Terraform can guarantee to make the same selections by default when
level=debug msg=you run "terraform init" in the future.
level=debug
level=debug
level=debug msg=Warning: Incomplete lock file information for providers
level=debug
level=debug msg=Due to your customized provider installation methods, Terraform was forced to
level=debug msg=calculate lock file checksums locally for the following providers:
level=debug msg=  - openshift/local/ironic
level=debug msg=  - openshift/local/libvirt
level=debug
level=debug msg=The current .terraform.lock.hcl file only includes checksums for linux_amd64,
level=debug msg=so Terraform running on another platform will fail to install these
level=debug msg=providers.
level=debug
level=debug msg=To calculate additional checksums for another platform, run:
level=debug msg=  terraform providers lock -platform=linux_amd64
level=debug msg=(where linux_amd64 is the platform to generate)
level=debug
level=debug msg=Terraform has been successfully initialized!
level=debug msg=[INFO] running Terraform command: /home/peri/dev-scripts/ocp/ostest/terraform/bin/terraform apply -no-color -auto-approve -input=false -var-file=/tmp/openshift-install-bootstrap-3795376676/terraform.tfvars.json -var-file=/tmp/openshift-install-bootstrap-3795376676/terraform.platform.auto.tfvars.json -lock=true -parallelism=10 -refresh=true
level=error
level=error msg=Error: failed to dial libvirt: dial unix /var/run/libvirt/libvirt-sock: connect: no such file or directory
level=error
level=error msg=  with provider["openshift/local/libvirt"],
level=error msg=  on main.tf line 1, in provider "libvirt":
level=error msg=   1: provider "libvirt" {
level=error
level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failure applying terraform for "bootstrap" stage: error applying Terraform configs: failed to apply Terraform: exit status 1
level=error
level=error msg=Error: failed to dial libvirt: dial unix /var/run/libvirt/libvirt-sock: connect: no such file or directory
level=error
level=error msg=  with provider["openshift/local/libvirt"],
level=error msg=  on main.tf line 1, in provider "libvirt":
level=error msg=   1: provider "libvirt" {
level=error
level=error
+(utils.sh:1): create_cluster(): auth_template_and_removetmp
+(utils.sh:866): auth_template_and_removetmp(): echo 4
+(utils.sh:867): auth_template_and_removetmp(): generate_auth_template
+(utils.sh:327): generate_auth_template(): set +x
E0502 06:48:12.764378   73376 memcache.go:265] couldn't get current server API group list: Get "https://api.ostest.test.metalkube.org:6443/api?timeout=32s": dial tcp 192.168.111.5:6443: connect: no route to host
E0502 06:48:15.836414   73376 memcache.go:265] couldn't get current server API group list: Get "https://api.ostest.test.metalkube.org:6443/api?timeout=32s": dial tcp 192.168.111.5:6443: connect: no route to host
E0502 06:48:18.908310   73376 memcache.go:265] couldn't get current server API group list: Get "https://api.ostest.test.metalkube.org:6443/api?timeout=32s": dial tcp 192.168.111.5:6443: connect: no route to host
E0502 06:48:21.980273   73376 memcache.go:265] couldn't get current server API group list: Get "https://api.ostest.test.metalkube.org:6443/api?timeout=32s": dial tcp 192.168.111.5:6443: connect: no route to host
E0502 06:48:25.052182   73376 memcache.go:265] couldn't get current server API group list: Get "https://api.ostest.test.metalkube.org:6443/api?timeout=32s": dial tcp 192.168.111.5:6443: connect: no route to host
Unable to connect to the server: dial tcp 192.168.111.5:6443: connect: no route to host

Additional context

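The following change to the host configuration script (02_configure_host.sh) fixes the problem. It restarts libvirtd.service unconditionally at the end of manage_libvirtd():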

$ git diff
diff --git a/02_configure_host.sh b/02_configure_host.sh
index 4f1ef60..f40d14f 100755
--- a/02_configure_host.sh
+++ b/02_configure_host.sh
@@ -31,6 +31,7 @@ manage_libvirtd() {
           sudo systemctl restart libvirtd.service
         ;;
 esac
+sudo systemctl restart libvirtd.service
 }
 
 # Generate user ssh key
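
A more guarded variant of the workaround above could restart libvirtd only when the socket is actually missing, then wait for the socket to appear before continuing. This is a sketch under assumptions, not the change proposed in this issue: `wait_for_socket` and `ensure_libvirt_socket` are hypothetical helper names, and the 10-second timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Socket path copied from the error message in this report; override via LIBVIRT_SOCK.
LIBVIRT_SOCK="${LIBVIRT_SOCK:-/var/run/libvirt/libvirt-sock}"

# Hypothetical helper: poll once per second until the socket exists
# or the timeout (in seconds) elapses. Returns 0 on success, 1 on timeout.
wait_for_socket() {
    local sock
    local timeout
    local waited
    sock="$1"
    timeout="${2:-10}"
    waited=0
    while [ ! -S "${sock}" ]; do
        if [ "${waited}" -ge "${timeout}" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Hypothetical helper: restart libvirtd.service only when the socket is absent,
# then wait for the socket to be created.
ensure_libvirt_socket() {
    local sock
    sock="${1:-${LIBVIRT_SOCK}}"
    if [ -S "${sock}" ]; then
        return 0
    fi
    sudo systemctl restart libvirtd.service
    wait_for_socket "${sock}" 10
}
```

Calling `ensure_libvirt_socket` at the end of `manage_libvirtd()` would then stand in for the unconditional restart; whether that is preferable depends on why the socket is missing on RHEL 9 in the first place, which this report does not establish.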
