On platform BareMetal, UserManaged load balancer configuration is not rendered in AgentClusterInstall #10108

@mjovanovic0

When deploying an OpenShift cluster with platform: baremetal, you cannot use the same VIP address for both API and Ingress, even when setting loadBalancer: type: UserManaged.

For example, the following snippet in install-config.yaml:

platform:
  baremetal:
    apiVIPs:
      - 10.0.10.100
    ingressVIPs:
      - 10.0.10.100

produces a validation error:

$ openshift-install-linux agent create image
INFO Configuration has 3 master replicas, 0 arbiter replicas, and 4 worker replicas 
ERROR failed to write asset (Agent Installer ISO) to disk: cannot generate ISO image due to configuration errors 
FATAL failed to fetch Agent Installer ISO: failed to load asset "Install Config": invalid install-config configuration: platform.baremetal.apiVIPs: Invalid value: "10.0.10.100": VIP for API must not be one of the Ingress VIPs

This is expected. However, upon further investigation of the installer code:
https://github.com/openshift/installer/blob/release-4.20/pkg/types/validation/installconfig.go#L971

If you update the snippet in install-config.yaml to:

platform:
  baremetal:
    loadBalancer:
      type: UserManaged
    apiVIPs:
      - 10.0.10.100
    ingressVIPs:
      - 10.0.10.100

Then the installer successfully generates the agent.x86_64.iso file, which you can boot on the node(s).

$ openshift-install-linux agent create image
INFO Configuration has 3 master replicas, 0 arbiter replicas, and 4 worker replicas 
INFO The rendezvous host IP (node0 IP) is 10.0.10.10
INFO Extracting base ISO from release payload     
INFO Verifying cached file                        
INFO Using cached Base ISO /root/.cache/agent/image_cache/coreos-x86_64.iso 
INFO Consuming Install Config from target directory 
INFO Consuming Agent Config from target directory 
INFO Generated ISO at agent.x86_64.iso.   
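
The installer behaviour observed in the two runs above can be sketched roughly as follows. This is a Python illustration only, not the installer's Go code: `validate_vips` and its parameters are hypothetical names, and the assumption that the overlap check is skipped entirely for a UserManaged load balancer is inferred from the output shown here.

```python
from typing import List, Optional


def validate_vips(api_vips: List[str], ingress_vips: List[str],
                  lb_type: Optional[str] = None) -> List[str]:
    """Return a list of validation errors; an empty list means accepted."""
    # Assumption: with a user-managed load balancer, one address may
    # front both the API and Ingress endpoints, so no overlap check runs.
    if lb_type == "UserManaged":
        return []
    # Otherwise every API VIP must be distinct from every Ingress VIP.
    return [
        f'Invalid value: "{vip}": VIP for API must not be one of the Ingress VIPs'
        for vip in api_vips
        if vip in ingress_vips
    ]


# First snippet: no loadBalancer stanza, shared VIP is rejected.
errors_default = validate_vips(["10.0.10.100"], ["10.0.10.100"])
# Second snippet: loadBalancer type UserManaged, shared VIP is accepted.
errors_user_managed = validate_vips(["10.0.10.100"], ["10.0.10.100"], "UserManaged")
```

In this sketch `errors_default` contains one error and `errors_user_managed` is empty, matching the two `agent create image` runs.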

But when that ISO image is booted on a node, assisted-service reports the following validation failure:

  • The IP address "10.0.10.100" appears both in apiVIPs and ingressVIPs

https://github.com/openshift/assisted-service/blob/c616cdc/internal/network/machine_network_cidr.go#L254

Upon further digging through the assisted-service codebase, there are code paths for a UserManaged load balancer that do not enforce different VIPs for API and Ingress:
https://github.com/openshift/assisted-service/blob/c616cdc/internal/cluster/validations/validations.go#L525

It checks, in the file /etc/assisted/manifests/agent-cluster-install.yaml on the node, whether

apiVersion: extensions.hive.openshift.io/v1beta1
kind: AgentClusterInstall
spec:
...
  platformType: BareMetal
  loadBalancer:
    type: UserManaged

is defined, but that stanza is missing even though it was defined in the initial install-config.yaml (it is probably not copied when the installer renders the manifests during ISO image creation).

It seems that copying the load balancer definition into the AgentClusterInstall manifest is missing at this place:
https://github.com/openshift/installer/blob/release-4.20/pkg/asset/agent/manifests/agentclusterinstall.go#L307
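
To illustrate the propagation that appears to be missing, here is a minimal Python sketch. It is not the installer's actual Go implementation: `render_agent_cluster_install_spec` is a hypothetical helper, the input is assumed to be install-config.yaml already parsed into a dict, and only the two spec fields shown in this issue are rendered.

```python
import copy


def render_agent_cluster_install_spec(install_config: dict) -> dict:
    """Build a partial AgentClusterInstall spec from a parsed install-config."""
    baremetal = install_config.get("platform", {}).get("baremetal", {})
    # Other spec fields are intentionally omitted from this sketch.
    spec = {"platformType": "BareMetal"}
    # The suspected missing step: carry the loadBalancer stanza over
    # so the downstream VIP validation can see type: UserManaged.
    load_balancer = baremetal.get("loadBalancer")
    if load_balancer is not None:
        spec["loadBalancer"] = copy.deepcopy(load_balancer)
    return spec


install_config = {
    "platform": {
        "baremetal": {
            "loadBalancer": {"type": "UserManaged"},
            "apiVIPs": ["10.0.10.100"],
            "ingressVIPs": ["10.0.10.100"],
        }
    }
}
spec = render_agent_cluster_install_spec(install_config)
```

With the install-config from this issue, `spec` carries `loadBalancer: {type: UserManaged}` next to `platformType: BareMetal`, which is the stanza expected in agent-cluster-install.yaml.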

Version of openshift-installer:

openshift-install-linux 4.20.1
built from commit e23807689ec464da30e771dda70fd8989680a011
release image quay.io/openshift-release-dev/ocp-release@sha256:cbde13fe6ed4db88796be201fbdb2bbb63df5763ae038a9eb20bc793d5740416
release architecture amd64

Version of assisted-service:
openshift/assisted-service@c616cdc

Also, I'm willing to create a PR for this if it is accepted as a bug/improvement.
