This directory contains deployment tools and scripts for setting up EC2 instances and OpenShift clusters for development purposes.
If you are using the EC2 hypervisor option you will need to have the AWS CLI configured and the AWS_PROFILE environment variable set.
For getting and configuring the CLI: https://docs.aws.amazon.com/cli/
You can check if you have the AWS CLI properly configured by running:
$ aws configure list
Name Value Type Location
---- ----- ---- --------
profile openshift-dev env ['AWS_PROFILE', 'AWS_DEFAULT_PROFILE']
access_key ****************4SU3 shared-credentials-file
secret_key ****************z0DF shared-credentials-file
region us-east-2 config-file ~/.aws/configThe following programs must be present in your local environment:
- make
- aws
- jq
- rsync
- golang
- ansible
For automatic Redfish Pacemaker configuration on 4.19, you also need:
- Python3 kubernetes library (https://pypi.org/project/kubernetes/)
Additionally, if you're using Mac OS, you might not have timeout, so you might also need to install coreutils, for example via brew:
brew install coreutils
Before deployment, configure your environment by setting up the aws-hypervisor/instance.env file. Copy aws-hypervisor/instance.env.template to aws-hypervisor/instance.env and set all variables to valid values for your user.
To see all available commands, run:
$ make help# Create, initialize, and update inventory for a new EC2 instance
$ make deployThis will create the instance, initialize it, and update the inventory in one command, placing you in a login shell for the EC2 instance.
For quickly reusing an existing instance with a fresh cluster deployment:
# Force stop the instance (bypasses cluster checks)
$ make force-stop
# Start the instance
$ make start
# Deploy a fresh cluster
$ make redeploy-clusterThis sequence is the fastest way to reset and reuse an instance when cluster preservation is not needed.
# Create new EC2 instance
$ make create
# Initialize deployed instance
$ make init
# Update inventory.ini with current instance IP
$ make inventory
# SSH into the EC2 instance
$ make ssh
# Get instance info
$ make info
# Start a stopped instance
$ make start
# Stop a running instance (with cluster management options)
$ make stop
# Force stop instance immediately (no cluster checks)
$ make force-stop
# Completely destroy the instance
$ make destroyWhen running OpenShift clusters on the instance (using dev-scripts), you have several options for managing cluster lifecycle:
Quick deployment commands:
make fencing-ipi,make fencing-agent,make arbiter-ipi,make arbiter-agent,make fencing-kcli,make arbiter-kcliprovide non-interactive deployment for specific topologies- These commands automatically call the underlying setup.yml playbook with the appropriate configuration
- Useful for automation and when you know exactly which topology you want to deploy
# Force stop the instance immediately (no prompts, cluster will be lost)
$ make force-stop
# Start the instance
$ make start
# Redeploy the cluster from scratch
$ make redeploy-clusterThis is the recommended workflow for quickly reusing an instance when you don't need to preserve cluster state.
Alternative forcible stop method:
# Using interactive stop with forcible option
$ make stop
# Choose option 1 when prompted for forcible stop# Redeploy the cluster (clean existing and rebuild)
$ make redeploy-clusterThis option:
- Automatically cleans up the existing cluster
- Supports interactive mode selection (arbiter or fencing)
- Preserves the original installation method (IPI or Agent) from the previous deployment
- Intelligently detects cluster topology changes
- For same topology: Uses make redeploy (fast, preserves cached data)
- For topology changes: Uses make realclean + full installation (slower but clean)
- Integrates with Ansible playbooks for orchestration
Installation method preservation:
The redeploy command reads the installation method from the cluster state file (aws-hypervisor/instance-data/cluster-vm-state.json) and uses the same method for redeployment. If you originally deployed with make deploy fencing-agent, redeploy will use make agent. If you deployed with make deploy fencing-ipi, it will use make all (IPI).
To manually override the installation method during redeploy, you can edit the state file before running redeploy:
# Check current method
jq '.installation_method' deploy/aws-hypervisor/instance-data/cluster-vm-state.json
# Change to AGENT (if needed)
jq '.installation_method = "AGENT"' deploy/aws-hypervisor/instance-data/cluster-vm-state.json > /tmp/state.json && \
mv /tmp/state.json deploy/aws-hypervisor/instance-data/cluster-vm-state.jsonWhen to use redeploy:
- When you want to refresh the cluster with the latest changes
- When the cluster is in an inconsistent state
- For testing deployment changes
- When switching between cluster modes (arbiter ↔ fencing)
# Delete the cluster and clean the server
$ make clean
# Stop the instance
$ make stop
# When restarted, you'll need to redeploy the cluster from scratch
$ make start
# Quick deployment over clean server
$ make fencing-ipi # Deploy fencing topology (IPI method)
$ make fencing-agent # Deploy fencing topology (Agent method) (WIP Experimental)
$ make fencing-kcli # Deploy fencing topology (kcli method)
$ make arbiter-ipi # Deploy arbiter topology (IPI method)
$ make arbiter-agent # Deploy arbiter topology (Agent method)
$ make arbiter-kcli # Deploy arbiter topology (kcli method)# Gracefully shutdown the cluster VMs before stopping the instance
$ make shutdown-cluster
# Stop the instance (cluster VMs are preserved in shutdown state)
$ make stop
# Start the instance again
$ make start
# Start up the cluster VMs and proxy container
$ make startup-clusterImportant: "Clean" operations delete the cluster completely. All cluster data, configurations, and workloads will be permanently lost.
# Standard clean: Remove cluster while preserving cached data for faster redeployment
$ make clean
# Full clean: Complete cleanup including all cached data (slower but thorough)
$ make full-cleanWhen to Use:
make clean: Standard cluster cleanup while preserving cached data for faster subsequent deploymentsmake full-clean: Complete cleanup when you want to start completely fresh or troubleshoot deployment issues
When to Use Each Method:
- Redeploy: For changing configurations, updating cluster deployment, switching cluster modes
- Delete and Clean: For planned maintenance, manual control over cleanup
- Force Stop: For instance reuse with cluster reinstallation, when cluster is corrupted, or when cluster preservation is not needed
- Shutdown/Startup: For cluster state preservation
When running make stop on an instance with a running OpenShift cluster, you'll be presented with options:
- Shutdown the cluster VMs: Runs
make shutdown-clusterfirst (recommended) - Delete cluster and clean server: Runs Ansible cleanup playbook
- Continue with forcible stop: Stops instance immediately (cluster lost)
The script automatically detects:
- OpenShift dev-scripts installations
- Running cluster VMs
- Cluster state and provides appropriate options
After restarting an instance with make start, you'll see guidance for:
If you previously shutdown your cluster:
# Start up the existing cluster
$ make startup-clusterIf you need to create or redeploy a cluster:
# Option 1: Automated redeploy with mode selection
$ make redeploy-cluster
# Option 2: Manual clean and setup approach
$ make clean
$ make arbiter-ipi #(for example)# Build resource-agents RPM and patch all cluster nodes
$ make patch-nodes
# Collect cluster logs from all VMs
$ make get-tnf-logsIf cluster startup fails:
# Check cluster status manually
$ make ssh
$ cd ~/openshift-metal3/dev-scripts
$ oc --kubeconfig=ocp/<cluster-name>/auth/kubeconfig get nodes
# If cluster is unrecoverable, clean and redeploy
$ make redeploy-clusterIf VMs don't start properly:
# Check VM states manually
$ make ssh
$ sudo virsh list --all
$ sudo virsh domstate <vm-name>
# Manually start VMs if needed
$ sudo virsh start <vm-name>If proxy container issues occur:
# Check proxy container status
$ make ssh
$ podman ps --filter name=external-squid
# Restart proxy if needed
$ podman restart external-squidCluster State Tracking:
- State saved in
aws-hypervisor/instance-data/cluster-vm-state.json - Tracks deployment topology (arbiter/fencing) and installation method (IPI/Agent)
- Preserves installation method across redeploys
- Detects configuration changes for intelligent cleanup
VM Infrastructure Management:
- Automatic detection of VM configuration changes
- Safe cleanup when switching between cluster types
- Preservation of VM infrastructure when possible
Proxy Container Management:
- Automatic proxy container lifecycle management
- Integration with cluster startup/shutdown workflows
- Status checking and recovery capabilities