4 changes: 4 additions & 0 deletions Makefile
@@ -71,6 +71,10 @@ help:
@echo 'make down-v'
@echo 'kubectl taint nodes --all node-role.kubernetes.io/control-plane-'

.PHONY: multi-node
multi-node:
sed -i "s/default_network/$(HOSTNAME)/g" docker-compose.yaml

.PHONY: check-preflight
check-preflight:
./Makefile.d/check-preflight.sh
9 changes: 9 additions & 0 deletions README.md
@@ -103,8 +103,17 @@ pasta does not seem to work well
Use scripts in [`./init-host`](./init-host) for automating these steps.

## Usage

See `make help`.

If you are running a multi-node setup in which your network CNI files live on a shared filesystem (usually under `~/.config/cni`), create a non-shared location for each node's usernetes code (e.g., `/tmp` is usually not shared) and run this additional command on each control-plane and worker node before `make up`. It gives the network, and the corresponding CNI files, names that are unique within the shared location:
Member

I don't think that container engines support locating CNI files on a shared filesystem

Member

I suggest just mounting a local filesystem on .config/cni

Contributor Author (@vsoch, Jun 26, 2025)


> I suggest just mounting a local filesystem on .config/cni

You mean on the HPC node? On top of NFS, and for every user? That seems overkill for what comes down to a file naming issue.

The solution here does not change functionality for a user who doesn't need this change, but it supports multi-node shared-filesystem setups for users who need them, via an isolated `make multi-node` command. If other multi-node functionality turns out to be needed, it could be added to that section.

Contributor Author

> I don't think that container engines support locating CNI files on a shared filesystem

In rootless mode, Podman puts the CNI files in the user's home. To be clear, that home isn't shared between users; it is shared between nodes (reference).

Member

Same as:

> I don't think a new Makefile target should be added for this.
> docker-compose.yaml can be modified in vi or yq.

Contributor Author

So for an HPC cluster of hundreds or thousands of nodes, you want the user to manually update the file with vim?

You requested changes on the PR - can you please clarify what I can change? It seems more like you are rejecting any kind of change for this.

Member

CNI files aren't expected to be shared between nodes.

If you aren't allowed to mount local filesystems, as a workaround you can just automate updating YAMLs with yq https://github.com/mikefarah/yq

Contributor Author

> CNI files aren't expected to be shared between nodes.

In a rootless environment with Podman, where they are stored in `~/.config` in the user's home (which is mounted and shared across compute nodes), it is not just expected, it is guaranteed.

Member

They are expected/guaranteed to be under the home, but not expected to be under the shared home

Contributor Author

> They are expected/guaranteed to be under the home, but not expected to be under the shared home

I have never seen an HPC cluster with a user home that is not a filesystem mapped across nodes, and thus shared. It's usually NFS. It's strategically like that so you can log in to multiple different clusters and see files, and jobs running across compute nodes can see the same space too.
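
The "file naming issue" discussed in this thread can be illustrated with a small simulation. This is only a sketch with plain files: no container engine is involved, `write_network_config` is a hypothetical helper, and the `.conflist` file name and JSON content are assumptions chosen for illustration. A temporary directory stands in for a home directory that every node mounts.

```shell
#!/bin/sh
# Sketch only: simulates the naming collision with plain files.
set -eu

# Stand-in for a home directory mounted on every node (for example over NFS).
shared_home=$(mktemp -d)
cni_dir="$shared_home/.config/cni"
mkdir -p "$cni_dir"

# Hypothetical helper: a "node" stores a network config under the network's name.
#   $1: node name   $2: network name
write_network_config() {
    printf '{"name": "%s", "created_by": "%s"}\n' "$2" "$1" > "$cni_dir/$2.conflist"
}

# Same network name on both nodes: the second write replaces the first.
write_network_config node-a default_network
write_network_config node-b default_network
shared_count=$(ls "$cni_dir" | wc -l | tr -d ' ')

# Per-node network names: each node keeps its own file.
rm -f "$cni_dir"/*.conflist
write_network_config node-a node-a
write_network_config node-b node-b
unique_count=$(ls "$cni_dir" | wc -l | tr -d ' ')

echo "shared name: $shared_count file(s); per-node names: $unique_count file(s)"
```

With one network name, both simulated nodes end up writing to the same path, which is the situation the per-node rename is meant to avoid.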


```bash
make multi-node
```
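
The per-node preparation described above (a node-local copy of the code plus the `sed` rename from the `multi-node` target) could be sketched roughly as follows. `prepare_node_copy` is a hypothetical helper, not part of the repository, the compose file in the demo is a trimmed-down stand-in, and `uname -n` is assumed to yield the same value the Makefile reads from `HOSTNAME`.

```shell
#!/bin/sh
# Sketch only. prepare_node_copy is a hypothetical helper.
set -eu

# Copy a checkout to a node-local directory and rename the network placeholder,
# mirroring the sed call in the multi-node target.
#   $1: source checkout (may live on the shared filesystem)
#   $2: node-local parent directory, e.g. /tmp
#   $3: node name used as the unique network name
prepare_node_copy() {
    src=$1
    local_root=$2
    node=$3
    dest="$local_root/usernetes-$node"

    mkdir -p "$dest"
    cp -R "$src/." "$dest/"

    # Write through a temp file instead of `sed -i`, whose syntax differs
    # between GNU and BSD sed.
    sed "s/default_network/$node/g" "$dest/docker-compose.yaml" > "$dest/docker-compose.yaml.tmp"
    mv "$dest/docker-compose.yaml.tmp" "$dest/docker-compose.yaml"

    printf '%s\n' "$dest"
}

# Demo on a throwaway checkout with a hypothetical, trimmed-down compose file.
demo_src=$(mktemp -d)
demo_local=$(mktemp -d)
cat > "$demo_src/docker-compose.yaml" <<'EOF'
services:
  node:
    networks:
      default_network:
        ipv4_address: ${NODE_IP}
networks:
  default_network:
    ipam:
      config: []
EOF

# Assumption: `uname -n` stands in for the HOSTNAME value used by the Makefile.
node_dir=$(prepare_node_copy "$demo_src" "$demo_local" "$(uname -n)")
cat "$node_dir/docker-compose.yaml"
```

The source checkout is left untouched, so the same shared copy can seed every node; only the node-local copy carries the renamed network.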

Here are instructions for control plane and worker nodes.

```bash
# Bootstrap a cluster
make up
4 changes: 2 additions & 2 deletions docker-compose.yaml
@@ -8,7 +8,7 @@ services:
privileged: true
restart: always
networks:
- default:
+ default_network:
ipv4_address: ${NODE_IP}
ports:
# <host>:<container>
@@ -46,7 +46,7 @@ services:
"nerdctl/bypass4netns-ignore-bind": "true"
"nerdctl/bypass4netns-ignore-subnets": "${BYPASS4NETNS_IGNORE_SUBNETS:-}"
networks:
- default:
+ default_network:
ipam:
config:
# Each of the nodes has to have a different IP.