18 changes: 18 additions & 0 deletions k8s-training/README.md
@@ -127,6 +127,24 @@ You can use Filestore to add external storage to K8s clusters, this allows you t

For more information on how to access storage in K8s, refer [here](#accessing-storage).

### Shared filesystem CSI automation

When a shared filesystem is present, either because this stack created it or because `existing_filestore` was provided, Terraform can also install the Nebius Shared Filesystem CSI driver and promote its StorageClass to the cluster default.

```hcl
enable_filestore = true
existing_filestore = "" # or an existing filesystem ID
filestore_mount_path = "/mnt/data"
filesystem_csi = {
chart_version = "0.1.5"
namespace = "kube-system"
make_default_storage_class = true
previous_default_storage_class_name = "compute-csi-default-sc"
}
```

This Terraform automation only installs the CSI driver and configures the StorageClass. Verification, pod-level validation, and cleanup remain in `filesystem-csi-validation/` as an explicit opt-in workflow.
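After `terraform apply`, the promotion can be confirmed from the `storageclass.kubernetes.io/is-default-class` annotation. A minimal sketch (the StorageClass names in the sample output are illustrative, not taken from this stack):

```shell
# Hypothetical post-apply check: the cluster default StorageClass carries the
# annotation storageclass.kubernetes.io/is-default-class=true. On a live
# cluster you would capture name/annotation pairs with:
#   kubectl get storageclass -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.storageclass\.kubernetes\.io/is-default-class}{"\n"}{end}'
# The parse below runs against a captured sample of that output; the
# StorageClass names are illustrative.
sample="$(printf 'csi-filestore-sc\ttrue\ncompute-csi-default-sc\tfalse')"
default_sc="$(printf '%s\n' "${sample}" | awk -F'\t' '$2 == "true" { print $1 }')"
echo "default StorageClass: ${default_sc}"   # → default StorageClass: csi-filestore-sc
```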

## Connecting to the cluster

### Preparing the environment
1 change: 1 addition & 0 deletions k8s-training/filesystem-csi-validation/.gitignore
@@ -0,0 +1 @@
.state/
@@ -0,0 +1,147 @@
#!/usr/bin/env bash
# -----------------------------------------------------------------------------
# File: 01-verify-node-filesystem-mounts.sh
# Purpose:
# Verify that the Nebius Shared Filesystem is mounted on every Kubernetes
# node at the expected host path before any pod-level storage testing begins.
#
# Why We Run This:
# The Nebius CSI workflow in this repo depends on the shared filesystem
# already being attached and mounted on each node. If a node is missing the
# host mount, later PVC or pod checks can fail in ways that are harder to
# diagnose.
#
# Reference Docs:
# https://docs.nebius.com/kubernetes/storage/filesystem-over-csi
#
# Repo Sources of Truth:
# - ../../modules/cloud-init/k8s-cloud-init.tftpl
# - ../main.tf
#
# What This Script Checks:
# - The mount exists at /mnt/data (or the value of MOUNT_POINT)
# - The mount is present in /etc/fstab
# - The mounted filesystem reports capacity via df
# - The target directory exists on the host
#
# Usage:
# ./01-verify-node-filesystem-mounts.sh
#
# Optional Environment Variables:
# TEST_NAMESPACE Namespace used for the temporary node-debugger pods.
# Defaults to the current kubectl namespace or default.
# MOUNT_POINT Host path to validate. Defaults to the Terraform mount.
# DEBUG_IMAGE Image used by kubectl debug. Defaults to ubuntu.
# VERIFY_ALL_NODES When true, validates every node in the cluster. Defaults
# to false.
# TARGET_NODE Specific node to validate. Accepts either
# node/<name> or <name>. Overrides VERIFY_ALL_NODES.
#
# Created By: Aaron Fagan
# Created On: 2026-03-17
# Version: 0.1.0
# -----------------------------------------------------------------------------
set -euo pipefail

SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)"
source "${SCRIPT_DIR}/common.sh"

DEBUG_IMAGE="${DEBUG_IMAGE:-ubuntu}"
VERIFY_ALL_NODES="${VERIFY_ALL_NODES:-false}"
TARGET_NODE="${TARGET_NODE:-}"
FAILED=0

normalize_node_name() {
  local node_name="$1"
  if [[ "${node_name}" == node/* ]]; then
    printf '%s\n' "${node_name}"
  else
    printf 'node/%s\n' "${node_name}"
  fi
}

log_step "Starting Nebius Shared Filesystem mount verification"
log_info "Namespace for temporary debug pods: ${TEST_NAMESPACE}"
log_info "Expected mount point: ${MOUNT_POINT}"
log_info "Debug image: ${DEBUG_IMAGE}"

log_step "Checking required local dependencies"
require_command kubectl
require_command awk
require_command mktemp
log_pass "Required local commands for node mount verification are available"

log_step "Preparing local state for debugger pod cleanup"
ensure_state_dir
touch "${DEBUG_POD_RECORD_FILE}"
log_info "Debugger pod record file: ${DEBUG_POD_RECORD_FILE}"
log_info "New debugger pods from this run will be appended for later cleanup"

log_step "Selecting which nodes to validate"
ALL_NODES=()
while IFS= read -r node; do
  [[ -n "${node}" ]] && ALL_NODES+=("${node}")
done < <(kubectl get nodes -o name)

if [[ "${#ALL_NODES[@]}" -eq 0 ]]; then
  log_fail "No Kubernetes nodes were returned by kubectl"
  exit 1
fi

if [[ -n "${TARGET_NODE}" ]]; then
  TARGET_NODE="$(normalize_node_name "${TARGET_NODE}")"
  NODES_TO_CHECK=("${TARGET_NODE}")
  log_info "Using explicitly requested node: ${TARGET_NODE}"
elif [[ "${VERIFY_ALL_NODES}" == "true" ]]; then
  NODES_TO_CHECK=("${ALL_NODES[@]}")
  log_info "VERIFY_ALL_NODES=true, so every node will be checked"
else
  NODES_TO_CHECK=("${ALL_NODES[0]}")
  log_info "Defaulting to a single-node validation using: ${NODES_TO_CHECK[0]}"
fi

log_pass "Selected ${#NODES_TO_CHECK[@]} node(s) for shared filesystem mount validation"

log_step "Checking Nebius Shared Filesystem mounts on the selected Kubernetes nodes"
for node in "${NODES_TO_CHECK[@]}"; do
  echo
  echo "------------------------------------------------------------"
  echo "=== ${node} ==="
  output_file="$(mktemp)"
  if ! kubectl debug -n "${TEST_NAMESPACE}" "${node}" \
      --attach=true \
      --quiet \
      --image="${DEBUG_IMAGE}" \
      --profile=sysadmin -- \
      chroot /host sh -lc "
        set -eu
        echo '[check] Verifying that the Nebius Shared Filesystem is actively mounted at ${MOUNT_POINT}'
        mount | awk '\$3 == \"${MOUNT_POINT}\" { print; found=1 } END { exit found ? 0 : 1 }'
        echo '[check] Verifying that the mount is persisted in /etc/fstab for node reboot safety'
        awk '\$2 == \"${MOUNT_POINT}\" { print; found=1 } END { exit found ? 0 : 1 }' /etc/fstab
        echo '[check] Verifying that the mounted filesystem reports capacity and is readable'
        df -h ${MOUNT_POINT}
        echo '[check] Verifying that the target directory exists on the host'
        test -d ${MOUNT_POINT}
        echo '[result] PASS: shared filesystem host mount is active and healthy at ${MOUNT_POINT} on this node'
      " 2>&1 | tee "${output_file}"; then
    FAILED=1
    echo "[result] FAIL: ${node} does not have a healthy shared filesystem mount at ${MOUNT_POINT}" >&2
  fi

  debug_pod_name="$(awk '/Creating debugging pod / { print $4 }' "${output_file}" | tail -n 1)"
  if [[ -n "${debug_pod_name}" ]]; then
    printf '%s %s\n' "${TEST_NAMESPACE}" "${debug_pod_name}" >> "${DEBUG_POD_RECORD_FILE}"
  fi
  rm -f "${output_file}"
done

if [[ "${FAILED}" -eq 0 ]]; then
  log_step "Shared filesystem mount verification completed successfully"
  log_info "All checked nodes reported a healthy mount at ${MOUNT_POINT}"
else
  log_step "Shared filesystem mount verification completed with failures"
  log_info "Review the node output above for the failing mount checks"
fi

exit "${FAILED}"
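For reference, `TARGET_NODE` accepts either `<name>` or `node/<name>`; a standalone sketch of the same normalization logic the script applies (the node name below is illustrative):

```shell
# Standalone sketch of the TARGET_NODE normalization: both accepted forms
# resolve to the node/<name> reference that kubectl expects.
normalize_node_name() {
  case "$1" in
    node/*) printf '%s\n' "$1" ;;
    *)      printf 'node/%s\n' "$1" ;;
  esac
}
normalize_node_name my-node        # → node/my-node
normalize_node_name node/my-node   # → node/my-node
```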
98 changes: 98 additions & 0 deletions k8s-training/filesystem-csi-validation/02-run-csi-smoke-test.sh
@@ -0,0 +1,98 @@
#!/usr/bin/env bash
# -----------------------------------------------------------------------------
# File: 02-run-csi-smoke-test.sh
# Purpose:
# Run a minimal end-to-end validation using one PVC and one pod that mounts
# the shared volume at /data.
#
# Why We Run This:
# This is the fastest proof that the Terraform-managed default StorageClass
# works, the PVC binds, and a pod can read and write data through the
# shared filesystem exposed through CSI.
#
# Reference Docs:
# https://docs.nebius.com/kubernetes/storage/filesystem-over-csi
#
# What This Script Does:
# - Applies the single-pod smoke test manifest
# - Waits for the PVC to bind
# - Verifies that the PVC inherited the expected default StorageClass
# - Waits for the pod to become ready
# - Writes and reads a small probe file inside /data
#
# Usage:
# ./02-run-csi-smoke-test.sh
#
# Optional Environment Variables:
# TEST_NAMESPACE Namespace where the validation resources should be created.
# Defaults to the current kubectl namespace or default.
#
# Manifest Used:
# manifests/01-csi-smoke-test.yaml
#
# Created By: Aaron Fagan
# Created On: 2026-03-17
# Version: 0.1.0
# -----------------------------------------------------------------------------
set -euo pipefail

SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)"
source "${SCRIPT_DIR}/common.sh"

log_step "Starting single-pod shared filesystem smoke test"
log_info "Namespace: ${TEST_NAMESPACE}"
log_info "Manifest: ${FILESYSTEM_SMOKE_MANIFEST_PATH}"
log_info "PVC name: ${FILESYSTEM_SMOKE_PVC_NAME}"
log_info "Pod name: ${FILESYSTEM_SMOKE_POD_NAME}"
log_info "Expected default StorageClass: ${FILESYSTEM_DEFAULT_STORAGE_CLASS_NAME}"

log_step "Checking required local dependencies"
require_command kubectl
log_pass "Required local commands for the smoke test are available"

log_step "Applying the smoke test manifest"
kubectl apply -n "${TEST_NAMESPACE}" -f "${FILESYSTEM_SMOKE_MANIFEST_PATH}"
log_pass "Smoke test manifest applied in namespace '${TEST_NAMESPACE}'"

log_step "Waiting for the smoke test PVC to bind"
kubectl wait -n "${TEST_NAMESPACE}" \
  --for=jsonpath='{.status.phase}'=Bound \
  "pvc/${FILESYSTEM_SMOKE_PVC_NAME}" \
  --timeout=120s
log_info "PVC '${FILESYSTEM_SMOKE_PVC_NAME}' is bound"
log_pass "Smoke test PVC '${FILESYSTEM_SMOKE_PVC_NAME}' bound successfully"

log_step "Verifying that the smoke test PVC inherited the default StorageClass"
SMOKE_STORAGE_CLASS_NAME="$(kubectl get pvc -n "${TEST_NAMESPACE}" "${FILESYSTEM_SMOKE_PVC_NAME}" -o jsonpath='{.spec.storageClassName}')"
if [[ -z "${SMOKE_STORAGE_CLASS_NAME}" ]]; then
  log_fail "Smoke test PVC '${FILESYSTEM_SMOKE_PVC_NAME}' did not receive a StorageClass from the cluster default"
  exit 1
fi
if [[ "${SMOKE_STORAGE_CLASS_NAME}" != "${FILESYSTEM_DEFAULT_STORAGE_CLASS_NAME}" ]]; then
  log_fail "Smoke test PVC '${FILESYSTEM_SMOKE_PVC_NAME}' used StorageClass '${SMOKE_STORAGE_CLASS_NAME}', expected '${FILESYSTEM_DEFAULT_STORAGE_CLASS_NAME}'"
  exit 1
fi
log_info "PVC '${FILESYSTEM_SMOKE_PVC_NAME}' was assigned StorageClass '${SMOKE_STORAGE_CLASS_NAME}'"
log_pass "Smoke test PVC '${FILESYSTEM_SMOKE_PVC_NAME}' inherited the expected default StorageClass"

log_step "Waiting for the smoke test pod to become ready"
kubectl wait -n "${TEST_NAMESPACE}" \
  --for=condition=Ready \
  "pod/${FILESYSTEM_SMOKE_POD_NAME}" \
  --timeout=120s
log_info "Pod '${FILESYSTEM_SMOKE_POD_NAME}' is ready"
log_pass "Smoke test pod '${FILESYSTEM_SMOKE_POD_NAME}' reached Ready state"

log_step "Writing and reading a probe file through the mounted volume"
kubectl exec -n "${TEST_NAMESPACE}" "${FILESYSTEM_SMOKE_POD_NAME}" -- sh -lc '
  set -eu
  df -h /data
  echo ok > /data/probe.txt
  ls -l /data
  cat /data/probe.txt
'
log_pass "Pod '${FILESYSTEM_SMOKE_POD_NAME}' successfully wrote and read the probe file on the shared volume"

log_step "Smoke test completed successfully"
log_info "The PVC inherited the cluster default StorageClass and the mounted shared filesystem accepted a write and returned the probe file"
log_pass "Single-pod shared filesystem smoke test confirmed default StorageClass behavior and working storage access"
@@ -0,0 +1,119 @@
#!/usr/bin/env bash
# -----------------------------------------------------------------------------
# File: 03-run-csi-rwx-cross-node-test.sh
# Purpose:
# Validate ReadWriteMany behavior across nodes by mounting the same PVC into
# two pods scheduled onto different hosts.
#
# Why We Run This:
# A single-pod test proves basic functionality, but shared filesystems are
# most valuable when data written from one node can be read from another. This
# script confirms that cross-node sharing works in practice.
#
# Reference Docs:
# https://docs.nebius.com/kubernetes/storage/filesystem-over-csi
#
# What This Script Does:
# - Applies a RWX PVC plus reader/writer pod manifest
# - Uses pod anti-affinity to encourage placement on different nodes
# - Waits for the PVC and both pods to become ready
# - Verifies that the PVC inherited the expected default StorageClass
# - Writes a file from one pod and reads it from the other
#
# Usage:
# ./03-run-csi-rwx-cross-node-test.sh
#
# Optional Environment Variables:
# TEST_NAMESPACE Namespace where the validation resources should be created.
# Defaults to the current kubectl namespace or default.
#
# Manifest Used:
# manifests/02-csi-rwx-cross-node.yaml
#
# Created By: Aaron Fagan
# Created On: 2026-03-17
# Version: 0.1.0
# -----------------------------------------------------------------------------
set -euo pipefail

SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)"
source "${SCRIPT_DIR}/common.sh"

log_step "Starting cross-node RWX validation"
log_info "Namespace: ${TEST_NAMESPACE}"
log_info "Manifest: ${FILESYSTEM_RWX_MANIFEST_PATH}"
log_info "PVC name: ${FILESYSTEM_RWX_PVC_NAME}"
log_info "Writer pod: ${FILESYSTEM_RWX_WRITER_POD_NAME}"
log_info "Reader pod: ${FILESYSTEM_RWX_READER_POD_NAME}"
log_info "Expected default StorageClass: ${FILESYSTEM_DEFAULT_STORAGE_CLASS_NAME}"

log_step "Checking required local dependencies"
require_command kubectl
log_pass "Required local commands for the RWX validation are available"

log_step "Applying the RWX validation manifest"
kubectl apply -n "${TEST_NAMESPACE}" -f "${FILESYSTEM_RWX_MANIFEST_PATH}"
log_pass "RWX validation manifest applied in namespace '${TEST_NAMESPACE}'"

log_step "Waiting for the RWX PVC to bind"
kubectl wait -n "${TEST_NAMESPACE}" \
  --for=jsonpath='{.status.phase}'=Bound \
  "pvc/${FILESYSTEM_RWX_PVC_NAME}" \
  --timeout=120s
log_info "PVC '${FILESYSTEM_RWX_PVC_NAME}' is bound"
log_pass "RWX PVC '${FILESYSTEM_RWX_PVC_NAME}' bound successfully"

log_step "Verifying that the RWX PVC inherited the default StorageClass"
RWX_STORAGE_CLASS_NAME="$(kubectl get pvc -n "${TEST_NAMESPACE}" "${FILESYSTEM_RWX_PVC_NAME}" -o jsonpath='{.spec.storageClassName}')"
if [[ -z "${RWX_STORAGE_CLASS_NAME}" ]]; then
  log_fail "RWX PVC '${FILESYSTEM_RWX_PVC_NAME}' did not receive a StorageClass from the cluster default"
  exit 1
fi
if [[ "${RWX_STORAGE_CLASS_NAME}" != "${FILESYSTEM_DEFAULT_STORAGE_CLASS_NAME}" ]]; then
  log_fail "RWX PVC '${FILESYSTEM_RWX_PVC_NAME}' used StorageClass '${RWX_STORAGE_CLASS_NAME}', expected '${FILESYSTEM_DEFAULT_STORAGE_CLASS_NAME}'"
  exit 1
fi
log_info "PVC '${FILESYSTEM_RWX_PVC_NAME}' was assigned StorageClass '${RWX_STORAGE_CLASS_NAME}'"
log_pass "RWX PVC '${FILESYSTEM_RWX_PVC_NAME}' inherited the expected default StorageClass"

log_step "Waiting for both RWX test pods to become ready"
kubectl wait -n "${TEST_NAMESPACE}" \
  --for=condition=Ready \
  "pod/${FILESYSTEM_RWX_WRITER_POD_NAME}" \
  --timeout=180s
kubectl wait -n "${TEST_NAMESPACE}" \
  --for=condition=Ready \
  "pod/${FILESYSTEM_RWX_READER_POD_NAME}" \
  --timeout=180s
log_info "Both RWX test pods are ready"
log_pass "RWX writer and reader pods both reached Ready state"

log_step "Checking the node placement for the reader and writer pods"
WRITER_NODE="$(kubectl get pod -n "${TEST_NAMESPACE}" "${FILESYSTEM_RWX_WRITER_POD_NAME}" -o jsonpath='{.spec.nodeName}')"
READER_NODE="$(kubectl get pod -n "${TEST_NAMESPACE}" "${FILESYSTEM_RWX_READER_POD_NAME}" -o jsonpath='{.spec.nodeName}')"

echo "writer node: ${WRITER_NODE}"
echo "reader node: ${READER_NODE}"

kubectl get pods -n "${TEST_NAMESPACE}" "${FILESYSTEM_RWX_WRITER_POD_NAME}" "${FILESYSTEM_RWX_READER_POD_NAME}" -o wide
log_pass "RWX pod placement details collected for both nodes"

log_step "Writing shared data from the writer pod"
kubectl exec -n "${TEST_NAMESPACE}" "${FILESYSTEM_RWX_WRITER_POD_NAME}" -- sh -lc '
  set -eu
  echo "shared-check" > /data/shared.txt
  cat /data/shared.txt
'
log_pass "Writer pod '${FILESYSTEM_RWX_WRITER_POD_NAME}' wrote shared data to the mounted volume"

log_step "Reading the same shared data from the reader pod"
kubectl exec -n "${TEST_NAMESPACE}" "${FILESYSTEM_RWX_READER_POD_NAME}" -- sh -lc '
  set -eu
  ls -l /data
  cat /data/shared.txt
'
log_pass "Reader pod '${FILESYSTEM_RWX_READER_POD_NAME}' read the shared file created by the writer pod"

log_step "Cross-node RWX validation completed successfully"
log_info "The PVC inherited the cluster default StorageClass and the same file was visible from both pods through the shared volume"
log_pass "Cross-node ReadWriteMany storage behavior and default StorageClass inheritance confirmed"