feat: Authoritative CoreDNS for Slurm/MPI hostname resolution #4353

Open
sujit-jadhav wants to merge 1 commit into feature/multi-subnet-coresmd-support from feature/coredns-for-slurm-mpi

@sujit-jadhav
Collaborator

Summary

Implements CoreDNS as the authoritative DNS server for cluster-internal hostname resolution, replacing /etc/hosts-based management for Slurm and MPI workloads.

Changes

Files (11 new, 19 modified; 30 files total, +1038 / -17 lines)

Input & Validation

  • input/dns_config.yml — user configuration (dns_enabled, dns_domain, TTLs, SOA, fabric suffixes)
  • common/.../schema/dns_config.json — JSON Schema
  • common/.../en_us_validation_msg.py — 8 DNS error message constants
  • common/.../provision_validation.py — validate_dns_config() function
  • common/.../config.py — register dns_config in validation pipeline
  • common/.../tests/test_dns_config_validation.py — 33 unit tests (all pass)
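
The commit message lists the checks behind validate_dns_config(): RFC 1035 domain validation, TTL range checks, SOA positive-int checks, fabric suffix format validation, and reserved domain detection. The sketch below shows roughly what such checks could look like. It is not the PR's code: the function name `validate_dns_config_sketch`, the TTL bounds, the reserved-domain set, the SOA field names, and the `-ib0`-style suffix pattern are all assumptions made for illustration.

```python
import re

# Hypothetical limits and names; the PR's actual schema values are not shown.
TTL_MIN, TTL_MAX = 1, 86400
RESERVED_DOMAINS = {"localhost", "local", "invalid", "test", "example"}
SOA_INT_FIELDS = ("serial", "refresh", "retry", "expire", "minimum")
# RFC 1035 label: starts with a letter, ends with a letter or digit, 63 chars max.
LABEL_RE = re.compile(r"[a-z]([a-z0-9-]{0,61}[a-z0-9])?")


def is_positive_int(value) -> bool:
    """True for ints > 0; bool is excluded even though it subclasses int."""
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def is_valid_domain(domain: str) -> bool:
    """Check a dot-separated name of RFC 1035 labels, 253 chars max."""
    if not domain or len(domain) > 253:
        return False
    return all(LABEL_RE.fullmatch(label) for label in domain.lower().split("."))


def validate_dns_config_sketch(cfg: dict) -> list[str]:
    """Return a list of error strings; an empty list means the config passed."""
    errors = []

    domain = cfg.get("dns_domain", "")
    if not is_valid_domain(domain):
        errors.append(f"dns_domain '{domain}' is not a valid RFC 1035 name")
    elif domain.lower() in RESERVED_DOMAINS:
        errors.append(f"dns_domain '{domain}' is reserved")

    for key in ("dns_ttl", "dns_cache_ttl"):
        ttl = cfg.get(key)
        if not is_positive_int(ttl) or not TTL_MIN <= ttl <= TTL_MAX:
            errors.append(f"{key} must be an integer in [{TTL_MIN}, {TTL_MAX}]")

    soa = cfg.get("dns_soa", {})
    for field in SOA_INT_FIELDS:
        if field in soa and not is_positive_int(soa[field]):
            errors.append(f"dns_soa.{field} must be a positive integer")

    for suffix in cfg.get("dns_fabric_suffixes", []):
        # Assumed format: a hyphen followed by lowercase letters or digits.
        if not isinstance(suffix, str) or not re.fullmatch(r"-[a-z0-9]+", suffix):
            errors.append(f"fabric suffix '{suffix}' must look like '-ib0'")
    return errors
```

Returning a list rather than raising on the first failure lets a caller report every problem in dns_config.yml in one pass.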

CoreDNS Deployment (OIM)

  • prepare_oim/.../templates/Corefile.j2 — file, cache, reload, forward plugins
  • prepare_oim/.../templates/coredns.container.j2 — systemd quadlet
  • prepare_oim/.../tasks/deploy_coredns.yml — pull, configure, start

DNS Zone Pipeline

  • provision/.../templates/dns/forward_zone.j2 — A records from ip_name_map
  • provision/.../templates/dns/reverse_zone.j2 — PTR records
  • provision/.../tasks/generate_dns_zones.yml — zone rendering from SMD inventory
  • provision/.../tasks/generate_reverse_zone_additional.yml — per-additional-subnet reverse zones
  • provision/.../tasks/update_dns_zones.yml — lifecycle hook (node add/remove)
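
As an illustration of the forward-zone step, the sketch below renders SOA, NS, and A records from an IP-to-hostname mapping. It assumes ip_name_map is a dict keyed by IPv4 address strings, that the name server host has its own entry in that map, and that the SOA values arrive as a dict; the function name, the `admin` mailbox, and the SOA key names are hypothetical and may differ from what forward_zone.j2 actually does.

```python
import ipaddress


def render_forward_zone(domain, ns_host, ip_name_map, ttl, soa):
    """Render SOA, NS, and A records as zone-file text."""
    lines = [
        f"$ORIGIN {domain}.",
        f"$TTL {ttl}",
        # SOA on one line: primary NS, admin mailbox, then the five counters.
        f"@ IN SOA {ns_host}.{domain}. admin.{domain}. "
        f"{soa['serial']} {soa['refresh']} {soa['retry']} "
        f"{soa['expire']} {soa['minimum']}",
        f"@ IN NS {ns_host}.{domain}.",
    ]
    # Sort numerically by address so the output is stable between runs.
    for ip in sorted(ip_name_map, key=ipaddress.ip_address):
        lines.append(f"{ip_name_map[ip]} IN A {ip}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_forward_zone(
        "cluster.hpc",
        "oim",
        {"10.0.0.10": "node2", "10.0.0.2": "node1", "10.0.0.1": "oim"},
        300,
        {"serial": 1, "refresh": 3600, "retry": 600,
         "expire": 86400, "minimum": 300},
    ))
```

Stable ordering keeps regenerated zone files diff-friendly when the node add/remove hook re-renders them. The serial would also need to increase on each change for secondaries to notice; how the PR handles that is not shown here.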

Cloud-init Templates (7 files)

  • Conditional: resolv.conf → OIM CoreDNS when dns_enabled, otherwise legacy /etc/hosts
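
The conditional can be pictured as a single branch on dns_enabled. The Python sketch below only illustrates that decision; the real logic lives in the cloud-init templates, and the function name, the `search` line, and the returned path-to-content mapping are assumptions.

```python
def name_resolution_files(dns_enabled, oim_ip, dns_domain, ip_name_map):
    """Return {path: text} describing what a node's cloud-init would write."""
    if dns_enabled:
        # DNS mode: point the resolver at CoreDNS running on the OIM.
        return {"/etc/resolv.conf": f"search {dns_domain}\nnameserver {oim_ip}\n"}
    # Legacy mode: lines to append to /etc/hosts, one per known node.
    hosts = "".join(f"{ip} {name}\n" for ip, name in ip_name_map.items())
    return {"/etc/hosts": hosts}
```

With dns_enabled left at its default of false, only the /etc/hosts branch runs, which matches the backward-compatibility claim in this PR.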

Slurm /etc/hosts

  • update_hosts_munge.yml / update_hosts.yml — skip when dns_enabled

K8s Integration

  • Forward dns_domain to OIM CoreDNS from K8s CoreDNS ConfigMap
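
For reference, forwarding one domain from the in-cluster CoreDNS usually means adding a dedicated server block to the Corefile stored in the ConfigMap. The sketch below generates such a block; the function name is hypothetical, and the exact stanza this PR adds (including the `errors` and `cache 30` lines) is an assumption modeled on the common stub-domain pattern.

```python
def k8s_forward_block(dns_domain, oim_ip):
    """Return a Corefile server block that sends dns_domain queries to the OIM."""
    return (
        f"{dns_domain}:53 {{\n"
        "    errors\n"
        "    cache 30\n"
        f"    forward . {oim_ip}\n"
        "}\n"
    )


if __name__ == "__main__":
    # Hypothetical domain and OIM address.
    print(k8s_forward_block("cluster.hpc", "10.0.0.1"))
```

Pods then resolve cluster node names through the OIM CoreDNS, while all other queries keep following the existing default block.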

PR #4352 Compatibility

  • Reverse zones for admin + additional subnets
  • All variable names compatible with multi-subnet DHCP PR
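
Generating one reverse zone per subnet comes down to deriving an in-addr.arpa zone name from each CIDR and emitting PTR records for the hosts inside it. The following sketch uses Python's standard ipaddress module and deliberately handles only IPv4 subnets aligned to /8, /16, or /24. The function name and the shape of ip_name_map (IP string to short hostname) are assumptions, and the PR's templates may treat other prefix lengths differently.

```python
import ipaddress


def reverse_zone(subnet_cidr, ip_name_map, domain):
    """Return (zone_name, ptr_lines) for the hosts that fall inside one subnet."""
    net = ipaddress.ip_network(subnet_cidr)
    if net.version != 4 or net.prefixlen not in (8, 16, 24):
        raise ValueError("sketch only handles IPv4 /8, /16, and /24 subnets")

    # Zone name: the network octets in reverse order, e.g. 1.168.192.in-addr.arpa
    octets = str(net.network_address).split(".")[: net.prefixlen // 8]
    zone_name = ".".join(reversed(octets)) + ".in-addr.arpa"

    hosts = sorted(
        (ipaddress.ip_address(ip), name)
        for ip, name in ip_name_map.items()
        if ipaddress.ip_address(ip) in net
    )
    # reverse_pointer yields e.g. '10.1.168.192.in-addr.arpa' for 192.168.1.10
    ptr_lines = [
        f"{addr.reverse_pointer}. IN PTR {name}.{domain}." for addr, name in hosts
    ]
    return zone_name, ptr_lines


if __name__ == "__main__":
    nodes = {"192.168.1.10": "node1", "10.5.0.3": "node2", "192.168.1.2": "oim"}
    # Hypothetical admin subnet followed by one additional subnet.
    for cidr in ("192.168.1.0/24", "10.5.0.0/16"):
        print(reverse_zone(cidr, nodes, "cluster.hpc"))
```

Calling the function once for the admin subnet and once per additional subnet mirrors the split between generate_dns_zones.yml and generate_reverse_zone_additional.yml described above.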

Backward Compatible

dns_enabled defaults to false — zero behavioral change for existing deployments.

Tests

  • 33 new DNS validation tests — all pass
  • 17 existing multi-subnet tests — all still pass
  • All YAML/JSON files syntax-validated

Implement CoreDNS as the authoritative DNS server for cluster-internal
hostname resolution, replacing /etc/hosts-based management.

New input configuration:
- input/dns_config.yml: dns_enabled, dns_domain, dns_ttl, dns_cache_ttl,
  dns_fabric_suffixes, dns_soa, dns_reverse_enabled

Validation:
- JSON schema (dns_config.json) and validation logic (validate_dns_config)
- RFC 1035 domain validation, TTL range checks, SOA positive-int checks,
  fabric suffix format validation, reserved domain detection
- 33 unit tests covering all validation paths

CoreDNS deployment (OIM):
- Corefile.j2 template: file plugin for forward/reverse zones, cache,
  reload (10s), forward to upstream DNS
- Systemd quadlet (coredns.container.j2) for podman-managed container
- deploy_coredns.yml task: image pull, config generation, service start

DNS zone rendering pipeline:
- forward_zone.j2: SOA + NS + A records from ip_name_map
- reverse_zone.j2: SOA + NS + PTR records
- generate_dns_zones.yml: reads SMD inventory, renders zones
- generate_reverse_zone_additional.yml: per-additional-subnet reverse zones
- update_dns_zones.yml: lifecycle hook for node add/remove

Cloud-init templates (7 files):
- Conditional: resolv.conf pointing to OIM CoreDNS when dns_enabled,
  otherwise legacy /etc/hosts append

Slurm /etc/hosts management:
- update_hosts_munge.yml: skip /etc/hosts edits when dns_enabled
- update_hosts.yml: skip bulk /etc/hosts updates when dns_enabled

K8s CoreDNS integration:
- Forward dns_domain queries to OIM CoreDNS from K8s CoreDNS ConfigMap

Multi-subnet DHCP compatibility (PR #4352):
- Reverse zones generated for admin + additional subnets
- All variable names compatible with multi-subnet PR

Backward compatible: dns_enabled defaults to false, preserving existing
/etc/hosts behavior for users who do not opt in.

@abhishek-sa1 deleted the branch feature/multi-subnet-coresmd-support on May 13, 2026, 14:07
@abhishek-sa1 reopened this on May 13, 2026