NVIDIA Bare Metal Manager (BMM) is an API-based microservice that provides site-local, zero-trust bare-metal lifecycle management with DPU-enforced isolation, allowing for deployment of multi-tenant AI infrastructure at scale. BMM enables zero-touch automation and ensures the integrity and separation of workloads at the bare-metal layer.
BMM has been designed according to the following principles:
- The machine is untrustworthy.
- Operating system requirements are not imposed on the machine.
- After being racked, machines must become ready for use with no human intervention.
- All monitoring of the machine must be done using out-of-band methods.
- The network fabric (i.e., leaf switches and routers) remains static even during tenancy changes within BMM.
BMM is responsible for the following tasks in the data-center environment:
- Maintain hardware inventory of ingested machines.
- Integrate with Redfish APIs to manage usernames and passwords.
- Perform hardware testing and burn-in.
- Validate and update firmware.
- Allocate IP addresses (IPv4).
- Control power (power on/off/reset).
- Provide DNS services for managed machines.
- Orchestrate provisioning, wiping, and releasing nodes.
- Ensure trust of the machine when switching tenants.
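The last three responsibilities above (provisioning, wiping, releasing, and re-establishing trust on tenant switch) can be sketched as a small lifecycle state machine. The state and event names below are illustrative only, not BMM's actual internal model:

```python
# Illustrative sketch of a bare-metal node lifecycle state machine.
# State and event names are hypothetical; they are not BMM's real model.

VALID_TRANSITIONS = {
    ("available", "provision"): "provisioned",
    ("provisioned", "release"): "wiping",    # node must be wiped before reuse
    ("wiping", "wipe_complete"): "available",
}

def next_state(state: str, event: str) -> str:
    """Return the next lifecycle state, refusing invalid transitions."""
    try:
        return VALID_TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"invalid transition: {event!r} from {state!r}")

# A tenant switch always passes through a wipe, so the machine can be
# trusted by the next tenant.
state = "available"
for event in ("provision", "release", "wipe_complete"):
    state = next_state(state, event)
print(state)  # → available
```

The key design point the sketch captures is that there is no direct path from one tenant to the next: every release transitions through a wipe before the node becomes available again.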
BMM is not responsible for the following tasks:
- Configuration of services and software running on managed machines.
- Cluster assembly (that is, it does not build SLURM or Kubernetes clusters).
- Underlay network management.
BMM is a service with multiple components that drive actions based on API calls, which can originate from users or as events triggered by machines (e.g. a DHCP boot or PXE request).
Each service communicates with the BMM API server over gRPC using protocol buffers. The API uses gRPC reflection to provide a machine-readable API description, so clients can auto-generate client code and RPC bindings.
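As a sketch of what a call through such a generated client looks like, the following uses in-memory stand-ins; the message class and RPC name (`PowerRequest`, `SetPower`) are hypothetical, not BMM's actual API:

```python
# Sketch of the call pattern a generated BMM gRPC client would follow.
# The message and RPC names are hypothetical stand-ins; a real client
# would use stubs generated from the server's protobuf definitions.
from dataclasses import dataclass

@dataclass
class PowerRequest:          # stand-in for a generated protobuf message
    machine_id: str
    action: str              # "on", "off", or "reset"

class FakeBmmApiStub:
    """In-memory stand-in for a generated gRPC stub."""
    def __init__(self):
        self.power_state = {}

    def SetPower(self, req: PowerRequest) -> str:
        # A real stub would serialize the message and issue the RPC.
        self.power_state[req.machine_id] = req.action
        return "ok"

stub = FakeBmmApiStub()
print(stub.SetPower(PowerRequest(machine_id="node-01", action="reset")))  # → ok
```

Because the server exposes reflection, clients do not need local .proto files to discover the available services and build these calls.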
The BMM deployment includes a number of services:
- BMM API service: Allows users to query the state of all objects and to request creation, configuration, and deletion of entities.
- DHCP: Provides IPs to all devices on underlay networks, including Host BMCs, DPU BMCs, and DPU OOB addresses. It also provides IPs to Hosts on the overlay network.
- PXE: Delivers images to managed hosts at boot time. Currently, managed hosts are configured to always boot from PXE. If a local bootable device is found, the host will boot it. Hosts can also be configured to always boot from a particular image for stateless configurations.
- Hardware health: Pulls hardware health and configuration information emitted from a Prometheus /metrics endpoint on port 9009 and reports that state information back to BMM.
- SSH console: Provides virtual serial console logging and access over SSH, allowing console access to remote machines deployed on site. The ssh-console also logs the serial console output of each host into the logging system, where it can be queried using tools such as Grafana and logcli.
- DNS: Provides domain name service (DNS) functionality using two services:
  - carbide-dns: Handles DNS queries from the site controller and managed nodes.
  - unbound: Provides recursive DNS services to managed machines and instances.
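The hardware-health service's scrape step amounts to reading the Prometheus text exposition format from the /metrics endpoint. A minimal stdlib-only sketch follows; the metric names are invented for illustration, and a real deployment would use a Prometheus client library:

```python
# Minimal parser for the Prometheus text exposition format, as pulled
# from a /metrics endpoint (e.g. port 9009 on a managed host). Metric
# names below are invented for illustration.

def parse_metrics(text: str) -> dict:
    """Map metric name (with labels) -> float value, skipping comments."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples

payload = """\
# HELP node_psu_ok Power supply health (1 = healthy)
# TYPE node_psu_ok gauge
node_psu_ok{psu="0"} 1
node_psu_ok{psu="1"} 0
"""
print(parse_metrics(payload))
# → {'node_psu_ok{psu="0"}': 1.0, 'node_psu_ok{psu="1"}': 0.0}
```

Note the sketch splits on the last space, so it does not handle label values containing spaces or timestamped samples; it only shows the shape of the scrape-and-report step.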
In addition to the BMM service components, there are other supporting services that must be set up within the K8s site controller nodes.
- The entry point for the managed site is through the Elektra site agent. The site agent maintains a northbound Temporal connection to the cloud control plane for command and control.
- The admin CLI provides a command line interface into BMM.
Some site controller node services require persistent, durable storage to maintain state for their attendant pods:
- Hashicorp Vault: Used by Kubernetes for certificate signing requests (CSRs). The vault uses three each (one per K8s control node) of the data-vault and audit-vault 10GB PVs to protect and distribute the data in the absence of a shared storage solution.
- Postgres: This database stores state for any BMM or site controller components that require it, including the main forgedb database. There are three 10GB pgdata PVs deployed to protect and distribute the data in the absence of a shared storage solution.
- Certificate Management Infrastructure: A set of components that manage the certificates for the site controller and managed hosts.
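As a rough illustration of what one of those claims looks like, the following is a generic Kubernetes PersistentVolumeClaim sized to match the 10GB pgdata volumes described above; the claim name and storage class are placeholders, not values taken from a BMM deployment:

```yaml
# Illustrative PersistentVolumeClaim for one of the three pgdata volumes.
# The metadata.name is a placeholder; storageClassName is omitted and
# would be deployment-specific.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pgdata-0
spec:
  accessModes:
    - ReadWriteOnce        # local, non-shared storage
  resources:
    requests:
      storage: 10Gi
```

Deploying three such claims, one per control node, is what distributes the data in the absence of a shared storage solution.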
A site controller exists to administer a site that has been populated with managed hosts.
Each managed host is a pairing of a single BlueField (BF) 2/3 DPU and a host server.
During initial deployment, the scout service runs and informs the BMM API of any discovered DPUs. BMM completes the installation of services on the DPU, which then boots into regular operation mode. Thereafter, the dpu-agent starts as a daemon.
Each DPU runs the dpu-agent, which connects via gRPC to the BMM API service to receive configuration instructions.
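The agent's fetch-and-apply loop can be sketched as follows; the `fetch_config` and `apply_config` functions are hypothetical stand-ins for the real gRPC calls, and the retry policy shown is an assumption, not dpu-agent's documented behavior:

```python
# Sketch of a dpu-agent-style configuration loop: poll the API for
# configuration, apply what comes back, and back off exponentially on
# failure. fetch_config/apply_config stand in for the real gRPC calls.
import time

def run_agent(fetch_config, apply_config, max_attempts=5, base_delay=1.0,
              sleep=time.sleep):
    delay = base_delay
    for _ in range(max_attempts):
        try:
            apply_config(fetch_config())
            return True
        except ConnectionError:
            sleep(delay)
            delay *= 2          # exponential backoff between retries
    return False

# Exercise the loop with in-memory stand-ins: fail twice, then succeed.
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("API unreachable")
    return {"hostname": "node-01"}

applied = []
ok = run_agent(flaky_fetch, applied.append, sleep=lambda _: None)
print(ok, applied)  # → True [{'hostname': 'node-01'}]
```

Backing off on failure matters here because many agents reconnecting at once after an API outage would otherwise hammer the site controller simultaneously.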
BMM collects metrics and logs from the managed hosts and the site controller. This information is in Prometheus format and can be scraped by a Prometheus server.
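Since the metrics are exposed in Prometheus format, a standard scrape configuration is enough to collect them; the job name and target hostnames below are placeholders, with the port taken from the hardware-health endpoint described earlier:

```yaml
# Illustrative Prometheus scrape configuration for managed hosts'
# hardware-health endpoints on port 9009. Job name and targets are
# placeholders for a site's actual inventory.
scrape_configs:
  - job_name: bmm-hardware-health
    metrics_path: /metrics
    static_configs:
      - targets:
          - node-01.example.site:9009
          - node-02.example.site:9009
```

In practice the target list would be generated from the BMM hardware inventory rather than maintained by hand.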