# An Overview of the Hanlon Microkernel
The Hanlon Microkernel itself is a small, in-memory Linux kernel that is used to boot 'nodes' (in this case, servers) that are discovered by the Hanlon Server. It provides the Hanlon Server with the essential point of control that it needs in order to discover and manage these nodes in the network. The Hanlon Microkernel also performs an essential task for the Hanlon Server: discovering the capabilities of these nodes and reporting those capabilities back to the Hanlon Server (so that the Hanlon Server can determine what it should do with them). Without the Hanlon Microkernel (or something similar to it), the Hanlon Server cannot do its job (classifying nodes and, based on their capabilities, applying models to them).
So how exactly does the Hanlon Microkernel work? What does it do and how does it interact with the Hanlon Server in order to accomplish those tasks? The following diagram shows how the Hanlon Microkernel and Hanlon Server relate to each other (and the interactions between them):

As you can see in this diagram, the Hanlon Server (represented by the green-colored components in the center of the diagram) only interacts with the Hanlon Microkernel instances that it is managing (the "MK" instances deployed to the nodes on the left-hand side of the diagram) in response to requests that it receives from those instances (the HTTP 'checkin' and 'register' requests that are sent to the Hanlon Server by the Hanlon Microkernel Controller instances). The responses to these checkin requests can include meta-data and commands that the Hanlon Server would like to pass back to these Microkernel instances (more on this below). The other two components shown in this diagram (the Hanlon Database instance and the Broker instance) are key components that the Hanlon Server interacts with over time, but we won't go into any specifics here as to how those interactions occur, other than to point out that for both of these components the Hanlon Server uses a plugin model (allowing for wholesale replacement of either component with another component that implements the other side of that plugin).
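To make the checkin interaction concrete, here is a minimal Ruby sketch of what a Microkernel Controller checkin might look like; the endpoint path, query parameter, and response fields used here are assumptions made for illustration, not the actual Hanlon REST API.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Hypothetical periodic checkin; the '/checkin' path, the 'uuid' parameter,
# and the 'command'/'config' response fields are illustrative assumptions.
def checkin(hanlon_base_uri, node_uuid)
  uri = URI("#{hanlon_base_uri}/checkin?uuid=#{node_uuid}")
  response = Net::HTTP.get_response(uri)
  body = JSON.parse(response.body)
  # The reply carries a command for the Microkernel plus the configuration
  # the server would like the Microkernel to run with (described below).
  [body['command'], body['config']]
end

command, config = checkin('http://hanlon.example.com:8026/hanlon/api/v1', 'node-uuid')
```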
So, now that we've shown how the Hanlon Microkernel and Hanlon Server relate to each other, what exactly is the role that the Hanlon Microkernel plays in this process? As some of you may already know, the primary responsibility of the Hanlon Server is to use the properties (or "facts") about the hardware (or nodes) that are "discovered" in order to determine what should be done with those nodes. The properties that are reported to Hanlon during the node registration process (more on this, below) are used to "tag" the nodes being managed by Hanlon, and those tags can then be used (based on a policy match) to map a model to each of the nodes (which can trigger the process of provisioning an OS to one or more nodes, for example). In this picture, the primary responsibility of the Hanlon Microkernel is to provide the Hanlon Server with the facts for the nodes onto which the Microkernel is "deployed". The Hanlon Microkernel gathers these facts using a combination of tools (primarily the `facter` tool, from Puppet Labs, along with the `lshw`, `dmidecode`, `lscpu`, and `ipmitool` commands), and these facts are reported back to the Hanlon Server by the Microkernel as part of the node registration process (more on this, below). Without the Microkernel, the Hanlon Server has no way to determine what the capabilities of the nodes are and, using those capabilities, determine what sort of model it should apply to any given node.
The Hanlon Microkernel also has a secondary responsibility in this picture: providing a default boot state for any node that is discovered by the Hanlon Server. When a new node is discovered (any node for which the Hanlon Server cannot find a matching policy), the Hanlon Server applies a default policy to that node, which results in the newly discovered node being booted into the Hanlon Microkernel. This will typically trigger the process of node checkin and registration, but in the future we might use this same pattern to trigger additional actions using customized Microkernel instances (a Microkernel that performs a system audit or a "boot-nuke", for example). The existence of the Microkernel instance (and the fact that the Hanlon Server selects the Microkernel instance based on policies defined within the Hanlon Server) means that the possibilities here are almost endless.
Given that the Hanlon Microkernel is the default boot state for any new node encountered by the Hanlon Server, perhaps it would be worthwhile to describe the Microkernel boot process itself. This process begins with the delivery of the Hanlon Microkernel (as a compressed kernel image and a ram-disk image) by the Hanlon Server's "Image Service". These images are taken directly from a RancherOS ISO that was added to Hanlon using a `hanlon image add -t mk ...` command. During the boot process, Hanlon also provides a URL that the RancherOS instance should use to obtain its `cloud-config`. This `cloud-config` is used, among other things, to provide configuration information to the Microkernel instance and to define a few commands that should be run when the RancherOS instance boots. In the current implementation, these commands set up a listener on a FIFO that is used by the Microkernel container to communicate with the underlying RancherOS host it is running in (to execute host-based commands like `reboot` or `poweroff` when such commands are triggered by commands it receives back from the Hanlon Server), then download the Microkernel (Docker) image from Hanlon, add that image to the local Docker registry, and start a Microkernel container using a `docker run` command.
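To make the FIFO hand-off concrete, here is a rough sketch of both sides of that channel; the FIFO path and the host-side loop shown here are assumptions for illustration, not the actual cloud-config contents.

```ruby
# Container side: when a 'reboot' or 'poweroff' command comes back from the
# Hanlon Server, the Microkernel writes it to a FIFO shared with the
# RancherOS host. The path used here is hypothetical.
HOST_FIFO = '/dev/host-fifo'.freeze

def run_on_host(command)
  # Opening a FIFO for writing blocks until the host-side reader attaches.
  File.open(HOST_FIFO, 'w') { |fifo| fifo.puts(command) }
end

run_on_host('reboot')

# Host side (conceptually, a shell loop started by the cloud-config):
#   while read cmd < /dev/host-fifo; do $cmd; done
```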
The `docker run` command that is executed in the RancherOS instance triggers a number of additional actions within the Microkernel container. These actions result in the following services being started when the container is initialized:
- The Microkernel Controller – a Ruby-based daemon process that interacts with the Hanlon Server via HTTP
- The Microkernel Web Server – a WEBrick instance that can be used to interact with the Microkernel Controller via HTTP; currently this server is only used by the Microkernel Controller itself to save any configuration changes it might receive from the Hanlon Server (an action that actually triggers a restart of the Microkernel Controller by this web server instance; a rough sketch of such a server appears below).
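As an illustration of that second service, a WEBrick instance along these lines could accept a configuration update from the controller, persist it, and restart the controller; the port, endpoint, file path, and restart mechanism used here are all assumptions, not the actual implementation.

```ruby
require 'webrick'

# Hypothetical Microkernel Web Server: accept a new configuration from the
# Microkernel Controller, save it, and restart the controller so that the
# new configuration is picked up. All names and paths here are illustrative.
server = WEBrick::HTTPServer.new(Port: 2156)

server.mount_proc '/config' do |request, response|
  File.write('/tmp/mk_conf.yaml', request.body)  # persist the new config
  system('sv restart mk-controller')             # bounce the controller
  response.status = 200
end

trap('INT') { server.shutdown }
server.start
```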
Once the node has been successfully booted using the Microkernel, the Microkernel Controller's first action is to check in with the Hanlon Server (this "checkin action" is repeated periodically, and the timing of these checkins is set in the configuration that the Hanlon Server passes back to the Microkernel in the checkin response; more on this below). In the Hanlon Server's response to these checkin requests, the server includes two additional components. The first component included in the server's checkin response is a command that tells the Microkernel what it should do next. Currently, this set is limited to one of the following commands:
- acknowledge – A command from the Hanlon Server indicating that the checkin request has been received and that there is no action necessary on the part of the Microkernel at this time
- register – A command from the Hanlon Server asking the Microkernel to report back the "facts" that it can discover about the underlying hardware that it has been deployed onto
- reboot – A command from the Hanlon Server asking the Microkernel instance to reboot itself. This is typically the result of the Hanlon Server finding an applicable policy for that node after the Microkernel has registered the node with the Hanlon Server (or after a new policy has been defined/enabled), but this command might be sent back under other circumstances in the future.
- poweroff – A command from the Hanlon Server asking the Microkernel instance to power off immediately. This is typically the result of the Hanlon Server finding a matching "discover-only" policy for that node after the Microkernel has registered the node with the Hanlon Server (or after a new "discover-only" policy has been defined/enabled), but this command might be sent back under other circumstances in the future.
The second component sent back to the Microkernel Controller in the server's checkin response is the configuration that the Hanlon Server would like that Microkernel instance to use (which includes parameters like the periodicity the Microkernel should use for its checkin requests, a pattern indicating which "facts" should NOT be reported, the log level that the Microkernel Controller and its associated services should use internally, and even the URL of the Hanlon Server itself). If this configuration has changed in any way since the last checkin by the Microkernel, the Microkernel Controller will save the new configuration and the Microkernel Controller itself will be restarted (forcing it to pick up the new configuration). This ability to set the Microkernel Controller's configuration using the checkin response gives the Hanlon Server complete control over the behavior of the Microkernel instances that it is interacting with.
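Putting the two components of the checkin response together, a controller might dispatch on them roughly as follows; the helper methods here are placeholders standing in for the real controller actions described above.

```ruby
# Placeholder actions; in the real controller these would gather and POST
# facts, write to the host FIFO, persist configuration, and so on.
def register_node;    puts 'gathering facts and registering'; end
def run_on_host(cmd); puts "host command: #{cmd}";            end
def save_config(cfg); puts 'saving config and restarting';    end

# Dispatch on the two parts of a checkin response (command and config).
def handle_checkin(body, current_config)
  case body['command']
  when 'acknowledge' then nil   # nothing to do until the next checkin
  when 'register'    then register_node
  when 'reboot', 'poweroff'
    run_on_host(body['command'])
  end

  # A changed configuration is saved, which restarts the controller.
  save_config(body['config']) if body['config'] != current_config
end
```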
As was mentioned previously, once the node has checked in, the Hanlon Server may send back a command triggering the Node Registration process. This process can be triggered in either of the following situations:
- If the Hanlon Server has not seen a node before, or if it has not seen that node in a while (the timing for this is configurable), it will send back a register command in its checkin response.
- Whenever the Microkernel Controller detects that the facts it gathers about the underlying node differ from those reported during its last successful checkin with the Hanlon Server, it will register the new facts with the Hanlon Server (without being prompted to do so); a sketch of one way such a change might be detected appears below.
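One plausible way to implement that second trigger is to keep a digest of the facts that were last registered and compare against it on each gathering pass; this is a sketch under that assumption, not necessarily how the Microkernel Controller actually detects the change.

```ruby
require 'digest'
require 'json'

# Hash the (sorted) facts map and compare it against the digest recorded
# at the last successful registration; a mismatch means the facts changed.
def facts_changed?(facts, last_digest)
  Digest::SHA256.hexdigest(facts.sort.to_h.to_json) != last_digest
end
```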
The following sequence diagram can be used to visualize the sequence of actions outlined above (showing both the node checkin and node registration events):

So, at a high level, we have described how the Hanlon Microkernel interacts with the Hanlon Server to check in and register nodes with the server. How exactly is the data that is reported during the registration process gathered (and how is that data reported)?
The meta-data that is reported by the Hanlon Microkernel to the Hanlon Server during the node registration process is gathered using several tools that are built into the Hanlon Microkernel:
- Facter (a cross-platform library available from Puppet Labs, designed to gather information about nodes being managed by their DevOps framework, Puppet) is used to gather most of the facts that are included in the node registration request. This has the advantage of gathering a lot of data about the nodes without much effort on our part.
- The lshw command is used to gather additional information about the underlying hardware (processor, bus, firmware, memory, disks, and network). This additional information is used to supplement the information that we can gather using Facter, and provides us with a lot of detail that is not available in the Facter view of the nodes.
- The lscpu command is used to gather additional information about the CPU itself that is not available from either Facter or the lshw command (virtualization support, cache sizes, byte order, etc.); as was the case with the lshw output, this additional information is used to supplement the facts that can be gathered from other sources.
- The ipmitool command is used to discover information associated with the Baseboard Management Controller (or BMC) attached to the node, if one exists (a rough sketch of how these commands might be invoked follows this list).
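As an illustration of how these hardware-level facts can be collected, the sketch below shells out to the commands in question; the fact-name prefix and the minimal parsing shown are assumptions made for the example (real lshw/lscpu/ipmitool parsing is considerably more involved).

```ruby
# Gather supplemental hardware facts by shelling out; the 'mk_hw_' fact-name
# prefix is made up for this example, and the parsing is deliberately minimal.
def hardware_facts
  facts = {}

  # lscpu prints one "Field: value" pair per line.
  `lscpu`.each_line do |line|
    key, value = line.split(':', 2)
    facts["mk_hw_lscpu_#{key.strip.downcase.tr(' ', '_')}"] = value.strip if value
  end

  # lshw and ipmitool output would be parsed along the same lines (ipmitool
  # only applies when the node actually has a BMC attached).
  facts['mk_hw_lshw_short'] = `lshw -short 2>/dev/null`

  facts
end
```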
The following diagram shows the Hanlon Microkernel components that are involved in gathering these facts and reporting them to the Hanlon Server:

As is shown in this diagram, the Microkernel's Registration Manager first uses a 'Fact Manager' class to gather Facter Facts (using Facter). Those Facter Facts are then supplemented with a set of Hardware Facts (which are gathered using a 'Hardware Facter' class that manages the process of gathering detailed, 'hardware-specific' facts about the underlying platform using the lshw, lscpu, and ipmitool commands, as was outlined above). Finally, we are planning on adding the ability to discover information about the network topology around the Microkernel using the Link Layer Discovery Protocol (or LLDP) in the not-too-distant future. This document will be updated to reflect those changes when they are made.
The combined set of meta-data gathered from these sources is then passed through a filter, which uses the 'mk_fact_excl_pattern' property from the Microkernel configuration to filter out any facts with a name that matches the regular expression pattern contained in that property. This lets us restrict the facts that are reported to the Hanlon Server by the Microkernel to only those facts that the Hanlon Server is interested in (all of the 'array-style' facts are suppressed in the current default configuration provided by the Hanlon Server, for example) and also lets us minimize the number of registration requests received by the Hanlon Server by eliminating fields that we aren't interested in but that change constantly in the Microkernel instances (the free memory in the Microkernel, for example). Once the fields in the 'facts map' have been winnowed down to just those facts that are 'of interest', the 'facts map' is converted to a JSON-style string, and that string is then sent to the Hanlon Server as part of the Node Registration request.
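The filtering step described above amounts to dropping every fact whose name matches the exclusion pattern before serializing the rest; here is a minimal sketch, assuming the pattern arrives as a plain string in the Microkernel configuration:

```ruby
require 'json'

# Remove any fact whose name matches the 'mk_fact_excl_pattern' regular
# expression, then serialize what remains for the registration request.
def filtered_facts_json(facts, mk_fact_excl_pattern)
  pattern = Regexp.new(mk_fact_excl_pattern)
  facts.reject { |name, _| name.to_s =~ pattern }.to_json
end

# For example, a (made-up) pattern like 'memoryfree|_\d+$' would suppress
# the constantly-changing free-memory fact and numerically-suffixed
# 'array-style' facts; the actual default pattern comes from the Hanlon Server.
```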