382 changes: 382 additions & 0 deletions docs/boot-process.md
@@ -0,0 +1,382 @@
# Bottlerocket Boot Process

This document describes how Bottlerocket boots, focusing on the systemd target progression and service dependencies.

**Keywords:** boot, systemd, targets, preconfigured, configured, multi-user, fipscheck, drivers, sysinit, services, dependencies, ordering, API system, bootstrap containers, settings, startup, initialization, kernel modules

## Overview

Bottlerocket's boot sequence progresses through six main stages, each represented by a systemd target:

```
sysinit.target
Contributor

I would probably add local-fs.target as one of the most important prerequisites to sysinit.target.

Quite a lot of Bottlerocket-specific work happens for local storage setup 😀 while the sysinit phase is rather vanilla.

fipscheck.target (FIPS mode only)
Comment on lines +12 to +14
Contributor

This isn't the correct order, but the reason why is subtle. There's a bootconfig-fips.conf snippet in release that sets the default systemd target to fipscheck.target.

Since all the units that are needed by that target are DefaultDependencies=no, sysinit.target doesn't end up added to the job queue until we reach activate-preconfigured.target.

drivers.target (kernel module loading)
preconfigured.target (API system initialization)
configured.target (bootstrap containers)
multi-user.target (workload services)
```

Each stage must complete before the next begins. Services use systemd dependencies (`After=`, `Requires=`, `Wants=`) to coordinate within and between stages.
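As a rough way to see which of these directives a unit declares, the sketch below filters unit file text for ordering and requirement directives. The function name `list_dependency_directives` is hypothetical, and the filter only reads static unit text, so it does not show dependencies that systemd adds implicitly.

```bash
#!/usr/bin/env bash
# Print the ordering and requirement directives found in unit file text
# supplied on stdin. Lines that are not dependency directives are dropped.
list_dependency_directives() {
    # grep exits non-zero when nothing matches; treat that as "no directives".
    grep -E '^(After|Before|Requires|Wants|RequiredBy|WantedBy)=' || true
}
```

On a running host this could be fed from `systemctl cat <unit>`, for example `systemctl cat apiserver.service | list_dependency_directives`. For the resolved values, including implicit ones, `systemctl show --property=After,Requires,Wants <unit>` is the more complete view.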

## Boot Stages

### Stage 0: sysinit.target

Standard systemd initialization. Most services implicitly depend on this through `DefaultDependencies=yes` (the default).

Early services that need to run before normal dependency chains use `DefaultDependencies=no`:

- Filesystem preparation (`prepare-var.service`, `prepare-boot.service`, etc.)
- Data store migration (`migrator.service`)

### Stage 1: fipscheck.target

**Purpose:** Verify cryptographic module integrity when FIPS mode is enabled.

**When it runs:** Only when the kernel command line includes `fips=1`.
Contributor

Not correct because of the bootconfig override, which is also where fips=1 comes from.
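Regardless of where the flag is set, the running kernel's view can be checked directly. The sketch below reads the standard Linux `/proc/sys/crypto/fips_enabled` switch and, separately, looks for an exact `fips=1` token in a command line string. Both function names are hypothetical, and the optional arguments exist only so the checks can be run against fixtures.

```bash
#!/usr/bin/env bash
# Succeed if the kernel reports FIPS mode as enabled.
# $1 optionally overrides the procfs path (useful for testing).
fips_enabled() {
    local flag_file="${1:-/proc/sys/crypto/fips_enabled}"
    # A missing file means the kernel exposes no FIPS switch at all.
    [[ -r "$flag_file" ]] || return 1
    [[ "$(tr -d '[:space:]' < "$flag_file")" == "1" ]]
}

# Succeed if a kernel command line contains the exact token fips=1.
# $1 optionally supplies the command line; the default is /proc/cmdline.
cmdline_requests_fips() {
    local cmdline="${1-$(cat /proc/cmdline)}"
    local -a tokens
    local token
    read -ra tokens <<< "$cmdline"
    for token in "${tokens[@]}"; do
        [[ "$token" == "fips=1" ]] && return 0
    done
    return 1
}
```

Matching whole tokens avoids false positives from values such as `fips=10`.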


**Key services:**

- `check-kernel-integrity.service` - Verifies kernel integrity
- `check-fips-modules.service` - Loads and tests the `tcrypt` module
  - Creates `/etc/.fips-module-check-passed` sentinel file on success
  - Blocks boot if FIPS checks fail

**Transition:** Completes before `drivers.target` begins.
Contributor

In terms of coordinated state transitions, we have these:

  1. fipscheck -> preconfigured (optional)
  2. preconfigured -> configured
  3. configured -> multi-user

These are our "runlevels" or discrete stages. Targets don't work like runlevels; they just activate in response to something else causing them to be enqueued. sysinit.target is enqueued because services in preconfigured.target depend on it (because of default dependencies).


### Stage 2: drivers.target

**Purpose:** Load kernel modules and hardware drivers.

**When it runs:** Always runs, after `basic.target` and before `preconfigured.target`.
Comment on lines +53 to +57
Contributor

It might be better to write this in terms of what "pulls" on various targets. Many of them are pulled in parallel (sysinit.target, drivers.target, network-online.target) by the units in preconfigured.


**Key services:**

- `load-neuron-inf1-modules.service` - Loads AWS Neuron Inf1 kernel modules
- `load-neuron-latest-modules.service` - Loads AWS Neuron Latest kernel modules

**Dependencies:**

- Runs after `basic.target`
- Runs before `preconfigured.target`
- Required by `preconfigured.target` and `multi-user.target`

**Note:** Some driver loading services (e.g., NVIDIA GPU drivers) are required by `preconfigured.target` directly rather than using `drivers.target`.

**Transition:** Completes before `preconfigured.target` begins.

### Stage 3: preconfigured.target

**Purpose:** Initialize the API system and apply all boot-time configuration.

**What "preconfigured" means:** The system has:

- A populated data store with default and user-provided settings
- A running API server
- All configuration files generated from settings

This is the most complex boot stage. Services run in a specific order to build up the system configuration:

#### 3.1 Data Store Setup

**migrator** (`migrator.service`)

- **When:** Runs with `DefaultDependencies=no`, before everything else
- **What:** Updates data store schema if the OS version changed
- **Dependencies:** Required by `apiserver.service`, `storewolf.service`, and `preconfigured.target`

**storewolf** (`storewolf.service`)

- **When:** After `migrator.service`
- **What:** Creates data store directories and populates default settings
- **Details:**
  - Reads defaults from variant-specific `defaults.d` directories
  - Writes settings to _pending_ state in "bottlerocket-launch" transaction
  - Settings not available to other services until committed
- **Dependencies:** Required by `preconfigured.target`

#### 3.2 API Server

**apiserver** (`apiserver.service`)

- **When:** After `storewolf.service`
- **What:** Starts the API server on Unix socket `/run/api.sock`
- **Details:** Allows reading/writing settings via API
- **Dependencies:** Wanted by `preconfigured.target`

#### 3.3 Settings Population

**early-boot-config** (`early-boot-config.service`)

- **When:** After `network-online.target`, `apiserver.service`, `storewolf.service`
- **What:** Applies user data settings (cloud-init equivalent)
- **Details:**
  - Only runs on first boot (checks for `/var/lib/bottlerocket/early-boot-config.ran`)
  - Fetches user data from platform metadata service (e.g., EC2 IMDS)
  - PATCHes settings to API in _pending_ state (not committed)
- **Dependencies:** Required by `preconfigured.target`

**sundog** (`sundog.service`)

- **When:** After `network-online.target`, `apiserver.service`, `early-boot-config.service`
- **What:** Generates dynamic settings that can't be determined until runtime
- **Details:**
  - Examples: primary IP address, cluster DNS settings
  - Runs `settings-committer` first to access user data settings
  - PATCHes generated settings to API in _pending_ state
- **Dependencies:** Required by `preconfigured.target`
- **Subcomponent:** `pluto.service` generates Kubernetes-specific settings
Contributor

I wouldn't really call pluto a subcomponent any more, now that it's not invoked as a generator. It's its own thing, an additional bonus action that generates more k8s settings.


#### 3.4 Configuration Application

**settings-applier** (`settings-applier.service`)

- **When:** After `storewolf.service`, `sundog.service`, `early-boot-config.service`, `apiserver.service`
- **What:** Writes all configuration files based on settings
- **Details:**
  - Runs `settings-committer` to commit the "bottlerocket-launch" transaction
  - Runs `thar-be-settings --all` to generate all config files
  - This is when pending settings become live
- **Dependencies:** Required by `preconfigured.target`

#### 3.5 Stage Transition

**activate-configured** (`activate-configured.service`)

- **When:** After `preconfigured.target` completes
- **What:** Transitions to `configured.target`
- **Details:**
  - Sets systemd default target to `configured.target`
  - Starts `configured.target` asynchronously
- **Dependencies:** Wanted by `preconfigured.target`
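The two steps described for this transition can be sketched with standard `systemctl` subcommands. This is an illustration of the pattern, not the contents of the actual unit file; the function name `activate_target` and the `SYSTEMCTL` override (for dry runs) are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of a stage transition: change the default target, then enqueue the
# new target without waiting for it to finish starting.
# SYSTEMCTL is a hypothetical override so the function can be dry-run.
activate_target() {
    local target="$1"
    local systemctl_cmd="${SYSTEMCTL:-systemctl}"
    "$systemctl_cmd" set-default "$target" || return 1
    # --no-block returns as soon as the start job is queued.
    "$systemctl_cmd" start --no-block "$target"
}
```

Running `SYSTEMCTL=echo activate_target configured.target` prints the two commands instead of executing them. The `--no-block` flag matters because the activating service is itself part of the previous stage, so waiting on the next target would hold that stage open.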

### Stage 4: configured.target

**Purpose:** Run bootstrap containers that perform additional system configuration.

**What "configured" means:** The system has:

- Completed all API-based configuration
- Run any user-defined bootstrap containers
- Applied any additional configuration from bootstrap containers

**Key services:**

**bootstrap-containers@** (`bootstrap-containers@.service`)

- **When:** After `host-containerd.service`, before `configured.target`
- **What:** Runs bootstrap containers defined in settings
- **Details:**
  - Template unit instantiated for each configured bootstrap container
  - Only runs once per container (checks for `/run/bootstrap-containers/%i.ran`)
  - Containers have access to host filesystem at `/.bottlerocket/rootfs`
  - Boot blocks until all bootstrap containers complete
  - Useful for: installing software, modifying files, running setup scripts
- **Dependencies:** Runs before `configured.target`

**activate-multi-user** (`activate-multi-user.service`)

- **When:** After `configured.target` and `reboot-if-required.service`
- **What:** Transitions to `multi-user.target`
- **Details:**
  - Sets systemd default target to `multi-user.target`
  - Starts `multi-user.target` asynchronously
- **Dependencies:** Wanted by `configured.target`

### Stage 5: multi-user.target

**Purpose:** Start workload services (kubelet, ECS agent, etc.).

**What "multi-user" means:** The system is fully configured and ready to run workloads.

**Key services:**

- `kubelet.service` (Kubernetes variants)
- `ecs.service` (ECS variants)
- `host-containers@admin.service` (admin container)
- `host-containers@control.service` (control container)

**Dependencies:**

- Requires `basic.target` and `configured.target`
- This ensures all configuration is complete before workloads start
Comment on lines +205 to +208
Contributor

Kind of, but we never actually enqueue multi-user.target until these dependencies are satisfied.


## Service Dependency Patterns

### Ordering Dependencies

- `After=` - This service starts after the specified units
- `Before=` - This service starts before the specified units

### Requirement Dependencies

- `Requires=` - This service requires the specified units (hard dependency)
- `Wants=` - This service wants the specified units (soft dependency)
- `RequiredBy=` - Reverse of `Requires=` (specified in `[Install]` section)
- `WantedBy=` - Reverse of `Wants=` (specified in `[Install]` section)
Comment on lines +210 to +222
Contributor

I find the hard / soft dependency language insufficiently precise.

The "systemd as job queue" formulation would say:

  1. Wants/WantedBy - causes the wanted unit to be enqueued (pretend it happens in random order), only if and exactly when the wanting unit is enqueued
  2. After/Before - affects the order in which units are enqueued
  3. Requires/RequiredBy - like wants, but the requiring unit won't be started if the required unit fails
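The "pulling" half of the job queue formulation above can be illustrated with a toy model: enqueueing one unit also enqueues everything it wants or requires, transitively. The `PULLS` table is built only from the relationships this document states for `preconfigured.target`; the function names are hypothetical, and ordering (`After=`/`Before=`) and failure propagation are deliberately not modeled.

```bash
#!/usr/bin/env bash
# Toy model: which units end up in the job queue when one unit is enqueued.
# Keys are units; values are the units they pull in via Wants=/Requires=.
declare -A PULLS=(
    [preconfigured.target]="migrator.service storewolf.service apiserver.service early-boot-config.service sundog.service settings-applier.service"
    [apiserver.service]="migrator.service"
    [storewolf.service]="migrator.service"
)
declare -A QUEUED=()

# Add a unit and, recursively, everything it pulls in. Each unit is queued once.
enqueue() {
    local unit="$1" dep
    [[ -n "${QUEUED[$unit]:-}" ]] && return 0
    QUEUED[$unit]=1
    for dep in ${PULLS[$unit]:-}; do
        enqueue "$dep"
    done
}

# Print the sorted set of units enqueued as a result of enqueueing $1.
jobs_for() {
    QUEUED=()
    enqueue "$1"
    printf '%s\n' "${!QUEUED[@]}" | LC_ALL=C sort
}
```

Calling `jobs_for preconfigured.target` lists the target plus every service it pulls in, while `jobs_for storewolf.service` lists only that service and `migrator.service`. Nothing is enqueued until something asks for it, which is the point the comment makes about targets not behaving like runlevels.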


### Early Boot Services

Services that need to run very early use `DefaultDependencies=no` to avoid the standard dependency chain:

- `migrator.service`
- `prepare-*.service` (filesystem preparation)
- `activate-preconfigured.service`
Comment on lines +224 to +230
Contributor

While this is true, I don't think it's helpful - most of these services are special and have unique reasons for DefaultDependencies=no, either for performance or because they're essential to reach sysinit.target.


## Synchronization Mechanisms

### Systemd Targets

Targets serve as synchronization points. A target is "reached" when all services required by or wanted by that target have completed.

Target relationships:

```
drivers.target:
Contributor

drivers.target is kind of a category error here - we can still be running some units from it while starting other units that preconfigured.target wants. The others are much stronger synchronization points.

In general I don't really see targets as a synchronization point. They are more of an abstraction over a bunch of units - "I need everything required to bring the network online, or to make the TPM2 device available, to also go into the queue when you put my job in."

They let you synchronize what you enqueue but not when, exactly - "when" is just "the same instant you enqueue some other job".

  Requires: basic.target
  RequiredBy: preconfigured.target, multi-user.target

preconfigured.target:
  Requires: basic.target
  RequiredBy: configured.target, multi-user.target

configured.target:
  Requires: preconfigured.target
  RequiredBy: multi-user.target

multi-user.target:
  Requires: basic.target, configured.target
```

### Sentinel Files

Services use sentinel files to track state across reboots:

- `/var/lib/bottlerocket/early-boot-config.ran` - Prevents `early-boot-config.service` from running after first boot
- `/run/bootstrap-containers/<name>.ran` - Prevents bootstrap containers from re-running
- `/etc/.fips-module-check-passed` - Marks FIPS check completion

Services use `ConditionPathExists=` or `ConditionPathExists=!` to check for these files.
Comment on lines +257 to +265
Contributor

True, though I dislike this pattern and it's more of a last resort ideally.

If we had a sentineldog that could create either a persistent (/.bottlerocket) or ephemeral (/etc) marker then that might be a nicer interface for programs that find themselves doing this.
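For reference, the run-once behavior that these sentinel files provide can be sketched in shell. This is an analogue of pairing `ConditionPathExists=!<marker>` with a step that creates the marker, not code taken from any Bottlerocket service; the function name `run_once` is hypothetical.

```bash
#!/usr/bin/env bash
# Run a command only if its marker file is absent, then create the marker.
# Usage: run_once <marker-path> <command> [args...]
run_once() {
    local marker="$1"
    shift
    if [[ -e "$marker" ]]; then
        # Already ran: skip quietly, the way an unmet Condition skips a unit.
        return 0
    fi
    # Leave no marker behind if the command fails, so it can be retried.
    "$@" || return $?
    mkdir -p "$(dirname "$marker")" && touch "$marker"
}
```

Where the marker lives decides the lifetime of the guard: a path under `/run` is cleared on reboot, while a path under `/var/lib` persists across boots.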


### Transaction Commits

The API system uses transactions to ensure atomic updates:

1. Services write settings to _pending_ state during boot
2. Settings are grouped in the "bottlerocket-launch" transaction
3. `settings-committer` commits the transaction, making settings live
4. `settings-applier` then generates configuration files from live settings

This ensures all boot-time settings are applied together, preventing partial configuration.
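The pending-then-commit flow in the steps above can be modeled in a few lines. This is a toy in-memory model to show why readers never observe a half-applied set of settings; it is not how the API server stores data, and the names `tx_set`, `tx_commit`, and `get_live` are hypothetical.

```bash
#!/usr/bin/env bash
# Toy model of a settings transaction: writes land in PENDING and only
# become readable from LIVE after a commit.
declare -A PENDING=() LIVE=()

# Stage a setting. It is invisible to get_live until tx_commit runs.
tx_set() { PENDING["$1"]="$2"; }

# Move every pending setting into the live store, then clear the transaction.
tx_commit() {
    local key
    for key in "${!PENDING[@]}"; do
        LIVE["$key"]="${PENDING[$key]}"
    done
    PENDING=()
}

# Print a live setting. Returns 1 and prints nothing if it is not committed.
get_live() {
    [[ -n "${LIVE[$1]+set}" ]] || return 1
    printf '%s\n' "${LIVE[$1]}"
}
```

In this model, several writers can each call `tx_set` during boot, and a single `tx_commit` makes all of their values visible at once.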

## Boot Flow Diagram

```
┌─────────────────────────────────────────────────────────────────┐
│ sysinit.target │
│ - Standard systemd initialization │
│ - prepare-var.service, prepare-boot.service (early filesystem) │
└────────────────────────────┬────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ fipscheck.target (FIPS mode only) │
│ - check-kernel-integrity.service │
│ - check-fips-modules.service │
└────────────────────────────┬────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ drivers.target │
│ - load-neuron-inf1-modules.service (AWS Neuron Inf1) │
│ - load-neuron-latest-modules.service (AWS Neuron Latest) │
└────────────────────────────┬────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ preconfigured.target │
│ │
│ 1. migrator.service (data store migration) │
│ 2. storewolf.service (data store creation) │
│ 3. apiserver.service (API server) │
│ 4. early-boot-config.service (user data, first boot only) │
│ 5. sundog.service (dynamic settings) │
│ 6. settings-applier.service (commit & apply settings) │
│ │
│ Result: API system running, all settings applied │
└────────────────────────────┬────────────────────────────────────┘
activate-configured.service
┌─────────────────────────────────────────────────────────────────┐
│ configured.target │
│ │
│ - bootstrap-containers@*.service (user-defined setup) │
│ │
│ Result: Additional configuration complete │
└────────────────────────────┬────────────────────────────────────┘
activate-multi-user.service
┌─────────────────────────────────────────────────────────────────┐
│ multi-user.target │
│ │
│ - kubelet.service / ecs.service (workload orchestrator) │
│ - host-containers@admin.service (admin container) │
│ - host-containers@control.service (control container) │
│ │
│ Result: System ready for workloads │
└─────────────────────────────────────────────────────────────────┘
```

## Debugging Boot Issues

### Check Target Status

```bash
# Check if a target has been reached
systemctl is-active drivers.target
systemctl is-active preconfigured.target
systemctl is-active configured.target
systemctl is-active multi-user.target

# See what's blocking a target
systemctl list-dependencies drivers.target
systemctl list-dependencies preconfigured.target
systemctl list-dependencies --reverse preconfigured.target
```
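To find where a boot stopped, the individual checks above can be folded into one helper that walks the stage targets in order and reports the first one that is not active. The function name `first_inactive_target` and the `IS_ACTIVE` override (for dry runs) are hypothetical; by default it calls `systemctl is-active --quiet`.

```bash
#!/usr/bin/env bash
# Print the first stage target that is not active, checking in boot order.
# Returns 1 (and prints nothing) when every stage target is active.
# IS_ACTIVE is a hypothetical override used to substitute a stub command.
first_inactive_target() {
    local target
    for target in preconfigured.target configured.target multi-user.target; do
        if ! ${IS_ACTIVE:-systemctl is-active --quiet} "$target"; then
            printf '%s\n' "$target"
            return 0
        fi
    done
    return 1
}
```

The reported target is a starting point for `systemctl list-dependencies` and `journalctl -u` on the units it pulls in.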

### Check Service Status

```bash
# See all failed services
systemctl --failed

# Check specific service
systemctl status migrator.service
systemctl status apiserver.service

# View service logs
journalctl -u migrator.service
journalctl -u apiserver.service
```

### Boot Timeline

To see the boot timeline:

```bash
systemd-analyze
systemd-analyze blame
systemd-analyze critical-chain
```

## Related Documentation

- [API System](../sources/api/README.md) - Detailed API component documentation
- [Bootstrap Containers](../sources/api/bootstrap-containers/README.md) - Bootstrap container usage
- [Early Boot Config](../sources/early-boot-config/README.md) - User data configuration
- [Settings System](../sources/api/thar-be-settings/README.md) - Configuration file generation