diff --git a/README.md b/README.md index ee7de47..b29c42b 100644 --- a/README.md +++ b/README.md @@ -17,7 +17,7 @@ TPM attestation workflow ensures the integrity of networking devices throughout - [4. Switch owner issues LAK cert based on IAK cert signed by switch vendor CA](#4-switch-owner-issues-lak-cert-based-on-iak-cert-signed-by-switch-vendor-ca) - [TPM 2.0 Attestation for Switch Owners](#tpm-20-attestation-for-switch-owners) - [General Guidelines on What to Attest](#general-guidelines-on-what-to-attest) - - [Conceptual Flow for *Offline* PCR Precomputation](#conceptual-flow-for-offline-pcr-precomputation) + - [Conceptual Flow for _Offline_ PCR Precomputation](#conceptual-flow-for-offline-pcr-precomputation) - [TPM 2.0 Attestation Workflow Steps](#tpm-20-attestation-workflow-steps) - [TPM 2.0 Attestation Workflow Diagram](#tpm-20-attestation-workflow-diagram) - [TPM 2.0 Attestation Alternatives Considered](#tpm-20-attestation-alternatives-considered) @@ -35,19 +35,19 @@ TPM attestation workflow ensures the integrity of networking devices throughout - TPM EnrollZ service (or simply EnrollZ service) is the switch owner's internal infrastructure service responsible for the TPM 2.0 enrollment workflow. - TPM AttestZ service (or simply AttestZ service) is switch owner's internal infrastructure service responsible for TPM 2.0 attestation workflow. - Switch owner CA is the switch owner's internal Certificate Authority service. -- Switch chassis consists of one or more *“control cards”* (or *“control cards”*, *“routing engines”*, *“routing processors”*, etc.), each of which is equipped with its own CPU and TPM. The term control card will be used throughout the doc. +- Switch chassis consists of one or more _“control cards”_ (or _“control cards”_, _“routing engines”_, _“routing processors”_, etc.), each of which is equipped with its own CPU and TPM. The term control card will be used throughout the doc. 
-**Differences between various certs** *(more details in the [TCG spec](https://trustedcomputinggroup.org/wp-content/uploads/TPM-2p0-Keys-for-Device-Identity-and-Attestation_v1_r12_pub10082021.pdf))*: +**Differences between various certs** _(more details in the [TCG spec](https://trustedcomputinggroup.org/wp-content/uploads/TPM-2p0-Keys-for-Device-Identity-and-Attestation_v1_r12_pub10082021.pdf))_: -| **Cert type** | **On which pub key is the cert based on?** | **Can there be more than one underlying keypair for a given TPM?** | **Which CA issues/signs the cert?** | -| :---: | :----: | :---: | :---: | -| Initial Attestation Key (IAK) | IAK pub | No | Switch Vendor | -| Local Attestation Key (LAK) | LAK pub | Yes| Switch Owner | -| Owner IAK (oIAK) | IAK pub | No | Switch Owner | -| Initial Device Identity (IDevID) | IDevID pub | No | Switch Vendor | -| Local Device Identity (LDevID) | LDevID pub | Yes| Switch Owner | -| Owner Device Identity (oIDevID) | IDevID pub | No | Switch Owner | -| Endorsement Key (EK) | EK pub | No | TPM Vendor | +| **Cert type** | **On which pub key is the cert based on?** | **Can there be more than one underlying keypair for a given TPM?** | **Which CA issues/signs the cert?** | +| :------------------------------: | :----------------------------------------: | :----------------------------------------------------------------: | :---------------------------------: | +| Initial Attestation Key (IAK) | IAK pub | No | Switch Vendor | +| Local Attestation Key (LAK) | LAK pub | Yes | Switch Owner | +| Owner IAK (oIAK) | IAK pub | No | Switch Owner | +| Initial Device Identity (IDevID) | IDevID pub | No | Switch Vendor | +| Local Device Identity (LDevID) | LDevID pub | Yes | Switch Owner | +| Owner Device Identity (oIDevID) | IDevID pub | No | Switch Owner | +| Endorsement Key (EK) | EK pub | No | TPM Vendor | ## Design @@ -68,13 +68,13 @@ Even though it is strongly preferred to rely on ECC P521 and SHA-512 where possi #### TPM 2.0 Enrollment 
Workflow Steps 1. On completion of Bootz workflow, device obtains all necessary credentials and configurations to start serving TPM enrollment gRPC API endpoints on the same port as gNOI/gNSI/gNMI (9339). - - *Note: A device is shipped to the switch owner with a default SSL profile configured to rely on the IDevID key pair and IDevID TLS cert (signed by the switch vendor CA) for all RPCs.* + - _Note: A device is shipped to the switch owner with a default SSL profile configured to rely on the IDevID key pair and IDevID TLS cert (signed by the switch vendor CA) for all RPCs._ 2. On completion of Bootz, EnrollZ service is notified to enroll a TPM on a specific control card and calls the device's `GetIakCert` API to get back an IAK and IDevID certs. - During initial bootstrapping, an active control card must use its IDevID cert (part of switch's default SSL profile) for securing TLS connection. Once the device is provisioned with switch-owner-issued prod TLS cert in `certz` workflow, the device must always use that cert for all subsequent enrollz RPCs (such as `RotateOIakCert`). - Primary/active control card is also responsible for all RPCs directed to the secondary/standby control card. The mechanism of internal communication between the two control cards depends on the switch vendor and is out of scope of this doc. - Since the switch owner cannot directly TLS authenticate standby card, it is the responsibility of an active card to do an auth handshake with the standby card based on the IDevID key pair/cert as described in [RMA Scenario](#rma-scenario). + Since the switch owner cannot directly TLS authenticate standby card, it is the responsibility of an active card to do an auth handshake with the standby card based on the IDevID key pair/cert as described in [RMA Scenario](#rma-scenario). 3. 
EnrollZ service uses the trust bundle/anchor obtained (in advance) from the switch vendor to verify signature over the IAK cert, and ensure that the control card serial number in IAK cert and IDevID cert is the same. - - *Note: EnrollZ service must have access to the up-to-date switch vendor trust bundle/anchor needed to verify the signature over the IAK and IDevID certificates. The mechanics of this workflow are out of scope of this doc, but the trust bundle could be retrieved from a trusted vendor portal on a scheduled basis.* + - _Note: EnrollZ service must have access to the up-to-date switch vendor trust bundle/anchor needed to verify the signature over the IAK and IDevID certificates. The mechanics of this workflow are out of scope of this doc, but the trust bundle could be retrieved from a trusted vendor portal on a scheduled basis._ 4. EnrollZ service ensures that device identity fields in IAK and IDevID certs match its expectations. 5. EnrollZ service asks switch owner CA to issue an oIAK and oIDevID certs based on the IAK and IDevID pub keys, respectively. 6. EnrollZ service obtains the oIAK and oIDevID certs from the CA and calls the device's `RotateOIakCert` API to persist the oIAK and oIDevID certs on the control card. @@ -82,9 +82,9 @@ Even though it is strongly preferred to rely on ECC P521 and SHA-512 where possi 8. The switch stores oIAK and oIDevID certs in non-volatile memory and will present them in the TPM attestation `attestz` workflow. 9. The switch must use the profile created during bootz which provided the trust bundle. This will rotate the profiles cert to be the provided Owner IDevID cert and sets the Owner IAK cert. - - *Note: This implies that after successful enrollment the switch must force all its gRPC servers/services (such as `attestz` and `certz`) to respect the updated SSL profile relying on oIDevID cert. 
This should have already been set as part of bootz but this should again be forced at part of enrollment* - *Note: Further Rotate calls may only require the rotation of the Owner IAK cert so those messages - will not contain the `ssl_profile_id` nor `oidevid_cert` fields.* + - _Note: This implies that after successful enrollment the switch must force all its gRPC servers/services (such as `attestz` and `certz`) to respect the updated SSL profile relying on oIDevID cert. This should have already been set as part of bootz but this should again be forced at part of enrollment_ + _Note: Further Rotate calls may only require the rotation of the Owner IAK cert so those messages + will not contain the `ssl_profile_id` nor `oidevid_cert` fields._ 10. EnrollZ service repeats the workflow for the second control card if one is available. **Pros:** @@ -149,7 +149,7 @@ not perform TPM enrollment at all. - Either (1) all switch vendors have to publish EK pub on their portal and switch owner to build an automatic system (for each vendor) to fetch and persist it in advance of device shipment or (2) switch owner to obtain and manage a TPM manufacturer trust bundle to verify EK cert with which all switches must be provisioned. - Switch vendors need to support issuance of LAKs. - LAKs are also primarily used in scenarios where device/user privacy is important. In the case of network switches (especially the ones running in switch owner's own data centers), however, infra components would actually want to know exactly the identities of the switches, so their privacy is not desirable. Consult -[section 11](https://trustedcomputinggroup.org/wp-content/uploads/TPM-2p0-Keys-for-Device-Identity-and-Attestation_v1_r12_pub10082021.pdf#page=64) of the TCG spec for more details. + [section 11](https://trustedcomputinggroup.org/wp-content/uploads/TPM-2p0-Keys-for-Device-Identity-and-Attestation_v1_r12_pub10082021.pdf#page=64) of the TCG spec for more details. ##### 4. 
Switch owner issues LAK cert based on IAK cert signed by switch vendor CA @@ -166,7 +166,7 @@ The workflow would follow the TCG specification documented in [section 5.3](http - The most expensive approach in terms of software development/maintenance perspective as both switch vendors and switch owners will need to engineer complex TPM enrollment logic (see TCG spec for more details). - Switch vendors need to support issuance of LAKs. - LAKs are also primarily used in scenarios where device/user privacy is important. In the case of network switches (especially the ones running in switch owner's own data centers), however, infra components would actually want to know exactly the identities of the switches, so their privacy is not desirable. Consult -[section 11](https://trustedcomputinggroup.org/wp-content/uploads/TPM-2p0-Keys-for-Device-Identity-and-Attestation_v1_r12_pub10082021.pdf#page=64) of the TCG spec for more details. + [section 11](https://trustedcomputinggroup.org/wp-content/uploads/TPM-2p0-Keys-for-Device-Identity-and-Attestation_v1_r12_pub10082021.pdf#page=64) of the TCG spec for more details. ### TPM 2.0 Attestation for Switch Owners @@ -176,42 +176,53 @@ In this workflow switch owner verifies that the device's end-to-end boot state ( #### General Guidelines on What to Attest This section is out of scope of the broader openconfig initiative and instead serves more as a guideline. The general question one should ask when thinking of what to attest is "does changing X on the device change the fundamental boot posture of the device?". -If the answer is yes, then attest it, otherwise it is not required. Examples of such software that should be attested are bootloader image, OS image and secure boot policy. +If the answer is yes, then attest it, otherwise it is not required. The recommended scope of attestation measurements is from the first instruction up to, but not including, runtime. 
That is, the scope for attestation covers the static boot process up to and including the root filesystem (rootfs), excluding runtime. +The attestation strategy mandates a sequential measurement process, prioritizing the lowest layers first to guarantee that the initial instruction is protected and the device establishes a genuine Root of Trust. Measurements then proceed incrementally up the stack to achieve complete coverage up to and including the static root filesystem. +Based on this strategy, the specific measurements required to validate device integrity are: + +- **Boot Chain Coverage**: PCRs must cover the entire boot process, from the initial hardware boot stages up to the static operating system. +- **Filesystem Integrity**: The measurements include critical parts of the root filesystem. In general, we require a static root filesystem to be covered. +- **Security Configuration**: Secure boot configuration and policies are included in the measured state, while runtime data is excluded. Similarly, TCG discourages attesting device-specific configurations/software or things that may change after a reboot. In section [3.3.4.2](https://trustedcomputinggroup.org/wp-content/uploads/TCG_PCClient_PFP_r1p05_v23_pub.pdf#page=40) and 3.3.4.4 for PCR [1] and PCR[3] (both of which measure configuration related data) TCG spec states: -*"Entities that MUST NOT be measured as part of the above measurements: System-unique information such as asset, serial numbers, etc., as they would prevent sealing to PCR[3] with a common configuration in a fleet of devices"* and *"The event data MUST not vary across boot cycles if the set of potential PCR[1] measurements measured does not vary"*. 
+_"Entities that MUST NOT be measured as part of the above measurements: System-unique information such as asset, serial numbers, etc., as they would prevent sealing to PCR[3] with a common configuration in a fleet of devices"_ and _"The event data MUST not vary across boot cycles if the set of potential PCR[1] measurements measured does not vary"_. Instead of attesting such configurations, it should be software's (e.g. OS or application layer) responsibility to verify/validate such configs, while the switch owner may attest the underlying software image containing the verification logic. -Attesting secrets is an anti-pattern. Even if one is attesting password *hashes* and even if a hash has strong entropy, it is still a good practice to avoid attesting secrets or potentially-secrets-revealing data. +Attesting secrets is an antipattern. Even if one is attesting password _hashes_ and even if a hash has strong entropy, it is still a good practice to avoid attesting secrets or potentially-secrets-revealing data. This is mainly because all the attestable measurements are considered to be public and are logged in plain into the bootlog, which is intended (although not required) to be publicly shared during attestation. Finally, although the exact PCR allocation may vary across vendors, the expectation is that switch vendors will follow standardized [TCG guidance](https://trustedcomputinggroup.org/wp-content/uploads/TCG_PCClient_PFP_r1p05_v23_pub.pdf#page=36) (which measurements are measured in which PCRs, who makes those measurements, the order of measurements, etc.) 
for these measurements: -| **PCR Index** | **PCR Usage** | -| :--- | :---- | -| 0 | SRTM, BIOS, Host Platform Extensions, Embedded Option ROMs and PI Drivers | -| 1 | Host Platform Configuration | -| 2 | UEFI driver and application Code | -| 3 | UEFI driver and application Configuration and Data | -| 4 | UEFI Boot Manager Code (usually the MBR) and Boot Attempts | -| 5 | Boot Manager Code Configuration and Data (for use by the Boot Manager Code) and GPT/Partition Table | -| 6 | Host Platform Manufacturer Specific | -| 7 | Secure Boot Policy | -| 8-15 | Defined for use by the Static OS | -| 16 | Debug | -| 23 | Application Support | - -#### Conceptual Flow for *Offline* PCR Precomputation - -The idea is that before devices are even shipped to a switch owner, switch vendors give switch owners (through an API endpoint) final expected PCR values for those PCRs that are the same across all devices for a given product model and bootloader/OS version (PCRs that do not change between reboots and are not device-specific). Such PCR values can include measurements of BIOS image, bootloader -image, OS image, security boot policy, etc. A switch owner would simply -ingest these values and persist them in an internal DB, so that later when AttestZ service actually performs attestation, it can just compare final expected PCRs to the actual PCRs reported by the device, instead of recomputing these PCRs from the boot log for every attestation. 
+| **PCR Index** | **PCR Usage** | +| :------------ | :-------------------------------------------------------------------------------------------------- | +| 0 | SRTM, BIOS, Host Platform Extensions, Embedded Option ROMs and PI Drivers | +| 1 | Host Platform Configuration | +| 2 | UEFI driver and application Code | +| 3 | UEFI driver and application Configuration and Data | +| 4 | UEFI Boot Manager Code (usually the MBR) and Boot Attempts | +| 5 | Boot Manager Code Configuration and Data (for use by the Boot Manager Code) and GPT/Partition Table | +| 6 | Host Platform Manufacturer Specific | +| 7 | Secure Boot Policy | +| 8-15 | Defined for use by the Static OS | +| 16 | Debug | +| 23 | Application Support | + +#### Conceptual Flow for _Offline_ PCR Precomputation + +The core concept of offline precomputation is to optimize the attestation process by utilizing final expected PCR values provided by the vendor. Instead of recomputing PCRs from the boot log for every attestation, the AttestZ service compares the actual PCRs reported by the device against pre-ingested, expected values. +This applies specifically to PCRs that are consistent across a given product model and software version (e.g., BIOS image, bootloader image, OS image, and secure boot policy). + +To implement this workflow effectively, the following operational aspects are considered: + +- **PCR Acquisition Method**: Expected PCR values are primarily provided by the device vendor. The ideal and expected method is for these values to be delivered via a secure mechanism, such as an API endpoint or by being included within the firmware/software image bundle using the structured, cryptographically signed format defined by OpenConfig. +- **Timing**: The expected reference values are ideally obtained _before the devices are shipped to the switch owner_. They are typically acquired or updated whenever a new software/firmware image version is qualified. 
+- **Staging Phase**: Once acquired, the expected reference values are ingested and stored in a dedicated internal database. This system acts as the central source of truth for expected device security measurements (TPM PCR values) during the verification process. #### TPM 2.0 Attestation Workflow Steps 1. Device serves gRPC TPM 2.0 attestation endpoints on the same port as gNOI/gNSI/gNMI (9339). At this point the device must be booted with the correct OS image and with correct configurations/credentials applied. - Primary/active control card is also responsible for all RPCs directed to the secondary/standby control card. The mechanism of internal communication between the two control cards depends on the switch vendor and is out of scope of this doc. - Since the switch owner cannot directly TLS authenticate standby card, it is the responsibility of an active card to do an auth handshake with the standby card based on the IDevID key pair/cert as described in [RMA Scenario](#rma-scenario). + Since the switch owner cannot directly TLS authenticate standby card, it is the responsibility of an active card to do an auth handshake with the standby card based on the IDevID key pair/cert as described in [RMA Scenario](#rma-scenario). - Device uses active control card’s IDevID private key and oIDevID cert for securing TLS for the **initial** attestation RPCs. On successful completion of initial attestation, the device will be provisioned with switch owner’s prod credentials/certs and will rely on those for securing TLS in subsequent attestation workflows. 2. AttestZ service calls device’s `Attest` endpoint for a given control card (and a random nonce) to get back: - An oIAK cert (received by the device during the TPM enrollment workflow) signed by the switch owner’s CA. 
@@ -231,14 +242,14 @@ ingest these values and persist them in an internal DB, so that later when Attes - Attestation logic is simple as it boils down to just comparing final PCR hashes and does not involve PCR recomputation from the boot log. - Expected final PCR values are computed only once, for all devices and offline (before devices arrive to switch owners as opposed to on every attestation while switches are already serving production traffic). This is both efficiency and reliability gain. - The design can be extended to attest device-specific PCRs if needed. In this case switch vendors will also provide (along with final expected PCRs) a structured vendor-agnostic PCR measurement manifest object which describes how to calculate final PCRs and at the very least specifies (1) what measurements go into which PCR, (2) the order of measurements, (3) cryptographic hash algorithm used. - - *Note: For the actual manifest structure definition, we should consider getting ideas from the [attestation log-retrieval API](https://datatracker.ietf.org/doc/pdf/draft-ietf-rats-yang-tpm-charra-21#page=6) by IETF ChaRRA and re-using/expanding the design from the [Reference Integrity Manifest](https://trustedcomputinggroup.org/wp-content/uploads/TCG_RIM_Model_v1p01_r0p16_pub.pdf) by TCG. - The goal is for a switch owner, given a vendor-agnostic PCR measurement manifest (the API/object/format definition is vendor-agnostic, but the actual instance of that object is vendor-specific) and PCR measurement inputs (e.g. boot configuration), to have the ability to pre-calculate the expected final PCRs for a given device using standard TPM folding hash technique. 
For example:* + - _Note: For the actual manifest structure definition, we should consider getting ideas from the [attestation log-retrieval API](https://datatracker.ietf.org/doc/pdf/draft-ietf-rats-yang-tpm-charra-21#page=6) by IETF ChaRRA and re-using/expanding the design from the [Reference Integrity Manifest](https://trustedcomputinggroup.org/wp-content/uploads/TCG_RIM_Model_v1p01_r0p16_pub.pdf) by TCG. + The goal is for a switch owner, given a vendor-agnostic PCR measurement manifest (the API/object/format definition is vendor-agnostic, but the actual instance of that object is vendor-specific) and PCR measurement inputs (e.g. boot configuration), to have the ability to pre-calculate the expected final PCRs for a given device using standard TPM folding hash technique. For example:_ - ```text - PCR[5] ← 0 - PCR[5] ← Hash(PCR[5] || sha-256(“config-Y”)) - PCR[5] ← Hash(PCR[5] || sha-256(“config-Z”)) - ``` + ```text + PCR[5] ← 0 + PCR[5] ← Hash(PCR[5] || sha-256(“config-Y”)) + PCR[5] ← Hash(PCR[5] || sha-256(“config-Z”)) + ``` **Cons:** @@ -248,7 +259,7 @@ Once the attestation workflow is complete for both control cards, AttestZ servic 1. AttestZ service completes attestation workflow for all switch chassis control cards. 2. AttestZ service asks the device to generate and send back a standard signed OpenSSL Certificate Signing Request (CSR) for issuance of the mTLS credential/cert. - - *Note: in future, we can further improve the security posture of switches by sealing the private key to the TPM state. This would ensure that a device (1) has to have certain PCR final values to be able to access (unseal) the credential and (2) that only the intended device/control card can access the credential.* + - _Note: in future, we can further improve the security posture of switches by sealing the private key to the TPM state. 
This would ensure that a device (1) has to have certain PCR final values to be able to access (unseal) the credential and (2) that only the intended device/control card can access the credential._ 3. AttestZ service verifies CSR and sends the request to issue a mTLS cert to switch owner CA. 4. AttestZ service calls the device to persist the mTLS cert. 5. The device ensures that the cert pub key matches the one it created earlier and persists the cert in non-volatile memory. @@ -266,7 +277,7 @@ Attestation workflow with differences from the proposed approach in **bold**: 1. Device serves gRPC TPM 2.0 attestation APIs. At this point the device must be booted with the correct OS image and with correct configurations/credentials applied. - Primary/active control card is also responsible for all RPCs directed to the secondary/standby control card. The mechanism of internal communication between the two control cards depends on the switch vendor and is out of scope of this doc. - Device uses active control card’s IDevID private key and oIDevID cert for securing TLS for the initial attestation RPCs. Once the device successfully completes attestation and is provisioned with switch owner’s prod credentials/certs, the device will rely on those for securing TLS in subsequent attestation workflows. -2. AttestZ service calls device’s `Attest` endpoint for a given control card (and a random nonce) to get back *(note: the API can borrow ideas from [log-retrieval](https://datatracker.ietf.org/doc/pdf/draft-ietf-rats-yang-tpm-charra-21#page=5) and [tpm20-challenge-response-attestation](https://datatracker.ietf.org/doc/pdf/draft-ietf-rats-yang-tpm-charra-21#page=4) APIs):* +2. 
AttestZ service calls device’s `Attest` endpoint for a given control card (and a random nonce) to get back _(note: the API can borrow ideas from [log-retrieval](https://datatracker.ietf.org/doc/pdf/draft-ietf-rats-yang-tpm-charra-21#page=5) and [tpm20-challenge-response-attestation](https://datatracker.ietf.org/doc/pdf/draft-ietf-rats-yang-tpm-charra-21#page=4) APIs):_ - An oIAK cert signed by the switch owner’s CA and received by the device during the TPM enrollment workflow. - Final observed PCR hashes. - Quote structure and signature over it signed by IAK private. @@ -286,7 +297,7 @@ Attestation workflow with differences from the proposed approach in **bold**: - Switch owners can adapt to changes in PCR measurements (for example, if there is new artifact measured to a given PCR) on the fly. - Switch vendors do not need to host an API to give the switch owner PCR measurement manifest with every bootloader/OS release. - - *Note: this is only valid when attestation of device-specific PCRs is needed.* + - _Note: this is only valid when attestation of device-specific PCRs is needed._ - Aligns with the general TCG specification and IETF ChaRRA draft. **Cons:** @@ -335,19 +346,17 @@ This diagram highlights various use cases for different packages. ### Handy Commands -```bash -# Completely remove the entire working tree created by a Bazel instance. -bazel clean --expunge + # Completely remove the entire working tree created by a Bazel instance. + bazel clean --expunge -# Regenerate Go protobuf and gRPC client/service files. -sh regenerate-files.sh + # Regenerate Go protobuf and gRPC client/service files. + sh regenerate-files.sh -# Build all targets. -bazel build //... + # Build all targets. + bazel build //... -# Update Go dependencies in go.mod and go.sum. -go mod tidy + # Update Go dependencies in go.mod and go.sum. + go mod tidy -# Run a specific test. -go test -v ./service/biz -run TestVerifyAndParseIakAndIDevIdCerts --alsologtostderr -``` + # Run a specific test. 
+ go test -v ./service/biz -run TestVerifyAndParseIakAndIDevIdCerts --alsologtostderr
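
The `PCR[5]` folding-hash example in the attestation section of this diff can be sketched in a few lines. This is a minimal illustration rather than the project's implementation. It assumes SHA-256, an all-zero initial PCR value, and measurement inputs passed as raw byte strings. The names `extend` and `replay` are hypothetical and are not part of the attestz API.

```python
import hashlib


def extend(pcr: bytes, measurement: bytes, alg: str = "sha256") -> bytes:
    """One fold of the PCR: Hash(pcr || Hash(measurement))."""
    digest = hashlib.new(alg, measurement).digest()
    return hashlib.new(alg, pcr + digest).digest()


def replay(measurements, alg: str = "sha256") -> bytes:
    """Fold an ordered list of measurements into a final PCR value."""
    # Assumption: the PCR starts as all zeros, sized to the hash digest.
    pcr = bytes(hashlib.new(alg).digest_size)
    for measurement in measurements:
        pcr = extend(pcr, measurement, alg)
    return pcr


# Mirrors the PCR[5] example: extend with "config-Y", then "config-Z".
expected_pcr5 = replay([b"config-Y", b"config-Z"])
print(expected_pcr5.hex())
```

Because each fold hashes the previous PCR value together with the new digest, the result depends on the order of measurements, which is why a measurement manifest would need to specify that order along with the hash algorithm.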