[KeyManager] Add key claims retrieval to KPS and WSD#681
[KeyManager] Add key claims retrieval to KPS and WSD#681NilanjanDaw wants to merge 15 commits intogoogle:mainfrom
Conversation
adds the `KeyClaims` proto definition in `keymanager/km_common/proto/` which includes `VmAttestation`, `KeyAttestation`, and related messages for key protection and attestation. It also adds `proto/gen_keymanager.sh` to generate Go code for keymanager protos.
…for algorithms.proto Removed VM attestation related messages from key_claims.proto, they will be added in github.com/GoogleCloudPlatform/confidential-space separately Since 2 proto files are in the same directory, i changed the package name in algorithms.proto to keymanager
…ce image build This commit integrates the KeyManager Workload Service into the Confidential Space launcher and ensures it is properly built and included in the resulting images. Key changes: - Corrected the argument order in `NewServer` calls within `keymanager/workload_service/server_test.go` and `integration_test.go` to match the expected interface definitions. - Updated `launcher/cloudbuild.yaml` to install Rust, CMake, and the C++ toolchain in the `golang:1.24` build environment, enabling the `CGO_ENABLED=1` build of the launcher to successfully link the KeyManager Rust libraries via FFI. - Hooked the KeyManager Server into `launcher/container_runner.go`, configured to start listening on `kmaserver.sock` when the `EnableKeyManager` experiment is active. - Added the `EnableKeyManager` flag to the `Experiments` struct and updated relevant tests and configurations to support the new feature toggle.
1. Restricts the CGO build of KeyManager exclusively to linux/amd64 to avoid compilation errors on macOS (missing SYS_MEMFD_SECRET). 2. Updates CI workflow to only build Rust libraries on Linux x64. 3. Updates releaser workflow to install the Rust toolchain and build the KeyManager. 4. Sets CGO_ENABLED=1 in .goreleaser.yaml to ensure CGO components are included in the release.
… WSD
This commit introduces new functionality for retrieving key claims and provides
more comprehensive key management operations across both the Key Protection
Service (KPS) and Workload Service Daemon (WSD).
Key changes include:
- **KPS (key_custody_core):**
- Added FFI definitions and implementations for `key_manager_get_kem_key`.
- Implemented `GetKemKey` in `kps_key_custody_core_cgo.go` for Go-FFI bridging.
- Added integration tests for KEM key retrieval.
- **WSD (workload_service):**
- Added FFI definitions and implementations for `key_manager_get_binding_key`.
- Implemented `GetBindingKey` in `ws_key_custody_core_cgo.go` for Go-FFI bridging.
- Added integration tests for binding key retrieval.
- Introduced key claims processing mechanism:
- Added `KeyClaimsProvider` interface.
- `Server` now includes a `claimsChan` and a processClaims` goroutine.
- Implemented `GetBindingKeyClaims` and `GetKemKeyClaims` for generating protobuf claims.
- Updated server constructors and integration tests to support the new claims and interfaces.
- Added comprehensive integration tests and unit tests for the new key management
and claims functionalities in both KPS and WSD.
atulpatildbz
left a comment
There was a problem hiding this comment.
Thanks for taking this up.
Added some initial comments
| let out_kem_pubkey_slice = | ||
| unsafe { std::slice::from_raw_parts_mut(out_kem_pubkey, out_kem_pubkey_len) }; |
There was a problem hiding this comment.
we're checking the out_kem_pubkey_len value much later after we're already using it to create a slice.
from_raw_parts_mut asserts to the compiler that valid len bytes exists. And this could result in undefined behaviour.
check the length first and create the slice only after we know the length is correct.
| // Call Safe Internal Function | ||
| match get_kem_key_internal(uuid) { | ||
| Ok((kem_pubkey, binding_pubkey, delete_after)) => { | ||
| if out_kem_pubkey_len != kem_pubkey.as_bytes().len() |
There was a problem hiding this comment.
we've followed != size enforcement at many places which is fine for now since we own the cgo layer too.
But would < be a more generalized check? that way caller can pass larger buffer too if they don't know the exact size.
| ) | ||
| }; | ||
| assert_eq!(result, -1); | ||
| } |
There was a problem hiding this comment.
can we add a test test_get_kem_key_invalid_buffer_len that passes retrieved_kem_pubkey_bytes.len() + 1 or - 1 to ensure the C-API correctly rejects it with a -2 status code
| t.Fatalf("expected KEM pubkey length %d, got %d", len(pubKey), len(retrievedKemPK)) | ||
| } | ||
| for i := range pubKey { | ||
| if pubKey[i] != retrievedKemPK[i] { |
There was a problem hiding this comment.
instead of iterating and comparing, we can use bytes.Equal
import "bytes"
// ...
if !bytes.Equal(pubKey, retrievedKemPK) {
t.Fatalf("KEM pubkey mismatch: expected %x, got %x", pubKey, retrievedKemPK)
}
if !bytes.Equal(bindingPK, retrievedBindingPK) {
t.Fatalf("binding pubkey mismatch: expected %x, got %x", bindingPK, retrievedBindingPK)
}| _, _, _, err := GetKemKey(uuid.New()) | ||
| if err == nil { | ||
| t.Fatal("expected error for non-existent UUID") | ||
| } |
There was a problem hiding this comment.
can we also check the expected error string:
expectedErrMsg := "key_manager_get_kem_key failed with code -1"
if !strings.Contains(err.Error(), expectedErrMsg) {
t.Fatalf("expected error containing %q, got: %v", expectedErrMsg, err)
}if you decide to export typed sentinel errors ErrKeyNotFound then you could also use errors.Is but comparing string is also fine.
|
|
||
| // GetKemKey retrieves KEM and binding public keys and delete_after timestamp via Rust FFI. | ||
| func GetKemKey(id uuid.UUID) ([]byte, []byte, uint64, error) { | ||
| uuidBytes, err := id.MarshalBinary() |
There was a problem hiding this comment.
Since we know the size of uuid, should we create a fixed size array?
var uuidBytes [16]byte
copy(uuidBytes[:], id[:])| if len(kemPubKey) != 32 { | ||
| t.Fatalf("expected 32-byte KEM public key, got %d", len(kemPubKey)) | ||
| } | ||
| if len(bindingPubKey) != 32 { | ||
| t.Fatalf("expected 32-byte binding public key, got %d", len(bindingPubKey)) | ||
| } |
There was a problem hiding this comment.
We should have just checked the length of the keys, we should compare it against expectedKemPubKey and expectedBindingPubKey, right?
Also it would be good to have non-zero values in expectedKemPubKey and expectedBindingPubKey
something like:
for i := range expectedKemPubKey {
expectedKemPubKey[i] = byte(i + 10)
}
for i := range expectedBindingPubKey {
expectedBindingPubKey[i] = byte(i + 20)
}
// Check equality of the actual sequence of bytes
if !bytes.Equal(kemPubKey, expectedKemPubKey) {
t.Fatalf("KEM public key mismatch")
}
if !bytes.Equal(bindingPubKey, expectedBindingPubKey) {
t.Fatalf("binding public key mismatch")
}| /// ## Returns | ||
| /// * `0` on success. | ||
| /// * `-1` if arguments are invalid or key is not found. | ||
| /// * `-2` if either public key buffer is too small. |
There was a problem hiding this comment.
we're checking != so it fails with -2 even if buffer is too large
|
|
||
| let uuid = match Uuid::from_slice(uuid_slice) { | ||
| Ok(u) => u, | ||
| Err(_) => return -1, |
There was a problem hiding this comment.
is this dead code? from_raw_parts guarantees slice is exactly 16 bytes. so Uuid::from_slice will never fail
atulpatildbz
left a comment
There was a problem hiding this comment.
Added some more comments
| }; | ||
|
|
||
| assert_eq!(result, 0); | ||
| assert_eq!(generated_pubkey_bytes, retrieved_pubkey_bytes); |
There was a problem hiding this comment.
can we also destroy for cleanup?
if we don't cleanup, enumerate tests may become flaky test_key_manager_enumerate_binding_keys_success since KEY_REGISTRY sits in a global LazyLock
| } | ||
|
|
||
| func TestIntegrationGetBindingKey(t *testing.T) { | ||
| id, pubKey, err := GenerateBindingKeypair(defaultAlgo, 3600) |
There was a problem hiding this comment.
similarly here, destroy the keys when the test completes
defer func() {
_ = DestroyBindingKey(id)
}()| if len(retrievedPubKey) != len(pubKey) { | ||
| t.Fatalf("expected pubkey length %d, got %d", len(pubKey), len(retrievedPubKey)) | ||
| } | ||
|
|
||
| for i := range pubKey { | ||
| if pubKey[i] != retrievedPubKey[i] { | ||
| t.Fatalf("mismatch at index %d: expected %d, got %d", i, pubKey[i], retrievedPubKey[i]) | ||
| } | ||
| } |
There was a problem hiding this comment.
you can use bytes.Equal here
if !bytes.Equal(pubKey, retrievedPubKey) {
t.Fatalf("retrieved public key does not match generated public key.\nExpected: %x\nGot: %x", pubKey, retrievedPubKey)
}| if err != nil { | ||
| t.Fatalf("failed to create server: %v", err) | ||
| } | ||
|
|
There was a problem hiding this comment.
| t.Cleanup(func() { | |
| close(srv.claimsChan) | |
| }) | |
| Algorithm: &keymanager.HpkeAlgorithm{ | ||
| Kem: keymanager.KemAlgorithm_KEM_ALGORITHM_DHKEM_X25519_HKDF_SHA256, | ||
| Kdf: keymanager.KdfAlgorithm_KDF_ALGORITHM_HKDF_SHA256, | ||
| Aead: keymanager.AeadAlgorithm_AEAD_ALGORITHM_AES_256_GCM, | ||
| }, |
There was a problem hiding this comment.
what are your plans when we support multiple algorithms?
| Claims: &keymanager.KeyClaims_VmBindingClaims{ | ||
| VmBindingClaims: &keymanager.KeyClaims_VmProtectionBindingClaims{ | ||
| BindingPubKey: &keymanager.HpkePublicKey{ | ||
| Algorithm: &keymanager.HpkeAlgorithm{ |
There was a problem hiding this comment.
nit: can we define a package-level variable for this
var defaultHpkeAlgo = &keymanager.HpkeAlgorithm{
Kem: keymanager.KemAlgorithm_KEM_ALGORITHM_DHKEM_X25519_HKDF_SHA256,
Kdf: keymanager.KdfAlgorithm_KDF_ALGORITHM_HKDF_SHA256,
Aead: keymanager.AeadAlgorithm_AEAD_ALGORITHM_AES_256_GCM,
}
| deleteAfter: uint64(time.Now().Add(1 * time.Hour).Unix()), | ||
| } | ||
|
|
||
| srv, err := NewServer(kps, ws, filepath.Join(t.TempDir(), "test.sock")) |
There was a problem hiding this comment.
can you add cleanup?
t.Cleanup(func() { close(srv.claimsChan) })
| } | ||
|
|
||
| t.Run("BindingClaims", func(t *testing.T) { | ||
| respChan := make(chan *ClaimsResult) |
There was a problem hiding this comment.
let's use buffered channel.
the test fails for an unrelated reason before reaching the select block, it could deadlock the test suite.
| respChan := make(chan *ClaimsResult) | |
| respChan := make(chan *ClaimsResult, 1) |
| if err != nil { | ||
| t.Fatalf("failed to create test server: %v", err) | ||
| } | ||
| return srv |
There was a problem hiding this comment.
actually this might be a good place to add t.Cleanup
t.Cleanup(func() {
srv.listener.Close() // Or srv.Shutdown(context.Background())
close(srv.claimsChan) // Stop the background processClaims goroutine
})
This commit introduces new functionality for retrieving key claims and provides
more comprehensive key management operations across both the Key Protection
Service (KPS) and Workload Service Daemon (WSD).
Key changes include:
key_manager_get_kem_key.GetKemKeyinkps_key_custody_core_cgo.gofor Go-FFI bridging.key_manager_get_binding_key.GetBindingKeyinws_key_custody_core_cgo.gofor Go-FFI bridging.KeyClaimsProviderinterface.Servernow includes aclaimsChanand a processClaims` goroutine.GetBindingKeyClaimsandGetKemKeyClaimsfor generating protobuf claims.and claims functionalities in both KPS and WSD.
This PR depends on #671. Please review it before this one.