Conversation
| metrics. Customers can then ask for updates to the implementations | ||
| CT provides or customers can go an implement their own interfaces that are fine-tuned | ||
| to their use cases. | ||
|
|
There was a problem hiding this comment.
Is there a middle ground? For example, logging information locally without uploading it anywhere.
|
@josecorella I have a meta/macro question. I appreciate that the Background doc highlights issues and alternatives, but I feel like we are missing a "User Stories" document that can be used to measure success criteria and define the table stakes of this work. It is also possible I just missed such a proposal doc, but without it, it is difficult to work backwards.
framework/metrics-agent.md
Outdated
| operation AddDate { | ||
| input: AddDateInput, | ||
| output: AddOutput, | ||
| errors: [MetricsPutError] |
There was a problem hiding this comment.
Nit: You say that the interface should not error, but you have errors here.
| // Common output structure | ||
| structure AddOutput {} |
There was a problem hiding this comment.
I get why you might optimize this. But is this really the best choice? Why not have an output per operation?
There was a problem hiding this comment.
revisit this in discussion on 2026-02-04
|
Potential Issue/Alternative/User Story: Users of the MPL/ESDK/DB-ESDK are also users of the AWS SDKs. The AWS SDKs have established logging and metric interfaces. There is likely an implicit customer expectation that Crypto Tools products behave and appear consistent with the AWS SDKs. Therefore, I suggest we carefully evaluate whether we can utilize the SDKs' metric and logging tooling and offer a customer experience that closely mimics the SDKs' experience. The current collection of docs does not state this as a goal, but it does leave it open as an opportunity, e.g., the proposed metric interface could wrap an SDK metric class.
| Client->>Client: Content Encryption | ||
| end | ||
| Client<<->>CMM: GetEncryption/Decryption Materials | ||
| CMM<<->>Keyring: OnEncrypt/OnDecrypt |
There was a problem hiding this comment.
I think the doc implies this, but we can go deeper down the stack.
i.e., Keyring <<->> KMS or Keyring <<->> Branch Key Store.
Technically, the H-Keyring is:
Keyring <<->> (Cache | Branch Key Store <<->> (DDB & KMS))
There was a problem hiding this comment.
I agree that we could dive deeper; however, for this change I don't want to go beyond painting the most basic model for the reader.
There was a problem hiding this comment.
But we need the CMC here.
| - Metrics: Throughout this document and other related documents the word, "metrics" is used extensively. | ||
| For Crypto Tools' libraries metrics means two things. | ||
|
|
||
| 1. Measuring application performance, (e.g. api requests, cache performance, latency). |
There was a problem hiding this comment.
nit: this doesn't render as a list, it renders as a code block
There was a problem hiding this comment.
vscode render preview lied to me....
There was a problem hiding this comment.
I disagree about implementing this in Dafny.
I am concerned that putting additional synchronization blocks into our code base will negatively impact performance, and I do not see how a metric interface could exist in Dafny without a synchronization block.
There was a problem hiding this comment.
What do you mean by "synchronization blocks"?
There was a problem hiding this comment.
I don't think we'd need sync blocks - why can't externs fire (a new thread) and forget?
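To make the "fire and forget" idea concrete, here is a rough Java sketch of what an extern-side wrapper could look like. `MetricsSink` and `FireAndForgetSink` are hypothetical names used only for illustration; they are not the interface proposed in the spec. The hand-off still goes through the executor's internal queue, which does its own locking, so this only shows that the caller does not wait on the sink. It does not settle the performance concern raised above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

// Hypothetical sink; stands in for whatever the native metrics implementation is.
interface MetricsSink {
    void addCount(String name, long value);
}

// Hypothetical extern-side wrapper: queues each metric for a background
// thread and returns to the caller without waiting for the sink.
final class FireAndForgetSink implements MetricsSink {
    private final MetricsSink delegate;
    private final ExecutorService executor = Executors.newSingleThreadExecutor(task -> {
        Thread thread = new Thread(task, "metrics-publisher");
        thread.setDaemon(true); // never keep the JVM alive just for metrics
        return thread;
    });

    FireAndForgetSink(MetricsSink delegate) {
        this.delegate = delegate;
    }

    @Override
    public void addCount(String name, long value) {
        try {
            executor.execute(() -> {
                try {
                    delegate.addCount(name, value);
                } catch (RuntimeException e) {
                    // Drop the metric; a failing sink must not affect the caller.
                }
            });
        } catch (RejectedExecutionException e) {
            // Executor already shut down; drop the metric.
        }
    }

    // Stops accepting metrics and waits up to the timeout for queued ones to drain.
    boolean shutdown(long timeoutMillis) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Metrics queued before `shutdown` are delivered in order because the executor is single-threaded; anything submitted afterwards is silently dropped.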
| versions of these libraries have no logging or metrics publishing | ||
| to either a local application or to an observability service like AWS CloudWatch. | ||
|
|
||
| As client side encryption libraries emitting metrics must be done carefully as |
There was a problem hiding this comment.
| As client side encryption libraries emitting metrics must be done carefully as | |
| As client side encryption libraries, emitting metrics must be done carefully as |
There was a problem hiding this comment.
I wonder about errors. To what extent should errors show up? Can we log the stack trace, or is that an anti-pattern?
| | ESDK | T.B.D | [ESDK.smithy](https://github.com/aws/aws-encryption-sdk/blob/mainline/AwsEncryptionSDK/dafny/AwsEncryptionSdk/Model/esdk.smithy) | | ||
| | MPL | T.B.D | [material-provider.smithy](https://github.com/aws/aws-cryptographic-material-providers-library/blob/main/AwsCryptographicMaterialProviders/dafny/AwsCryptographicMaterialProviders/Model/material-provider.smithy) | | ||
| | DB-ESDK | T.B.D | [DynamoDbEncryption.smithy](https://github.com/aws/aws-database-encryption-sdk-dynamodb/blob/main/DynamoDbEncryption/dafny/DynamoDbEncryption/Model/DynamoDbEncryption.smithy) | | ||
|
|
| A popular feature request has been for in depth insights into CT libraries. Many customers | ||
| ask for suggestions on how to reduce network calls to AWS Key Management Service (AWS KMS) and | ||
| followup questions around cache performance. |
There was a problem hiding this comment.
Issue: "has been for" is a rough phrase.
Ideally, we would quantify the customer demand (x internal services, y external customers), but that takes time and is not worth it; we all know that the demand is there.
| A popular feature request has been for in depth insights into CT libraries. Many customers | |
| ask for suggestions on how to reduce network calls to AWS Key Management Service (AWS KMS) and | |
| followup questions around cache performance. | |
| There is customer demand for in depth insights into CT libraries. Many customers | |
| ask for suggestions on how to reduce network calls to AWS Key Management Service (AWS KMS) and | |
| followup questions around cache performance. |
|
|
||
| ### Issue 2: Should Data Plane APIs fail if metrics fail to publish? | ||
|
|
||
| #### No (recommended) |
There was a problem hiding this comment.
I might want to at least throw a warning?
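For discussion, a minimal Java sketch of the "No, but warn" behavior: the data plane call returns normally even if publishing the metric throws, and the failure is surfaced as a log warning. `MetricsPublisher` and `GuardedMetrics` are hypothetical names, and using `java.util.logging` is an assumption made only to keep the example dependency-free. The warning carries the exception's string form rather than the full stack trace, since whether to log traces is still an open question in this review.

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical publisher; allowed to throw so the failure path can be shown.
interface MetricsPublisher {
    void publish(String name, long value) throws Exception;
}

final class GuardedMetrics {
    private static final Logger LOG = Logger.getLogger(GuardedMetrics.class.getName());

    private GuardedMetrics() {}

    // Runs the operation, then tries to publish its latency in nanoseconds.
    // A publishing failure is logged as a warning and never reaches the caller.
    static <T> T timed(String metricName, Supplier<T> operation, MetricsPublisher publisher) {
        long start = System.nanoTime();
        T result = operation.get(); // failures of the operation itself still propagate
        long elapsedNanos = System.nanoTime() - start;
        try {
            publisher.publish(metricName, elapsedNanos);
        } catch (Exception e) {
            LOG.warning("Failed to publish metric '" + metricName + "': " + e);
        }
        return result;
    }
}
```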
|
|
||
| ## Requirements | ||
|
|
||
| The interface should have three requirements. |
|
|
||
| This allows customers to test how their applications behave when they start to emit | ||
| metrics. Customers can then ask for updates to the implementations | ||
| CT provides or customers can go an implement their own interfaces that are fine-tuned |
There was a problem hiding this comment.
| CT provides or customers can go an implement their own interfaces that are fine-tuned | |
| CT provides or customers can go and implement their own interfaces that are fine-tuned |
| metrics to this one worker and to only sometimes capture metrics to this | ||
| other worker. | ||
|
|
||
| #### No (recommended) |
There was a problem hiding this comment.
I'd say yes to allow metrics on client construction. It's painful (and easy to forget) to supply the agent on every call. We should keep it optional, but should provide a way to set once and forget.
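Roughly what I have in mind, as a Java sketch: the worker is supplied once on the builder, stays optional, and every call picks it up without the caller passing it again. `ExampleClient`, its builder, and this cut-down `MetricsWorker` interface are hypothetical stand-ins, not the real client or the interface from the spec, and `encrypt` is a placeholder that does no actual encryption.

```java
// Hypothetical, cut-down worker interface for illustration only.
interface MetricsWorker {
    void addCount(String name, long value);
}

final class ExampleClient {
    private final MetricsWorker metricsWorker; // null when the caller opted out

    private ExampleClient(Builder builder) {
        this.metricsWorker = builder.metricsWorker;
    }

    static Builder builder() {
        return new Builder();
    }

    // Placeholder operation: copies the input instead of encrypting it.
    byte[] encrypt(byte[] plaintext) {
        if (metricsWorker != null) {
            metricsWorker.addCount("encrypt.requests", 1);
        }
        return plaintext.clone();
    }

    static final class Builder {
        private MetricsWorker metricsWorker;

        // Optional: set once here and it applies to every call on the client.
        Builder metricsWorker(MetricsWorker worker) {
            this.metricsWorker = worker;
            return this;
        }

        ExampleClient build() {
            return new ExampleClient(this);
        }
    }
}
```

A per-call override could still be layered on top of this for callers who want to route specific requests to a different worker.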
| this._bufferSize = builder._bufferSize; | ||
| this._instructionFileConfig = builder._instructionFileConfig; | ||
| this._commitmentPolicy = builder._commitmentPolicy; | ||
| + this._metricsWorker = builder._metricsWorkerl |
There was a problem hiding this comment.
| + this._metricsWorker = builder._metricsWorkerl | |
| + this._metricsWorker = builder._metricsWorker; |
|
|
||
| ## Requirements | ||
|
|
||
| The interface should have three requirements. |
There was a problem hiding this comment.
| The interface should have three requirements. | |
| The interface should have two requirements. |
| | ESDK | T.B.D | [ESDK.smithy](https://github.com/aws/aws-encryption-sdk/blob/mainline/AwsEncryptionSDK/dafny/AwsEncryptionSdk/Model/esdk.smithy) | | ||
| | MPL | T.B.D | [material-provider.smithy](https://github.com/aws/aws-cryptographic-material-providers-library/blob/main/AwsCryptographicMaterialProviders/dafny/AwsCryptographicMaterialProviders/Model/material-provider.smithy) | | ||
| | DB-ESDK | T.B.D | [DynamoDbEncryption.smithy](https://github.com/aws/aws-database-encryption-sdk-dynamodb/blob/main/DynamoDbEncryption/dafny/DynamoDbEncryption/Model/DynamoDbEncryption.smithy) | | ||
|
|
|
|
||
| ### count | ||
|
|
||
| A count is a long value |
There was a problem hiding this comment.
| A count is a long value | |
| A count is a long value. |
| of the issues that are described above, (e.g. handling failing requests, perform | ||
| blocking requests to CT libraries, use a separate thread/thread pool that handles | ||
| these request). By providing a wrapper around a language's most popular logging |
There was a problem hiding this comment.
Issue:
Agreement that the Metrics Agent Interface and Implementation will be implemented in Dafny, but only as wrappers that provide extern implementations, to make moving off of Dafny easier.
I am concerned that Dafny will not allow for non-blocking requests as it does not have async syntax, nor does it have concurrent syntax.
| This list is not exhaustive. Any downstream consumer of any API or client configuration SHOULD | ||
| also be updated as part of this proposed changed. | ||
|
|
||
| | API/ Configuration | |
There was a problem hiding this comment.
Issue: This list is missing the Cryptographic Materials Cache's APIs/Configuration.
I strongly believe that we should offer insights into cache hits/misses, and therefore need to support the CMC's operations.
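As an illustration of the kind of insight I mean, here is a small Java sketch of a cache that counts hits and misses on lookup. `CountingCache` is a hypothetical, map-backed stand-in and does not reflect the CMC's actual operations or configuration; it is also not thread-safe, which a real implementation would need to address.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical cache wrapper that records hit/miss counts on every lookup.
final class CountingCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private long hits;
    private long misses;

    void put(K key, V value) {
        entries.put(key, value);
    }

    Optional<V> get(K key) {
        V value = entries.get(key);
        if (value == null) {
            misses++;
            return Optional.empty();
        }
        hits++;
        return Optional.of(value);
    }

    long hits() {
        return hits;
    }

    long misses() {
        return misses;
    }

    // Fraction of lookups served from the cache; 0.0 before any lookup.
    double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

A customer asking how to reduce KMS calls could start from exactly these two counters.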
| @required | ||
| materials: EncryptionMaterials, | ||
|
|
||
| + metricsWorker: aws.cryptography.materialProviders#MetricsWorkerReference |
There was a problem hiding this comment.
I wonder if this should be a Metrics instance rather than the worker?
| @extendable | ||
| resource MetricsWorker { | ||
| operations: [ | ||
| AddDate, |
There was a problem hiding this comment.
How do we integrate these with something like CoralMetricsWorker?
| // Common output structure | ||
| structure AddOutput {} | ||
|
|
||
| @aws.polymorph#reference(resource: MetricsWorker) |
There was a problem hiding this comment.
Do we want to use these custom traits given that we are planning to move away from custom smithy-dafny?
| transactionId: String | ||
| } | ||
|
|
||
| // Common output structure |
There was a problem hiding this comment.
In at least Coral Metrics and SLF4J (not sure about other languages), all of these API operations return void. I figure we're doing this for future extensibility, but it might be a little annoying for implementors, as they now need to return something instead of void. This could set up NPEs way down the line.
For best reading and commenting experience, I suggest splitting your window in two; the review page and the rendered page.
Here are the rendered files:
Goals for 9-15-2025 Spec Review:
NOTE: name change from metrics-agent -> metrics-worker
Goals for 2-4-2025 Spec Review:
Issue #, if available:
Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.