Skip to content

Commit 38b2666

Browse files
authored
feat: add tracing framework (#737)
1 parent 8a43009 commit 38b2666

File tree

49 files changed

+1268
-274
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

49 files changed

+1268
-274
lines changed
Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
{
2+
"id": "982d60f1-9742-45c3-8de2-5feef6541d76",
3+
"type": "feature",
4+
"description": "Add a new tracing framework for centralized handling of log messages and metric events and providing easy integration points for connecting to downstream tracing systems (e.g., kotlin-logging)",
5+
"issues": [
6+
"awslabs/smithy-kotlin#677"
7+
]
8+
}

docs/design/README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,4 +19,5 @@ Start here for an overview:
1919
* [Nullable properties in SDK domain types](nullable-properties-in-sdk-domain-types.md) (coming soon)
2020
* [Pagination](paginators.md)
2121
* [Retries](retries.md)
22+
* [Tracing](tracing.md)
2223
* [Waiters](waiters.md)

docs/design/tracing.md

Lines changed: 198 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,198 @@
1+
# Tracing Design
2+
3+
* **Type**: Design
4+
* **Author(s)**: Ian Botsford
5+
6+
# Abstract
7+
8+
Tracing describes the emission of logging and metric events in a structured manner for the purposes of analyzing SDK
9+
performance and debugging. This document presents a design for how tracing will work in the SDK.
10+
11+
# Concepts
12+
13+
The following terms are defined:
14+
15+
* **Trace span**: A logical grouping of tracing events that encompasses some operation. Trace spans may be subdivided
16+
into child spans which group a narrower set of events within the context of the parent. Trace spans are hierarchical;
17+
events that occur within one span also logically occur within the ancestors of that span.
18+
19+
* **Trace probe**: A receiver for tracing events. A probe is notified when new events occur within a span and may take
20+
appropriate action to route the event (e.g., forward to a downstream logging/metrics framework, print to the console,
21+
write to a file, etc.).
22+
23+
# Design
24+
25+
The following components provide tracing support:
26+
27+
## Tracer
28+
29+
A `Tracer` is a top-level provider of tracing capabilities. It bridges trace spans (into which events are emitted) and
30+
trace probes (which receive events and handle them accordingly). Typically, each service client will have its own
31+
internal `Tracer` instance. That `Tracer` need not be publicly accessible but must be configurable with a trace probe
32+
and client name at service client construction.
33+
34+
The `Tracer` interface is specified as:
35+
36+
```kotlin
37+
interface Tracer {
38+
fun createRootSpan(id: String): TraceSpan
39+
}
40+
```
41+
42+
A `Tracer` provides root spans for a service client, into which all events over the lifetime of an operation will be
43+
emitted. Child spans can be created as mentioned below in the [Trace Span](#trace-span) section.
44+
45+
**Note**: The interface does not specify how trace probes will be configured or utilized. These are implementation
46+
details of the tracer and aren't necessary in the public interface.
47+
48+
## Trace Span
49+
50+
A `TraceSpan` is a logical grouping of tracing events that are associated with some operation. Spans may be subdivided
51+
into child spans which group a narrower set of events within the context of the parent. Spans are hierarchical; events
52+
that occur within one span also logically occur within the ancestors of that span.
53+
54+
The `TraceSpan` interface is specified as:
55+
56+
```kotlin
57+
interface TraceSpan : Closeable {
58+
val id: String
59+
val parent: TraceSpan?
60+
61+
fun child(id: String): TraceSpan
62+
fun postEvent(event: TraceEvent)
63+
}
64+
```
65+
66+
Spans have an ID (or name) which must be unique among sibling spans within the same parent. Span IDs will generally be
67+
used by probes to contextualize events.
68+
69+
`TraceSpan` instances are `Closeable` and must be closed when no more events will be emitted to them. Probes may choose
70+
to batch/aggregate events within a span until a span is closed.
71+
72+
## Trace Event
73+
74+
A `TraceEvent` is the recording of a single event that took place and its associated metadata:
75+
76+
```kotlin
77+
data class TraceEvent(
78+
val level: EventLevel,
79+
val sourceComponent: String,
80+
val timestamp: Instant,
81+
val threadId: String,
82+
val data: TraceEventData,
83+
)
84+
85+
enum class EventLevel { Fatal, Error, Warning, Info, Debug, Trace }
86+
87+
sealed interface TraceEventData {
88+
data class Message(val exception: Throwable? = null, val content: () -> Any?) : TraceEventData
89+
90+
sealed interface Metric : TraceEventData { val metric: String }
91+
data class Count<T : Number>(override val metric: String, val count: () -> T) : Metric
92+
data class Timespan(override val metric: String, val duration: () -> Duration) : Metric
93+
}
94+
```
95+
96+
Trace events occur at different levels (e.g., fatal, info, debug, etc.). These levels may be used by probes to
97+
include/omit events in their output.
98+
99+
Trace events can be one of three types:
100+
* `Message`: Typically, a free-form text message used for logging
101+
* `Count`: The numerical measurement of some value (e.g., results returned, bytes written, etc.)
102+
* `Timespan`: The temporal measurement of some occurrence (e.g., time elapsed, latency, etc.)
103+
104+
Probes are free to handle these different types of events however they see fit (e.g., they may log some messages,
105+
aggregate some metrics, ignore some events, etc.).
106+
107+
Event data values (i.e., message text, count values, and timespan durations) are provided as lambdas rather than with
108+
direct values. This allows probe implementations to skip calculating them in the event they would otherwise be discarded
109+
(e.g., for events emitted at a level ignored by the probe).
110+
111+
## Trace Probe
112+
113+
A `TraceProbe` is a sink for receiving events from spans. They will typically form a bridge between the SDK's events and
114+
downstream libraries/frameworks/services which can handle the events. Examples of such downstream systems include Log4j,
115+
CloudWatch, local files on disk, the console, etc.
116+
117+
SDKs will typically not bundle many implementations of `TraceProbe` themselves. Common probe implementations may be
118+
available as separate libraries or from third-party sources. Users may implement probes themselves to bridge SDK events
119+
to whatever downstream systems they desire.
120+
121+
The `TraceProbe` interface is defined as:
122+
123+
```kotlin
124+
interface TraceProbe {
125+
fun postEvent(span: TraceSpan, event: TraceEvent)
126+
fun spanClosed(span: TraceSpan)
127+
}
128+
```
129+
130+
The methods of `TraceProbe` are invoked by the top-level `Tracer` (or `TraceSpan` instances created by it).
131+
132+
The `postEvent` method indicates that an event has been emitted to a span. Probe implementations may choose to
133+
immediately handle/discard events or to batch them until later. Once `spanClosed` is called, no more events will be
134+
posted for the given span.
135+
136+
## Client config
137+
138+
The following additional parameters will be added to client config:
139+
140+
* `tracer`: An optional `Tracer` implementation to use. If not provided explicitly, this will default to a tracer which
141+
sends logging events to **kotlin-logging** and ignores metric events. The `DefaultTracer` class is available to
142+
provide a simple `Tracer` implementation with a configurable probe and root prefix. Using a root prefix can help
143+
differentiate events from multiple clients of a single service used for different use cases.
144+
145+
# Implementation guidance
146+
147+
The following guidelines are intended to inform implementation and usage of tracing features by SDK contributors and
148+
those who customize their usage of the SDK:
149+
150+
## Trace span hierarchy
151+
152+
Trace spans form a taxonomy that categorize tracing events into a hierarchy. Discrete spans help group related events in
153+
a way that's useful to downstream tools which facilitate analysis. Consequently, choosing meaningful trace spans is key
154+
to maximizing the usefulness of tracing events. Trace spans which are too specific and too deeply nested may create
155+
noise and obscure events in an opaque hierarchy. Trace spans which are too shallow may bundle together too many events
156+
and hinder meaningful analyses by downstream systems.
157+
158+
The following trace span levels are recommended for implementors:
159+
160+
* A top-level span for each operation invocation, in the form of `<clientName>-<operation>-<uuid>` (e.g.,
161+
`S3-ListBuckets-8e6bf409-c119-4661-bd99-523c70701aac`)
162+
* A span for retry attempts, in the form of `Attempt-<n>` (e.g., `"Attempt-1"`) or `Non-retryable attempt` in the case
163+
of operations which cannot be retried
164+
* A span for credentials chains, named `Credentials chain`. Note that individual credentials providers (e.g., static,
165+
profile, environment, etc.) don't get their own child spans—only the chain.
166+
* A span for HTTP engine events within a request, named `HTTP`
167+
* Spans for subclients (or _inner clients_) which are embedded in the logic for superclients (or _outer clients_). An
168+
example of a subclient is using a nested STS client as part of credential resolution while invoking an operation for a
169+
different service. Spans for subclients effectively reproduce the span hierarchy listed above nested within the outer
170+
span hierarchy.
171+
172+
The following are examples of suggested trace span hierarchies:
173+
174+
* `S3-ListBuckets-8e6bf409-c119-4661-bd99-523c70701aac`: events which occur during invocation of an S3 `ListBuckets`
175+
operation _before_ or _after_ retry middleware (e.g., serialization/deserialization, endpoint resolution, etc.)
176+
* `S3-ListBuckets-8e6bf409-c119-4661-bd99-523c70701aac/Attempt-1`: events which occur during the first attempt at
177+
calling `ListBuckets` _outside of_ the HTTP engine (e.g., signing)
178+
* `S3-ListBuckets-8e6bf409-c119-4661-bd99-523c70701aac/Attempt-1/Credentials chain`: events which occur during
179+
credential resolution in a credentials chain during the first attempt at calling `ListBuckets`
180+
* `S3-ListBuckets-8e6bf409-c119-4661-bd99-523c70701aac/Attempt-1/HTTP`: events which occur inside the HTTP engine during
181+
the first attempt at calling `ListBuckets` (e.g., sending/receiving bytes from service)
182+
183+
The following is an example of a nested span hierarchy for a subclient:
184+
185+
* `S3-ListBuckets-8e6bf409-c119-4661-bd99-523c70701aac/Attempt-1/Credentials chain/SSO-AssumeRole-c080b2e2-ff6d-4504-bce4-3433f9f4ac1b/Attempt-2`:
186+
events which occur during the second attempt to call SSO's `AssumeRole` as part of credential chain resolution during
187+
the first attempt to call S3's `ListBuckets`.
188+
189+
### Adding new spans
190+
191+
New spans may be necessary for certain features in the future and thus the above list and examples are not exhaustive.
192+
For the reasons described above, care should be taken to ensure that new spans add enough value and distinctiveness
193+
without nesting so deeply as to obscure event relationships.
194+
195+
# Revision history
196+
197+
* 8/19/2022 - Initial draft
198+
* 11/15/2022 - Revised draft with latest proposed interfaces

runtime/auth/aws-signing-common/common/src/aws/smithy/kotlin/runtime/auth/awssigning/middleware/AwsSigningMiddleware.kt

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -10,8 +10,10 @@ import aws.smithy.kotlin.runtime.http.HttpBody
1010
import aws.smithy.kotlin.runtime.http.operation.*
1111
import aws.smithy.kotlin.runtime.http.request.HttpRequest
1212
import aws.smithy.kotlin.runtime.http.request.HttpRequestBuilder
13+
import aws.smithy.kotlin.runtime.tracing.warn
1314
import aws.smithy.kotlin.runtime.util.InternalApi
1415
import aws.smithy.kotlin.runtime.util.get
16+
import kotlin.coroutines.coroutineContext
1517
import kotlin.time.Duration
1618

1719
/**
@@ -101,7 +103,6 @@ public class AwsSigningMiddleware(private val config: Config) : ModifyRequestMid
101103
}
102104

103105
override suspend fun modifyRequest(req: SdkHttpRequest): SdkHttpRequest {
104-
val logger = req.context.getLogger(this::class.simpleName!!)
105106
val body = req.subject.body
106107

107108
// favor attributes from the current request context
@@ -147,7 +148,9 @@ public class AwsSigningMiddleware(private val config: Config) : ModifyRequestMid
147148
config.isUnsignedPayload -> HashSpecification.UnsignedPayload
148149
body is HttpBody.Empty -> HashSpecification.EmptyBody
149150
body is HttpBody.Streaming && !body.isReplayable -> {
150-
logger.warn { "unable to compute hash for unbounded stream; defaulting to unsigned payload" }
151+
coroutineContext.warn<AwsSigningMiddleware> {
152+
"unable to compute hash for unbounded stream; defaulting to unsigned payload"
153+
}
151154
HashSpecification.UnsignedPayload
152155
}
153156
// use the payload to compute the hash

runtime/auth/aws-signing-crt/common/src/aws/smithy/kotlin/runtime/auth/awssigning/crt/CrtAwsSigner.kt

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,7 @@ import aws.smithy.kotlin.runtime.http.request.HttpRequestBuilder
1515
import aws.smithy.kotlin.runtime.http.request.toBuilder
1616
import aws.smithy.kotlin.runtime.http.util.fullUriToQueryParameters
1717
import aws.smithy.kotlin.runtime.time.epochMilliseconds
18+
import aws.smithy.kotlin.runtime.tracing.debug
1819
import kotlin.coroutines.coroutineContext
1920
import aws.sdk.kotlin.crt.auth.credentials.Credentials as CrtCredentials
2021
import aws.sdk.kotlin.crt.auth.signing.AwsSignatureType as CrtSignatureType
@@ -31,6 +32,8 @@ public object CrtAwsSigner : AwsSigner {
3132
val crtConfig = config.toCrtSigningConfig()
3233

3334
val crtResult = CrtSigner.sign(crtRequest, crtConfig)
35+
coroutineContext.debug<CrtAwsSigner> { "Calculated signature: ${crtResult.signature.decodeToString()}" }
36+
3437
val crtSignedResult = checkNotNull(crtResult.signedRequest) { "Signed request unexpectedly null" }
3538

3639
val requestBuilder = request.toBuilder()
@@ -45,6 +48,8 @@ public object CrtAwsSigner : AwsSigner {
4548
): AwsSigningResult<Unit> {
4649
val crtConfig = config.toCrtSigningConfig()
4750
val crtResult = CrtSigner.signChunk(chunkBody, prevSignature, crtConfig)
51+
coroutineContext.debug<CrtAwsSigner> { "Calculated signature: ${crtResult.signature.decodeToString()}" }
52+
4853
return AwsSigningResult(Unit, crtResult.signature)
4954
}
5055
}

runtime/auth/aws-signing-default/common/src/aws/smithy/kotlin/runtime/auth/awssigning/DefaultAwsSigner.kt

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -5,9 +5,10 @@
55
package aws.smithy.kotlin.runtime.auth.awssigning
66

77
import aws.smithy.kotlin.runtime.auth.awscredentials.Credentials
8+
import aws.smithy.kotlin.runtime.http.operation.getLogger
89
import aws.smithy.kotlin.runtime.http.request.HttpRequest
9-
import aws.smithy.kotlin.runtime.logging.Logger
1010
import aws.smithy.kotlin.runtime.time.TimestampFormat
11+
import kotlin.coroutines.coroutineContext
1112

1213
/** The default implementation of [AwsSigner] */
1314
public val DefaultAwsSigner: AwsSigner = DefaultAwsSignerImpl()
@@ -17,9 +18,9 @@ internal class DefaultAwsSignerImpl(
1718
private val signatureCalculator: SignatureCalculator = SignatureCalculator.Default,
1819
private val requestMutator: RequestMutator = RequestMutator.Default,
1920
) : AwsSigner {
20-
private val logger = Logger.getLogger<DefaultAwsSignerImpl>()
21-
2221
override suspend fun sign(request: HttpRequest, config: AwsSigningConfig): AwsSigningResult<HttpRequest> {
22+
val logger = coroutineContext.getLogger<DefaultAwsSignerImpl>()
23+
2324
// TODO implement SigV4a
2425
require(config.algorithm == AwsSigningAlgorithm.SIGV4) { "${config.algorithm} support is not yet implemented" }
2526

@@ -46,6 +47,8 @@ internal class DefaultAwsSignerImpl(
4647
prevSignature: ByteArray,
4748
config: AwsSigningConfig,
4849
): AwsSigningResult<Unit> {
50+
val logger = coroutineContext.getLogger<DefaultAwsSignerImpl>()
51+
4952
val stringToSign = signatureCalculator.chunkStringToSign(chunkBody, prevSignature, config)
5053
logger.trace { "Chunk string to sign:\n$stringToSign" }
5154

runtime/logging/common/src/aws/smithy/kotlin/runtime/logging/KotlinLoggingAdapter.kt

Lines changed: 0 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -52,10 +52,4 @@ internal class KotlinLoggingAdapter(name: String) : Logger {
5252
override fun error(t: Throwable?, msg: () -> Any?) {
5353
log.error(t, msg)
5454
}
55-
56-
override fun <T : Throwable> throwing(throwable: T): T = log.throwing(throwable)
57-
58-
override fun <T : Throwable> catching(throwable: T) {
59-
log.catching(throwable)
60-
}
6155
}

runtime/logging/common/src/aws/smithy/kotlin/runtime/logging/Logger.kt

Lines changed: 0 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -74,16 +74,6 @@ public interface Logger {
7474
* Lazy add a log message with throwable payload if error logging is enabled
7575
*/
7676
public fun error(t: Throwable?, msg: () -> Any?)
77-
78-
/**
79-
* Add a log message indicating an exception will be thrown along with the stack trace.
80-
*/
81-
public fun <T : Throwable> throwing(throwable: T): T
82-
83-
/**
84-
* Add a log message indicating an exception is caught along with the stack trace.
85-
*/
86-
public fun <T : Throwable> catching(throwable: T)
8777
}
8878

8979
/**

runtime/logging/common/src/aws/smithy/kotlin/runtime/logging/LoggerExt.kt

Lines changed: 0 additions & 45 deletions
This file was deleted.

runtime/protocol/http-client-engines/http-client-engine-crt/build.gradle.kts

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,7 @@ kotlin {
3030
api(project(":runtime:protocol:http"))
3131
implementation(project(":runtime:logging"))
3232
implementation(project(":runtime:crt-util"))
33+
implementation(project(":runtime:tracing:tracing-core"))
3334

3435
implementation("org.jetbrains.kotlinx:kotlinx-coroutines-core:$coroutinesVersion")
3536
implementation("org.jetbrains.kotlinx:atomicfu:$atomicFuVersion")

0 commit comments

Comments
 (0)