-
Notifications
You must be signed in to change notification settings - Fork 94
test(o11y): End-to-end HTTP to Cloud Trace integration test #3822
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3822 +/- ##
=======================================
Coverage 95.92% 95.92%
=======================================
Files 156 156
Lines 5991 5991
=======================================
Hits 5747 5747
Misses 244 244 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
5bc4d2d to
8dbfcbd
Compare
|
This passes now (needed retry on 500's and maybe less wonky retry policy.) |
dbolduc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good, but some things are strange... mainly the initial sleep and the multiple flushes.
| let root_span = tracing::info_span!("e2e_root", "otel.name" = span_name); | ||
| let trace_id = { | ||
| let _enter = root_span.enter(); | ||
| let otel_ctx = root_span.context(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL: opentelemetry::trace::TraceContextExt
| let mut trace = None; | ||
|
|
||
| for delay in backoff_delays { | ||
| println!("Waiting {}s before polling...", delay); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add a comment saying why we are doing this. There is actually a good reason.
// Because we are limited by quota, start with a backoff.207fce3 to
79bc61a
Compare
dbolduc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the initial headers for the interceptor needs more thought, but maybe not for this PR.
| // Wait for the first refresh to complete. | ||
| // We ignore the result because if the sender is dropped (unlikely), | ||
| // the interceptor will just fail requests, which is the correct behavior. | ||
| let _ = rx.changed().await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only send headers in the case of a success, so this hangs on bad credentials, right?
google-cloud-rust/src/integration-tests/src/observability/auth.rs
Lines 129 to 132 in 79bc61a
| Err(e) => { | |
| tracing::warn!("Failed to refresh GCP credentials: {e:?}"); | |
| sleep(ERROR_RETRY_DELAY).await; | |
| } |
So should we:
match headers {
// omitted...
Err(e) if e.is_transient() => {
tracing::warn!("Failed to refresh GCP credentials: {e:?}");
sleep(ERROR_RETRY_DELAY).await;
},
Err(e) => break,
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, seems like this can't get hit anymore:
google-cloud-rust/src/integration-tests/src/observability/auth.rs
Lines 65 to 68 in 4d8249a
| // If the first refresh hasn't completed yet, fail the request. | |
| // The OTLP exporter is expected to handle this transient failure | |
| // with its built-in retry mechanism. | |
| Status::unauthenticated("GCP credentials not yet available") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is getting complicated enough that we should probably defer it in a separate PR.
| ); | ||
|
|
||
| // 4. Force flush to ensure spans are sent. | ||
| let _ = provider.force_flush(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: can we use ?. Are you trying to avoid workaround Timeout errors? https://docs.rs/opentelemetry_sdk/latest/opentelemetry_sdk/error/enum.OTelSdkError.html#variant.Timeout
…ice` and add GAX retry logic in telemetry test.
…loud Trace API verification.
…tFound` and improve error logging.
…remove manual delays in tests.
79bc61a to
566fd1d
Compare
No description provided.