Skip to content

Conversation

UOndro
Copy link

@UOndro UOndro commented Dec 14, 2023

Description

Currently, the user was not able to cancel the stream. The cause was that instead of returning _MultiThreadedRendezvous we returned the generator created from it.

I tried two options to fix this:

  1. In this WIP commit , I tried to end span by using add_done_callback. This kinda worked, but it was very hard to test using unittest. In tests, we either had to wait till thread handling events were already called before asserts or we could use the experimental single-threaded channel. Another problem was that the single-threaded channel did not handle the done callback when the stream was canceled. It handled correctly done callback when the stream ended or in case of error, but not during cancel. I think this is just a bug in that single-threaded implementation.
  2. The second option(and the one I picked) was wrapping _MultiThreadedRendezvous in proxy, and ending span on all places where it was needed. Because this happened in the same thread, I did not have problems with unit testing.

The first solution is definitely less complicated from the point of implementation. However, I am not even sure if waiting for channel_spin is correct, spans in theory can be ended a little bit later. Compared to the second solution, we don't need to handle every place where the stream can end, we basically leave this to grpc implementation.

In the end, I picked the second solution mostly because it can be testable, it ends span almost immediately when the stream ends, and also because of that bug when done callback is not called when using the single threaded channel.

If you prefer the first solution or have some other in mind, let me know.

Fixes #2014

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • test_unary_stream_can_be_cancel - cancel unary stream in the middle of consuming responses
  • test_stream_stream_can_be_cancel - cancel stream stream in middle of consuming responses
  • test_finished_stream_cancel_does_not_change_status_of_span - try to cancel stream that already finished

Does This PR Require a Core Repo Change?

  • Yes. - Link to PR:
  • No.

Checklist:

See contributing.md for styleguide, changelog guidelines, and more.

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

Copy link

linux-foundation-easycla bot commented Dec 14, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@xrmx
Copy link
Contributor

xrmx commented May 10, 2024

@UOndro Please fix failing tests, linting and add a CHANGELOG entry

- set_attribute and also set_status are not called if span was already ended. Simple reorder of function calls helped
- I also realized that accessing end_time is dangerous, in SDK it is protected by lock. I solved this by remembering if we already called end. Calling multiple times end on span is not problematic, but it generates warning logs and is probably not good practice
- updated CHANGELOG.md
- fix lint
@UOndro UOndro requested a review from a team May 10, 2024 15:43
@UOndro
Copy link
Author

UOndro commented May 10, 2024

Hi @xrmx, It should be fixed, also updated changelog

@aabmass
Copy link
Member

aabmass commented Jun 27, 2024

@UOndro thank you for the PR and apologies for not taking a look sooner.

  1. In this WIP commit , I tried to end span by using add_done_callback. This kinda worked, but it was very hard to test using unittest.

I think the add_done_callback() approach would be much better than the __del__() approach. A few issues with the finalizer approach

  • I'm not sure the timing will be very accurate. I don't think Python implementations are guaranteed to GC the object immediately when refcount goes to zero. So the span could remain un-ended for a while. The opposite could happen as well where the wrapt proxy gets garbage collected immediately once refcount is zero and we end the span, but in reality the Call may remain open for some time
  • gRPC streaming calls also return a status code which we could add to the span. It's accessible through the future

but it was very hard to test using unittest. In tests, we either had to wait till thread handling events were already called before asserts or we could use the experimental single-threaded channel.

Can we call result() or code() on the future object? I think those should block until the Call is actually finished. If that still doesn't work, I imagine you could do another add_done_callback() and manage a concurrent.futures.Future separately through that.

@aabmass
Copy link
Member

aabmass commented Jun 27, 2024

  1. In this WIP commit , I tried to end span by using add_done_callback. This kinda worked, but it was very hard to test using unittest.

@UOndro I got the tests working on top of this commit by doing responses.code() 64ae80a

@UOndro
Copy link
Author

UOndro commented Jun 28, 2024

@UOndro I got the tests working on top of this commit by doing responses.code() 64ae80a

If I comment out the shutdown of grpc server, the test still fails. The reason is that done callback is called after the code is set and after the call is finished.

Not sure if we want to close the server before getting spans, it doesn't look right.

If that still doesn't work, I imagine you could do another add_done_callback() and manage a concurrent.futures.Future separately through that.

Can you maybe explain this? I am not sure what to do.

Ideally, we would have some function/future that would tell us that the done callback was called. I could add something to OpenTelemetryClientInterceptor, but not sure how to get it/call it from tests. Maybe saving it to GrpcInstrumentorClient, that seems like a lot of work. Any ideas?

Here is the code where I commented out the stop of server -> 28b5686

cc. @aabmass

@aidandj
Copy link

aidandj commented Oct 9, 2025

I fixed the tests and brought this up to main here: aidandj@d7900f9

Should I create a new PR? Or a PR onto this branch? Seems like this is close enough it should be pushed through.

@aidandj
Copy link

aidandj commented Oct 9, 2025

Updated PR here if that's prefered: #3823

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Reviewed PRs that need fixes

Development

Successfully merging this pull request may close these issues.

GRPC client_interceptor does not allow the user to cancel streamed responses

4 participants