
Conversation

Contributor
@SrdjanLL SrdjanLL commented Apr 21, 2025

What does this pull request do?

  • Resolves the issue of missing openai instrumentation spans in our "dogfood" chatbot-rag-app.
    • Solution inspired by the amazing deep dive from @xrmx, with some minor additions to resolve the issue for async calls too.
  • Updates StreamWrapper to close the span only once, either when the context manager exits or when the end of the stream is reached.
    • The same logic is applied for both sync and async.

Testing

  1. After reproducing the issue by following these steps, I applied the patch mentioned in the same link.
  2. Confirmed the patch resolves the issue.
  3. Wrote unit tests for the chat_stream* methods for cases when the context manager is used, and confirmed that without the patch these tests fail and emit WARNING opentelemetry.sdk.trace:__init__.py:943 Calling end() on an ended span.
  4. Applied the patch and expanded it to cover the async scenarios too.
  5. Confirmed the unit tests pass.
[screenshot: unit tests passing]

Related issues

Next steps

  1. Release 1.1.0 of openai instrumentation after the PR is merged.
  2. Update our chatbot-rag-app to use the newest version.

self.choices = []
self.usage = None
self.service_tier = None
self.in_context_manager = False
Contributor

How about adding self.ended instead and guarding both calls with it? I think it's better to end when the stream finishes when possible, and only use the context-manager hook as a fallback, since the context manager could potentially be left open for a very long time, even for unrelated processing, depending on user code.

Contributor Author

Thanks for the suggestion - this was a good catch and a helpful pointer while I'm ramping up. And yeah, it seems better as it avoids creating misleading spans. I'll update this and move the guarding into def end(...), given it's called in many places, as you surfaced elsewhere.

self.process_chunk(chunk)
yield chunk
except Exception as exc:
self.end(exc)
Contributor

I think we need to guard this similarly

self.process_chunk(chunk)
yield chunk
except Exception as exc:
self.end(exc)
Contributor

Ditto

except Exception as exc:
self.end(exc)
raise
self.end()
Contributor

@anuraaga anuraaga commented Apr 21, 2025

To confirm: without the changes, this doesn't get called in the new test cases? That's unexpected to me, since they seem to consume the stream in a for loop, so I'm wondering if we know why it would be a problem.

A case that does intuitively seem to require this change is one that breaks during iteration

with response:
  for chunk in response:
    first = chunk
    break

Contributor

Also if that is a fail case, we may need to consume the stream in the exit methods to get the stop / token info.
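As a sketch of that suggestion (hypothetical names, not the instrumentation's actual code), __exit__ could drain whatever the caller didn't consume, so the final chunk's stop/token info still flows through the processing hook even when iteration breaks early:

```python
class DrainingStreamWrapper:
    """Sketch only: drain un-consumed chunks on context-manager exit so the
    end-of-stream stop/token info still passes through the processing hook."""

    def __init__(self, stream):
        self.__wrapped__ = stream
        self.chunks_seen = 0  # stand-in for real chunk processing state

    def __iter__(self):
        for chunk in self.__wrapped__:
            self.chunks_seen += 1  # stand-in for process_chunk(chunk)
            yield chunk

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # Consume the remainder so the last chunk (stop reason / usage)
            # is still observed even if the caller broke out of the loop.
            for _ in self:
                pass
        return False
```

Note the trade-off: draining forces the full response to be fetched on exit, which may not be what the user intended when they broke out of iteration.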

Contributor Author

@SrdjanLL SrdjanLL commented Apr 22, 2025

Thanks for probing into this!

TL;DR While digging further I think I found the most likely root cause of the missing spans in our chatbot-rag-app, and it's not multiple attempts to end a span, but the default ObjectProxy behaviour when the proxied object is used as a context manager. The fix inspired by the initial patch and refactor still applies, but it sort of solved the problem by chance.

Sorry for the verbose root-causing below, but I had to leave this somewhere (led by Riccardo's example 😄)


All right, so I dug a bit further into it and found that in the unit test this does get called (when not guarded): the unit tests capture the regression in terms of the span count, while we also get the warning for trying to close an already ended span. While a good regression to catch, it's not what I set out to do with this change, so thanks for drilling into it.

Also if that is a fail case, we may need to consume the stream in the exit methods to get the stop / token info.

That's an interesting one, and it raises the question of whether we'd want to capture the tokens consumed before the exit, the token info the application chooses to deal with, or the token information from the full stream response. That seems like a weird edge case, though, and may not be the right thing to resolve as part of this PR.


Back to the original issue. Thanks to your comment I got to the point in debugging where I found that __iter__ doesn't get invoked unless __enter__ and __exit__ methods are added to StreamWrapper. Counterintuitively, there are no obvious clues on our end as to why these two different concepts have a dependency.

My assumption is that somewhere downstream (langchain/openai) the proxied object is used as a context manager, and without enter/exit overrides on the StreamWrapper proxy, the proxied object's own behaviour is used, which does not result in generator invocation on the proxy. The closest thing I was able to find related to this for wrapt.ObjectProxy can be found here; it concerns proxying aiohttp.request, which uses an (async) context manager and seems to be used by openai/langchain as well. I found relevant code in BaseChatOpenAI that seems the most likely cause in our use case. Implementing context-manager enter/exit on our StreamWrapper doesn't affect the behaviour of the instrumented code, but it solves the problem of generator invocation on the proxy.
I then added a check on the span count that fails with 0 spans when the context manager is not implemented, which should catch regressions on this in the future.

Seems like the initial patch from @xrmx fixed the span-closure warnings and also resolved this issue, but it's good to know that what I explained above was causing the missing traces, not the re-attempted span closure.

Either way, I will proceed with your suggestion to add an ended flag and guard span ending in a single place instead, as that seems to cover both the warning about ending an already ended span and this weird regression.
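The delegation behaviour described above can be illustrated without wrapt at all — DelegatingProxy below mimics an ObjectProxy-style __enter__ that hands back the wrapped stream, while FixedProxy returns the proxy itself (all names are illustrative, not the instrumentation's actual classes):

```python
class RawStream:
    """Stand-in for an SDK streaming response that is also a context manager."""

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False

    def __iter__(self):
        yield from ("a", "b")


class DelegatingProxy:
    """Mimics default context-manager delegation: __enter__ returns the
    *wrapped* stream, so 'with proxy as s' bypasses the proxy's __iter__."""

    def __init__(self, wrapped):
        self.__wrapped__ = wrapped
        self.chunks_processed = 0

    def __iter__(self):
        for chunk in self.__wrapped__:
            self.chunks_processed += 1  # instrumentation hook would run here
            yield chunk

    def __enter__(self):
        return self.__wrapped__.__enter__()  # returns the raw stream!

    def __exit__(self, *exc):
        return self.__wrapped__.__exit__(*exc)


class FixedProxy(DelegatingProxy):
    def __enter__(self):
        self.__wrapped__.__enter__()
        return self  # return the proxy, so iteration goes through __iter__
```

With DelegatingProxy, `with proxy as s: list(s)` iterates the raw stream and the instrumentation hook never fires; with FixedProxy it does.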

@SrdjanLL SrdjanLL requested a review from anuraaga April 22, 2025 20:42
@codefromthecrypt
Contributor

@SrdjanLL want to verify this fix end-to-end?

here https://github.com/elastic/elasticsearch-labs/blob/main/example-apps/chatbot-rag-app/requirements.txt#L536

I think you can substitute with this

elastic-opentelemetry-instrumentation-openai @ git+https://github.com/SrdjanLL/elastic-otel-python-instrumentations.git@bugfix/span-closure#subdirectory=instrumentation/openai

then run chatbot-rag-app with python, or add git temporarily to its Dockerfile so it can resolve that requirement like this:

RUN apt-get update \
    && apt-get install -y --no-install-recommends git \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

@SrdjanLL
Contributor Author

@codefromthecrypt - thanks for the suggestion!
I've already made myself a nice feedback loop by using an editable installation (pip install -e) of my local version of instrumentation/openai and adding it to the virtualenv I use to run chatbot-rag-app with Python. This way I was able to confirm the traces are complete and the Gen AI dashboard has data.

Do you think there's anything missing in that feedback loop?

I will follow your suggestion to make sure it works as well (and as part of setting up my feedback loops).

Contributor

@anuraaga anuraaga left a comment

Code lgtm, thanks!

@codefromthecrypt
Contributor

@SrdjanLL Never mind me, as your approach works fine. Just sometimes I've had my eyes crossed on my venvs and want something Docker can prove is repeatable.

@SrdjanLL SrdjanLL merged commit bee16d4 into elastic:main Apr 23, 2025
14 checks passed