Retrieval Span Support #2924
lmolkova
left a comment
Retrieval is in many cases based on a database query, e.g. the Postgres or MongoDB instrumentation has no knowledge that it is being used in the context of a GenAI application.
So retrieval in the general case is just a DB call with the semantics described in https://github.com/open-telemetry/semantic-conventions/blob/main/docs/database/README.md
There is a question of whether search engines are databases or should be covered by a separate or additional set of conventions (#1869); this is where the OpenAI retrieval API should probably belong.
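To make the point above concrete, here is a minimal sketch of a retrieval described purely as a database client span. The attribute keys (`db.system.name`, `db.operation.name`, `db.collection.name`, `db.query.text`) are taken from the database semantic conventions linked above. Everything else is an assumption for illustration: the `db_span_for_retrieval` helper, the `documents` table, and the query text are hypothetical, and a plain dict stands in for a real span.

```python
def db_span_for_retrieval(system, operation, collection, query_text):
    """Describe a retrieval as an ordinary database client span.

    Hypothetical helper: returns a plain dict instead of creating a real span.
    """
    return {
        # Low-cardinality span name built from the operation and its target.
        "name": f"{operation} {collection}",
        "kind": "CLIENT",
        "attributes": {
            "db.system.name": system,
            "db.operation.name": operation,
            "db.collection.name": collection,
            "db.query.text": query_text,
        },
    }


# Hypothetical vector-similarity lookup issued by a retriever.
span = db_span_for_retrieval(
    "postgresql",
    "SELECT",
    "documents",
    "SELECT id, body FROM documents ORDER BY embedding <-> $1 LIMIT 5",
)
print(span["name"])
print(sorted(span["attributes"]))
```

Nothing in the resulting span says "GenAI": the database instrumentation only sees a query, which is why the retrieval looks like any other DB call from its point of view.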
Adding more context from GenAI SIG call:
My take: Reading through the LangChain, LlamaIndex, and Haystack docs, a retriever is in most cases a thin layer on top of a database or search client, which may itself be instrumented using database conventions and/or hypothetical search conventions. Retrievers can be more complicated: they may combine multiple sources of data or perform additional logic. In those cases the spans they emit might differ significantly from the underlying DB calls, and multiple layers may be instrumented at the same time. When the retriever is only a thin abstraction or wrapper, having two spans does not improve observability but does increase cost and noise. With abstractions there is a classic problem of instrumentation layers (ORM vs. database spans, LangChain LLM vs. underlying model-client spans, etc.): both layers could be instrumented, and each has pros and cons (the DB layer has more low-level information, while the framework layer better represents the caller's perspective). Duplication is a common problem. Solutions may include:
I think the path forward for this PR:
Also related: #1231
Hi @lmolkova, here are a few examples of the proposed retrieval spans from Traceloop and from our instrumentation; I believe they have also been added to the Google doc linked from the issue. We also discussed that we are fine with moving retrieval spans from the GenAI space to the DB space. Would it also be possible to make the GenAI attributes optional (enabled or disabled), similar to enabling or disabling retrieval spans as a whole? That way a retrieval span could parent a DB operation plus an embedding operation.
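As a rough illustration of the shape being asked about, the sketch below nests an embedding span and a DB span under an optional retrieval span, with one switch for the retrieval span itself and one for its GenAI attributes. It is not taken from this PR or from any existing instrumentation: the `Span` and `Tracer` classes are a toy in-memory stand-in, the `retrieve` function and both flags are hypothetical, the model name is a placeholder, and `"retrieval"` as a value of `gen_ai.operation.name` is assumed here only for the sake of the example.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    name: str
    parent: Optional[str]
    attributes: dict = field(default_factory=dict)


class Tracer:
    """Toy in-memory tracer (hypothetical), just enough to show nesting."""

    def __init__(self):
        self.finished = []
        self._stack = []

    @contextmanager
    def start_span(self, name, attributes=None):
        parent = self._stack[-1].name if self._stack else None
        span = Span(name, parent, dict(attributes or {}))
        self._stack.append(span)
        try:
            yield span
        finally:
            self._stack.pop()
            self.finished.append(span)


def retrieve(tracer, query, emit_retrieval_span=True, emit_genai_attributes=True):
    """Hypothetical retriever: embed the query, then look up documents."""

    @contextmanager
    def maybe_retrieval_span():
        if not emit_retrieval_span:
            yield None  # retrieval layer disabled: children become top-level
            return
        # Assumed attribute value; only set when GenAI attributes are enabled.
        attrs = {"gen_ai.operation.name": "retrieval"} if emit_genai_attributes else {}
        with tracer.start_span("retrieval", attrs) as span:
            yield span

    with maybe_retrieval_span():
        with tracer.start_span(
            "embeddings example-embedding-model",
            {"gen_ai.operation.name": "embeddings"},
        ):
            vector = [float(len(query))]  # placeholder for a real embedding call
        with tracer.start_span(
            "SELECT documents",
            {"db.system.name": "postgresql", "db.operation.name": "SELECT"},
        ):
            return [f"document near {vector}"]  # placeholder for a real query


tracer = Tracer()
retrieve(tracer, "what is a span?")
for s in tracer.finished:
    print(s.name, "<-", s.parent)
```

With both flags on, the embedding and DB spans are children of the retrieval span. With `emit_retrieval_span=False` only the two lower-level spans remain, which corresponds to the thin-wrapper case where the extra layer is not emitted.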
This PR was marked stale due to lack of activity. It will be closed in 7 days.
This PR contains changes to area(s) that do not have an active SIG/project and will be auto-closed:
Such changes may be rejected or put on hold until a new SIG/project is established. Please refer to the Semantic Convention Areas.
# Conflicts: # docs/registry/attributes/gen-ai.md
080e240 to
64ff6d9
Compare


Fixes #2907
Changes
Add retrieval span support to db spans
Important
Pull request acceptance is subject to the triage process as described in Issue and PR Triage Management.
PRs that do not follow the guidance above may be automatically rejected and closed.
Merge requirement checklist
[chore]