
@JWinermaSplunk JWinermaSplunk commented Oct 16, 2025

Fixes #2907

Changes

Add retrieval span support to db spans

Important

Pull request acceptance is subject to the triage process as described in Issue and PR Triage Management.
PRs that do not follow the guidance above may be automatically rejected and closed.

Merge requirement checklist

  • CONTRIBUTING.md guidelines followed.
  • Change log entry added, according to the guidelines in When to add a changelog entry.
    • If your PR does not need a change log, start the PR title with [chore]
  • Links to the prototypes or existing instrumentations (when adding or changing conventions)

Member

@lmolkova lmolkova left a comment

Retrieval is in many cases based on database queries. For example, PostgreSQL or MongoDB instrumentation has no knowledge that it's being used in the context of a GenAI application.

So, in the general case, retrieval is just a DB call with the semantics described in https://github.com/open-telemetry/semantic-conventions/blob/main/docs/database/README.md

There is an open question of whether search engines are databases or should be covered by a separate or additional set of conventions (#1869). That is where the OpenAI retrieval API should probably belong.

@github-project-automation github-project-automation bot moved this from Untriaged to Blocked in Semantic Conventions Triage Oct 19, 2025
@jsuereth jsuereth moved this from Blocked to Awaiting codeowners approval in Semantic Conventions Triage Oct 27, 2025
@lmolkova
Member

Adding more context from GenAI SIG call:

  • LangChain, LlamaIndex, and Haystack offer a retriever abstraction.
  • The question is whether the corresponding implementations should be instrumented and, if so, which conventions they should follow.

My take:

Based on the LangChain, LlamaIndex, and Haystack docs, a retriever is in most cases a thin layer on top of a database or search client, which may be instrumented using database conventions and/or hypothetical search conventions.

Retrievers can be more complicated, combining multiple data sources or performing additional logic. In these cases, the spans they emit might differ significantly from the underlying DB calls, and multiple layers may be instrumented at the same time. For a thin abstraction or wrapper, however, having two spans does not improve observability; it only increases cost and noise.

When it comes to abstractions, there is a classic problem of instrumentation layers (ORM vs. database spans, LangChain LLM vs. underlying model-client spans, etc.): both layers could be instrumented, and each has pros and cons (the DB layer has more low-level info, while the framework layer better represents the caller's perspective). Duplication is a common problem. Solutions may include:

I think the path forward for this PR is:

  • Separate retrieval from the GenAI domain. Most of the attributes defined here can be generic DB or search attributes. Please follow the discussion in "Should search engines follow database semantic conventions?" #1869.
  • LangChain, LlamaIndex, etc. instrumentations would cover retrieval following those DB or search conventions and would provide an option to disable their retrieval instrumentation (e.g., when the user prefers to use the existing underlying DB client instrumentation).

@lmolkova
Member

Also related: #1231

@JWinermaSplunk
Author

Hi @lmolkova,

Here are a few examples of proposed retrieval spans from Traceloop and from our instrumentation. They have also been added to the Google Doc linked to the issue:
https://docs.google.com/document/d/1DVE3Ht686nuxww-Z1JNyy2VInn9LeCJQYw8NffbjGtg/edit?tab=t.0.

I believe we also discussed that we are fine with moving retrieval spans from the GenAI space to the DB space. Would it also be possible to make the GenAI attributes optional (enable/disable), similar to enabling/disabling retrieval spans as a whole? That way, a retrieval span could be the parent of a DB operation and an embedding operation.

@github-actions

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions

This PR contains changes to area(s) that do not have an active SIG/project and will be auto-closed:

  • database

Such changes may be rejected or put on hold until a new SIG/project is established.

Please refer to the Semantic Convention Areas document to see the currently active SIGs and to learn how to kick-start a new one.

# Conflicts:
#	docs/registry/attributes/gen-ai.md

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

Add retrieval /search span support to Semantic Conventions

3 participants