
Conversation

@aabmass (Member) commented Sep 8, 2025

Fixes #2753

Changes

Introduces reference attributes that were held back from #2179 (092db44):

  • gen_ai.system_instructions_ref
  • gen_ai.input.messages_ref
  • gen_ai.output.messages_ref

This makes uploading content to external storage, and referencing it from telemetry, more normative.
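As a rough illustration of how the three proposed attributes might be used (a sketch only: the storage URLs are hypothetical, and a plain dict stands in for span attributes to keep the example dependency-free):

```python
# Hypothetical sketch: the three *_ref attributes proposed in this PR,
# each pointing at content previously uploaded to external storage.
# The base URL below is made up for illustration.
BASE = "https://storage.example.com/traces/abc123"

# A plain dict stands in for OTel span attributes here.
span_attributes = {
    "gen_ai.system_instructions_ref": f"{BASE}/system_instructions.json",
    "gen_ai.input.messages_ref": f"{BASE}/input_messages.json",
    "gen_ai.output.messages_ref": f"{BASE}/output_messages.json",
}
```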

Prototypes

Merge requirement checklist

  • CONTRIBUTING.md guidelines followed.
  • Change log entry added, according to the guidelines in When to add a changelog entry.
    • If your PR does not need a change log, start the PR title with [chore]
  • Links to the prototypes or existing instrumentations (when adding or changing conventions)

@aabmass aabmass changed the title Genai refs Add gen_ai _ref blob reference attributes Sep 8, 2025
@github-actions github-actions bot added the enhancement label Sep 9, 2025
@aabmass aabmass changed the title Add gen_ai _ref blob reference attributes Add gen_ai _ref attributes for referencing external uploaded content Sep 9, 2025
@aabmass aabmass marked this pull request as ready for review September 9, 2025 03:31
@aabmass aabmass requested review from a team as code owners September 9, 2025 03:31
@lmolkova (Member) left a comment


I do support this version; I just wanted to bring up the reasoning that pushed it out of #2179.

I believe we got stuck on the following question: should we record the content of individual parts in external storage, or record the whole JSON there?

My preference is on recording the whole thing with the following arguments:

  • it's easier and faster (and, at P95, more reliable) to upload and download one reasonably sized (megabytes) object than N smaller ones. Creating an object/blob is a time-consuming operation on its own
  • the input, output, and instructions objects are not useful without their content, and a part's content is not useful without the roles/structure around it. So when string/binary content is uploaded but the rest of the message content is not, all consumers always need to do smart joins on the telemetry and object-store data
  • someone who wants to upload individual contents per part can still do so

Leaving this as a comment rather than an approval only to hear feedback.
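The whole-object option described above could look roughly like this (a sketch; `upload_blob` is a hypothetical stand-in for a real object-store client, and the URL scheme is made up):

```python
# Sketch of the "record the whole JSON" option: serialize the complete
# input messages (roles, structure, and content together) into one
# object, upload it once, and record only the ref on the span.
import json


def upload_blob(data: bytes, name: str) -> str:
    """Hypothetical stand-in for a real object-store PUT; returns the blob URL."""
    return f"https://storage.example.com/{name}"


def record_input_messages(messages: list) -> dict:
    blob = json.dumps(messages).encode("utf-8")
    url = upload_blob(blob, "input_messages.json")
    # One upload per span, so consumers never need to join per-part blobs
    # back together with telemetry to reconstruct the conversation.
    return {"gen_ai.input.messages_ref": url}


attrs = record_input_messages(
    [{"role": "user", "parts": [{"type": "text", "content": "hi"}]}]
)
```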

@alexmojaki (Contributor) commented Sep 9, 2025

> someone who wants to upload individual contents per part can still do so

But this will be less useful to do if there isn't a semconv for uploads per part. So this argument doesn't work for me unless we think we might define conventions for both versions.

The ideal I’m imagining for a ref for each part looks like this:

  • For text parts, the value is still stored inline on the JSON up to a character limit. If it’s big enough, a truncated value is stored as well as a URL to the full value. This allows normal querying to some extent, but not always reliably.
  • Binary data like images are always uploaded as a ref and never inlined. They’re not really queryable anyway and are a pain to keep in attributes. This means that if your messages are generally a mix of binary data and short text messages that wouldn’t be truncated, you can always reliably query the text while still getting a lot of benefit from external storage.
    • The data can be stored as raw bytes in the original format, e.g. a PNG, rather than base64 encoded within JSON.
  • The ref URL is based on a hash of the content, meaning that the same content always has the same URL. This way a single part is never uploaded twice. This is great for GenAI where a conversation with many messages leads to the early messages being repeated many times.
    • The client (e.g. OTel SDK) can have an LRU cache of these hashes/URLs so that if a part was uploaded recently it doesn’t even need an HTTP call to know that it doesn’t need to upload it again.
    • Browser caching can maybe save downloading parts sometimes.
    • System instructions and tool definitions are typically repeated verbatim many times in a process lifetime, so this hashing strategy would be very helpful for them. This isn't necessarily an argument in favour of a ref per part since these are whole attributes, but it does mean that there's already motivation to implement uploading to external storage in this way. But a ref per part does help a lot here if system messages are kept at the start of gen_ai.input.messages instead of gen_ai.system_instructions, which is sometimes the case.
  • Each part can be uploaded/downloaded in parallel, which might sometimes be faster and increase the chance that at least some parts are uploaded successfully.
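The hash-based dedup and LRU-cache points above can be sketched as follows (an illustrative sketch, not a proposed implementation; `upload_bytes` is a hypothetical stand-in for a real object-store PUT, and the URL scheme is made up):

```python
# Sketch of a per-part, content-addressed upload scheme: the blob URL is
# derived from a hash of the content, so identical parts (system prompts,
# earlier messages repeated across a conversation) map to the same URL,
# and an LRU cache of recent hashes skips re-uploading them.
import hashlib
from collections import OrderedDict


class ContentAddressedUploader:
    def __init__(self, base_url: str, cache_size: int = 1024):
        self.base_url = base_url
        self.cache = OrderedDict()  # digest -> URL, in LRU order
        self.cache_size = cache_size
        self.upload_count = 0  # for illustration: how many real uploads happened

    def upload_bytes(self, digest: str, data: bytes) -> None:
        """Hypothetical stand-in for a real object-store PUT."""
        self.upload_count += 1

    def upload_part(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.cache:
            self.cache.move_to_end(digest)  # refresh LRU position, skip HTTP call
            return self.cache[digest]
        self.upload_bytes(digest, data)
        url = f"{self.base_url}/{digest}"
        self.cache[digest] = url
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict least recently used entry
        return url


uploader = ContentAddressedUploader("https://storage.example.com/parts")
u1 = uploader.upload_part(b"You are a helpful assistant.")
u2 = uploader.upload_part(b"You are a helpful assistant.")  # cache hit, no upload
```

Same content, same URL: the second call returns the cached ref without touching the store.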

@lmolkova (Member) commented

Converting to draft - would like to have an end-to-end implementation before proceeding.


Labels

area:gen-ai, enhancement (New feature or request)

Development

Successfully merging this pull request may close these issues.

Specify attributes for gen_ai external storage references

3 participants