Skip to content

Conversation

DylanRussell
Copy link
Contributor

Description

Add data classes which correspond to the types in the GenAI json schemas listed in https://github.com/open-telemetry/semantic-conventions/tree/main/docs/gen-ai

There's probably a lot of ways to do this.. Open to suggestions on how to keep these in-sync with the json schemas or to other approaches.

Type of change

Please delete options that are not relevant.

  • [ x] New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Just adding types..

Does This PR Require a Core Repo Change?

  • Yes. - Link to PR:
  • [x ] No.

Checklist:

See contributing.md for styleguide, changelog guidelines, and more.

  • [ x] Followed the style guidelines of this project
  • Changelogs have been updated
  • [ x] Unit tests have been added
  • [ x] Documentation has been updated

Copy link
Member

@lmolkova lmolkova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huge thanks, it'd be very useful in all GenAI instrumentations!

There could be a few other things we can put here - comments, validation, serialization helpers, but all of it can be added incrementally.

@aabmass
Copy link
Member

aabmass commented Sep 2, 2025

This LGTM but is there an issue or draft PR to show what is the plan for these dataclasses?

@DylanRussell
Copy link
Contributor Author

Check out #3709 for an example - basically the intent is just to use them as an intermediate store of the data before the export of events/spans

@aabmass
Copy link
Member

aabmass commented Sep 2, 2025

Got it. I was thinking this util would provide an API to actually emit the telemetry since there is a lot of shared boilerplate. Something like

class GenAiInstrumentation:
  def __init__(self, meter, tracer, logger): ...
  
  def handle_completion(
    self,
    input: InputMessage,
    outputs: OutputMessages,
    system: SystemInstruction,
  ) -> None:
    """Emits telemetry in the format chosen by the user
    
    depending on `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` and `OTEL_SEMCONV_STABILITY_OPT_IN` uploader configuration (TBD)
    """

### foollm_instrumentation.py
response = call_llm(input)
# convert to OTel format
input_messages, output_messages, system_instruction = massage_data(input, response)
self._instrumentation_util.handle_completion(input_messages, output_messages, system_instruction)

Can definitely handle it at a later point but I'm guessing #3709 pretty much already does that?

@DylanRussell
Copy link
Contributor Author

Got it. I was thinking this util would provide an API to actually emit the telemetry since there is a lot of shared boilerplate. Something like

class GenAiInstrumentation:
  def __init__(self, meter, tracer, logger): ...
  
  def handle_completion(
    self,
    input: InputMessage,
    outputs: OutputMessages,
    system: SystemInstruction,
  ) -> None:
    """Emits telemetry in the format chosen by the user
    
    depending on `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` and `OTEL_SEMCONV_STABILITY_OPT_IN` uploader configuration (TBD)
    """

### foollm_instrumentation.py
response = call_llm(input)
# convert to OTel format
input_messages, output_messages, system_instruction = massage_data(input, response)
self._instrumentation_util.handle_completion(input_messages, output_messages, system_instruction)

Can definitely handle it at a later point but I'm guessing #3709 pretty much already does that?

I did not write the PR that way, I branched on the env var flag much earlier in the code. I think massaging the data into the InputMessages/OutputMessages/SystemInstruction data types doesn't make sense for the old existing instrumentation which doesn't require the data to look that way

@aabmass
Copy link
Member

aabmass commented Sep 4, 2025

I did not write the PR that way, I branched on the env var flag much earlier in the code. I think massaging the data into the InputMessages/OutputMessages/SystemInstruction data types doesn't make sense for the old existing instrumentation which doesn't require the data to look that way

Discussed offline. This was regarding when the instrumentation is not opted into the new conventions. When they choose new conventions, we can have some shared util to avoid boilerplate.

This LGTM to merge.

@aabmass aabmass added the Skip Changelog PRs that do not require a CHANGELOG.md entry label Sep 4, 2025
@aabmass aabmass enabled auto-merge (squash) September 4, 2025 20:39
@aabmass aabmass merged commit b1b9505 into open-telemetry:main Sep 4, 2025
631 of 633 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Skip Changelog PRs that do not require a CHANGELOG.md entry

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants