Variants for response for /audio/transcriptions missing discriminator field. #497

@glecaros

The response for /audio/transcriptions is defined as a discriminated union with task as the discriminator property. Of the three variants, only CreateTranscriptionResponseDiarizedJson defines this property.

The definitions:

/audio/transcriptions:
  post:
    operationId: createTranscription
    tags:
      - Audio
    summary: Create transcription
    requestBody:
      required: true
      content:
        multipart/form-data:
          schema:
            $ref: '#/components/schemas/CreateTranscriptionRequest'
    responses:
      '200':
        description: OK
        content:
          application/json:
            schema:
              anyOf:
                - $ref: '#/components/schemas/CreateTranscriptionResponseJson'
                - $ref: '#/components/schemas/CreateTranscriptionResponseDiarizedJson'
                  x-stainless-skip:
                    - go
                - $ref: '#/components/schemas/CreateTranscriptionResponseVerboseJson'
              discriminator:
                propertyName: task
          text/event-stream:
            schema:
              $ref: '#/components/schemas/CreateTranscriptionResponseStreamEvent'
CreateTranscriptionResponseJson:
  type: object
  description: Represents a transcription response returned by model, based on the provided input.
  properties:
    text:
      type: string
      description: The transcribed text.
    logprobs:
      type: array
      optional: true
      description: >
        The log probabilities of the tokens in the transcription. Only returned with the models
        `gpt-4o-transcribe` and `gpt-4o-mini-transcribe` if `logprobs` is added to the `include` array.
      items:
        type: object
        properties:
          token:
            type: string
            description: The token in the transcription.
          logprob:
            type: number
            description: The log probability of the token.
          bytes:
            type: array
            items:
              type: number
            description: The bytes of the token.
    usage:
      type: object
      description: Token usage statistics for the request.
      anyOf:
        - $ref: '#/components/schemas/TranscriptTextUsageTokens'
          title: Token Usage
        - $ref: '#/components/schemas/TranscriptTextUsageDuration'
          title: Duration Usage
      discriminator:
        propertyName: type
  required:
    - text
CreateTranscriptionResponseDiarizedJson:
  type: object
  description: >
    Represents a diarized transcription response returned by the model, including the combined transcript
    and speaker-segment annotations.
  properties:
    task:
      type: string
      description: The type of task that was run. Always `transcribe`.
      enum:
        - transcribe
      x-stainless-const: true
    duration:
      type: number
      description: Duration of the input audio in seconds.
    text:
      type: string
      description: The concatenated transcript text for the entire audio input.
    segments:
      type: array
      description: Segments of the transcript annotated with timestamps and speaker labels.
      items:
        $ref: '#/components/schemas/TranscriptionDiarizedSegment'
    usage:
      type: object
      description: Token or duration usage statistics for the request.
      discriminator:
        propertyName: type
      anyOf:
        - $ref: '#/components/schemas/TranscriptTextUsageTokens'
          title: Token Usage
        - $ref: '#/components/schemas/TranscriptTextUsageDuration'
          title: Duration Usage
CreateTranscriptionResponseVerboseJson:
  type: object
  description: Represents a verbose json transcription response returned by model, based on the provided input.
  properties:
    language:
      type: string
      description: The language of the input audio.
    duration:
      type: number
      description: The duration of the input audio.
    text:
      type: string
      description: The transcribed text.
    words:
      type: array
      description: Extracted words and their corresponding timestamps.
      items:
        $ref: '#/components/schemas/TranscriptionWord'
    segments:
      type: array
      description: Segments of the transcribed text and their corresponding details.
      items:
        $ref: '#/components/schemas/TranscriptionSegment'
    usage:
      $ref: '#/components/schemas/TranscriptTextUsageDuration'
  required:
    - language
    - duration
    - text
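
The mismatch can be checked mechanically. The following is a minimal sketch, not part of the spec or any tooling: the helper name `variants_missing_discriminator` is hypothetical, and the `schemas` dict is a hand-trimmed copy of the definitions above that keeps only the property names.

```python
# Hypothetical helper: reports which union variants do not declare the
# discriminator property. Not part of any OpenAPI library.
def variants_missing_discriminator(schemas, variant_names, property_name):
    """Return the variant names whose schema lacks `property_name`."""
    missing = []
    for name in variant_names:
        properties = schemas.get(name, {}).get("properties", {})
        if property_name not in properties:
            missing.append(name)
    return missing


# Property names copied from the definitions above; all other details omitted.
schemas = {
    "CreateTranscriptionResponseJson": {
        "properties": {"text": {}, "logprobs": {}, "usage": {}},
    },
    "CreateTranscriptionResponseDiarizedJson": {
        "properties": {
            "task": {}, "duration": {}, "text": {}, "segments": {}, "usage": {},
        },
    },
    "CreateTranscriptionResponseVerboseJson": {
        "properties": {
            "language": {}, "duration": {}, "text": {},
            "words": {}, "segments": {}, "usage": {},
        },
    },
}

# The anyOf members of the 200 response, in the order they are listed.
variants = [
    "CreateTranscriptionResponseJson",
    "CreateTranscriptionResponseDiarizedJson",
    "CreateTranscriptionResponseVerboseJson",
]

missing = variants_missing_discriminator(schemas, variants, "task")
print(missing)
# ['CreateTranscriptionResponseJson', 'CreateTranscriptionResponseVerboseJson']
```

A code generator that relies on the discriminator has no way to select either of the two reported variants from the value of task.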
