
For ONVIF TTS audio proposal, to support device with TTS function #694

Open
Peggy0422 wants to merge 19 commits into development from video/TTS-audio-clip

Conversation

@Peggy0422

To support audio products with a TTS (text-to-speech) function, the following changes are proposed:

  1. Add TTSCapabilities (optional): indicates whether the device supports the TTS function and its corresponding TTS configuration. The complex type "TTSCapabilities" is added to the existing complex type "AudioClipCapabilities".
    Parameters:
      1. MaxContentLength: the maximum length of the text content that the device can convert to an audio clip.
      2. TTSLanguage: the language(s) the device supports for the TTS function.
      3. TTSVoiceType: the voice types the device supports for the TTS function.
  2. Add "AddTTSAudioClip" and "AddTTSAudioClipResponse": sends a text, a TTS configuration and an audio clip configuration to the device. The device converts the text to an audio clip based on the TTS configuration and subsequently plays this audio clip based on the audio clip configuration.
    Parameters:
      1. Token (optional): token for the audio clip.
      2. Configuration: audio clip configuration to add, see element "Configuration".
      3. TTSConfiguration: for a TTS audio clip, specifies the audio content, language and voice type used when the device plays this audio clip.
    Response:
      1. Token: unique token of the TTS audio clip to be uploaded.
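
For illustration only, the request described above could be assembled on the client side roughly as follows. This is a minimal sketch, not the proposed schema itself: the function name `build_add_tts_audio_clip` is hypothetical, the element nesting is inferred from the parameter list in this description, and the assumption that all child elements live in the Media2 (`tr2`) namespace has not been checked against the final WSDL. The content of the audio clip `Configuration` element is outside the scope of the sketch, so the caller passes it in ready-made.

```python
import xml.etree.ElementTree as ET

# Namespace of the ONVIF Media2 service (prefix tr2).
TR2 = "http://www.onvif.org/ver20/media/wsdl"


def build_add_tts_audio_clip(content, language, voice_type, configuration, token=None):
    """Build a hypothetical AddTTSAudioClip request body and return it as a string.

    `configuration` is a pre-built audio clip Configuration element; its
    content is not modelled in this sketch.
    """
    ET.register_namespace("tr2", TR2)
    request = ET.Element(f"{{{TR2}}}AddTTSAudioClip")

    # Token is optional in the proposal, so it is only emitted when given.
    if token is not None:
        ET.SubElement(request, f"{{{TR2}}}Token").text = token

    request.append(configuration)

    # Assumption: TTSConfiguration carries Content, Language and VoiceType
    # as child elements, as listed for the complex type "TTSAudio".
    tts = ET.SubElement(request, f"{{{TR2}}}TTSConfiguration")
    ET.SubElement(tts, f"{{{TR2}}}Content").text = content
    ET.SubElement(tts, f"{{{TR2}}}Language").text = language
    ET.SubElement(tts, f"{{{TR2}}}VoiceType").text = voice_type

    return ET.tostring(request, encoding="unicode")
```

A caller would wrap the returned fragment in a SOAP body before sending it; the voice type value passed in (for example "Female") is a placeholder, since the allowed values are defined by the device capabilities.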

media2.wsdl

  1. Updated complexType "AudioClipCapabilities" with element "TTSCapabilities"; added complexType "TTSCapabilities" with attributes "MaxContentLength", "TTSLanguage" and "TTSVoiceType"; added simpleType "TTSLanguage" and "TTSVoiceType".
  2. Added elements "AddTTSAudioClip" and "AddTTSAudioClipResponse" for sending a text, TTS configuration and audio clip configuration to the device.
  3. Added complexType "TTSAudio" for element "TTSConfiguration". It includes parameters such as Content, Language, VoiceType.
  4. Added "AddTTSAudioClipRequest" and "AddTTSAudioClipResponse"

media2.xml and documentation

  1. Added detailed descriptions for AddTTSAudioClip operations, explaining their purpose, parameters, and responses.
  2. Updated audio clip capabilities with TTSCapabilities.

1. Added AddTTSAudioClip request and AddTTSAudioClip response for sending a text and its TTS configuration to the device (1621-1652) (2036-2041) (2418-2422) (2935-2943).
2. Added complex type "TTSAudio" (1465-1485) for TTSConfiguration to support TTS function. It includes parameters Content, Language, VoiceType.
3. Updated AudioClipCapabilities with TTSCapabilities (177-181), and added complex type TTSCapabilities (201-220) to indicate that the device supports the TTS function and its corresponding configuration.
Complex type TTSCapabilities includes MaxContentLength, TTSLanguage and TTSVoiceType.
4. Added simpleType TTSLanguage (220-231) and TTSVoiceType (232-238).
1. Added detailed descriptions for AddTTSAudioClip operations, explaining their purpose, parameters, and responses (2359-2416).
2. Updated audio clip capabilities with TTSCapabilities (2698-2700).
update code line information for TTS function
correct some editorial errors
Updated the description of the AddTTSAudioClip operation to clarify the parameters and response. Updated the description of TTScapabilities.
TTS audio clip pull request was firstly created as number 668
Updated TTS configuration description and added TTSCapabilities entry.
@sujithhanwha
Contributor

OLD PR for reference
#668

@ocampana-videotec ocampana-videotec added this to the 26.06 milestone Dec 4, 2025
doc/Media2.xml Outdated
</varlistentry>
</variablelist>
<para></para>
<para><emphasis role="bold">Note:</emphasis> Audio clip uploads to the device can fail in the following scenarios, and a specific HTTP error code should be returned to the client when an upload fails.</para>
Contributor

@venki5685 venki5685 Dec 9, 2025

This note does not seem applicable to TTSAudioClip.

Author

Yes, it is not for TTS, I will delete it.

delete inappropriate note for OPTION AddTTSAudioClip
Contributor

@johado johado left a comment

Some small textual comments.

doc/Media2.xml Outdated
<title>AddTTSAudioClip</title>
<para>This operation adds a text, audio clip configuration and TTS configuration to the device, for device converting the text to an audio clip based on the TTS configuration.
The response to the command includes a unique token for this converted audio clip.
If the device is unable to support language specified in the TTS configuration, the associated configuration will deleted from the device.</para>
Contributor

add "be" to "will be deleted"

Author

Okay, got it.

doc/Media2.xml Outdated
<term>response</term>
<listitem>
<para role="param">Token - [tt:ReferenceToken]</para>
<para role="text">Unique token of the TTS audio clip to be uploaded.</para>
Contributor

Change "to be uploaded" to "that was added" ?

Author

Thank you very much for your advice. We are considering the word "assign", which should be more precise.

doc/Media2.xml Outdated
</varlistentry>
<varlistentry>
<term>TTSCapabilities</term>
<listitem><para>Indicates device supports TTS function and TTS configuration.See tr2: TTSCapabilities.</para></listitem>
Contributor

Add space after .: "..configuration. See tr2:..."

Author

Okay, thank you.

</xs:element>
<xs:element name="Language" type="xs:string">
<xs:annotation>
<xs:documentation>Language for the TTS audio clip playback. See tr2: TTSLanguage. </xs:documentation>
Contributor

Change to "See tr2:TTSLanguage and TTSCapabilities." ?

Author

Thank you for your suggestion. TTSLanguage is already an attribute within TTSCapabilities. If we want to point out that the language for TTS audio clip playback must be one of the languages supported by the device, we could revise the explanation to state this clearly, for example: "The language which is supported and used for TTS audio clip playback."
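
The constraint discussed here (the requested language must be one the device advertises) can be sketched as a client-side check. This is only an illustration under stated assumptions: the `TTSCapabilities` dataclass and `check_tts_audio` function are hypothetical names, MaxContentLength is assumed to count characters, and the capability attributes are assumed to have already been parsed into sets of strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTSCapabilities:
    """Hypothetical client-side view of the proposed TTSCapabilities."""
    max_content_length: int
    languages: frozenset
    voice_types: frozenset


def check_tts_audio(caps, content, language, voice_type):
    """Return a list of problems; an empty list means the request looks acceptable."""
    problems = []
    # Assumption: MaxContentLength is counted in characters.
    if len(content) > caps.max_content_length:
        problems.append("content longer than MaxContentLength")
    if language not in caps.languages:
        problems.append(f"language {language!r} not in TTSLanguage")
    if voice_type not in caps.voice_types:
        problems.append(f"voice type {voice_type!r} not in TTSVoiceType")
    return problems
```

Returning a list of problems rather than raising on the first one lets a client report every mismatch at once before it sends AddTTSAudioClip.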

</xs:element>
<xs:element name="VoiceType" type="xs:string">
<xs:annotation>
<xs:documentation>The voice type for the TTS audio clip playback. See tr2: TTSVoiceType.</xs:documentation>
Contributor

Change to "See tr2:TTSVoiceType and TTSCapabilities." ?

Author

I propose updating the explanation for TTSVoiceType in the same way as in the commit for TTSLanguage.

<xs:sequence>
<xs:element name="Token" type="tt:ReferenceToken">
<xs:annotation>
<xs:documentation>Unique token of the TTS audio clip to be uploaded.</xs:documentation>

Change "to be uploaded" to something more relevant: converted, generated, ...?

Author

Thank you very much for bringing it up. Yes, we are considering changing it to use the word "assign", which should be more precise.

<xs:anyAttribute processContents="lax"/>
</xs:complexType>
<!--===============TTS Language================-->
<xs:simpleType name="TTSLanguage">

What is the reasoning behind the choice of languages in the list below?

@robberos robberos Dec 16, 2025

Is there any standard for official language names that can be referred to?

TTSCapabilities and TTSAudio use open strings, so the enum should provide a good pattern.

Author

Thank you so much for your comments! We truly appreciate your input and have been carefully considering how to best define these general concepts. Your mention of ISO international standards was particularly helpful and guided our further research. We also looked into RFC 5646 for language representation across countries. So we would like to use alpha-2 codes to represent languages and countries, as recommended in ISO 639-1 and ISO 3166-1. For languages with regional variations, we plan to adopt the language-country format (e.g., en-US, zh-CN). Thank you again for your feedback.
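
As a rough illustration of the format described in this reply, the following sketch checks only the shape of a language string: a two-letter ISO 639-1 language code, optionally followed by a hyphen and a two-letter ISO 3166-1 alpha-2 country code. The function name `is_tts_language_tag` is hypothetical, and the strict casing (lowercase language, uppercase country) is an assumption based on the examples en-US and zh-CN; RFC 5646 itself treats tags as case-insensitive. The check does not verify that the codes are actually assigned by ISO.

```python
import re

# Assumption: lowercase two-letter language code, optionally followed by
# "-" and an uppercase two-letter country code (e.g. "en", "en-US", "zh-CN").
_LANGUAGE_TAG = re.compile(r"[a-z]{2}(?:-[A-Z]{2})?")


def is_tts_language_tag(value: str) -> bool:
    """Return True if `value` has the language or language-country shape."""
    return _LANGUAGE_TAG.fullmatch(value) is not None
```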

doc/Media2.xml Outdated
</itemizedlist>
</section>
</section>
<section xml:id="section_wvd_dzg_rye">
@robberos robberos Dec 16, 2025

The id should be unique in XML, right? This seems to be a copy of the SetAudioClip section below.

Author

Yes, thank you for the suggestion. I have revised it accordingly.

See <a href="https://www.iso.org/obp/ui/">ISO Country Codes</a>.
</xs:documentation>
</xs:annotation>
<xs:restriction base="xs:string">
Collaborator

Do we really need to make an explicit restriction here rather than just defining it as a string? If we go this way, whenever we need to add a language we need to update the WSDL file.

Author

Thank you very much for your comment! Yes, this is an important issue we should consider.
Previously, we defined languages using string format and listed commonly used or potentially needed languages. However, this approach does introduce a maintenance burden: as you pointed out, each new language addition would require updating the WSDL file. To address this, we now directly reference ISO-standard language codes via strings. Users may refer to the official ISO codes for specific needs, while the WSDL only defines the reference rules. The examples in TTSLanguage are provided for convenience. I hope this clarifies the approach. Thank you again for your comment!

Added note about enumeration values being illustrative in TTSLanguage.
Revise the description of language definition in TTScapability and TTSAudio