Use WebVTT standard instead of a cooked-up JSON file #756

dascritch · 2026-03-04T21:17:44Z

WebVTT is a standard that is well implemented in browsers for chapters, annotations, and captions, because the WHATWG and W3C specifications recommend it for media elements.

In https://github.com/Podcastindex-org/podcast-namespace/blob/main/docs/1.0.md#user-content-chapters , for the tag <podcast:chapters>, it states that browsers don't support ID3 chapter tags. This is true. But instead the proposal uses a less standard solution: a brand new JSON file.

Support for WebVTT files is nearly complete in web browsers. WebVTT is a W3C standard (https://www.w3.org/TR/webvtt1/), works smoothly in browsers covering 99% of market share, and is exposed to and used by accessibility tools. So why recommend a new file format without native implementation instead of using one that has worked well for 10 years (in subtitling, but it works perfectly for chaptering too; I'm using it)?

I suggest changing this to recommend WebVTT as the preferred solution (MIME type text/vtt, documentation at https://www.w3.org/TR/webvtt1/), and to offer application/json+chapters as an alternative.
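For anyone weighing the two formats, here is a minimal sketch of how an existing JSON chapters file could be turned into WebVTT chapter cues. It assumes the JSON carries a `chapters` array whose entries have `startTime` (in seconds) and `title`; the function names and the `duration` parameter are hypothetical, introduced only for illustration. The one real mismatch is that every WebVTT cue needs an end time, so the sketch ends each chapter where the next one starts.

```python
import json


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def json_chapters_to_vtt(chapters_json: str, duration: float) -> str:
    """Convert a JSON chapters document into WebVTT chapter cues.

    `duration` (episode length in seconds) is an assumed extra input: it is
    used as the end time of the last chapter.
    """
    chapters = sorted(
        json.loads(chapters_json)["chapters"], key=lambda c: c["startTime"]
    )
    lines = ["WEBVTT", ""]
    for index, chapter in enumerate(chapters):
        start = chapter["startTime"]
        # A chapter ends where the next one starts; the last ends with the episode.
        if index + 1 < len(chapters):
            end = chapters[index + 1]["startTime"]
        else:
            end = duration
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(chapter["title"])  # assumes every chapter has a title
        lines.append("")
    return "\n".join(lines)


example = json.dumps({
    "version": "1.2.0",
    "chapters": [
        {"startTime": 0, "title": "Intro"},
        {"startTime": 94.5, "title": "Interview"},
    ],
})
print(json_chapters_to_vtt(example, duration=1800))
```

Anything else stored per chapter in the JSON (images, links) has no direct equivalent in a plain chapter cue and is dropped by this sketch.
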

cf #315

theDanielJLewis · 2026-03-04T21:58:07Z

I think it's an interesting idea, especially since WebVTT also has a way to carry metadata, but I recommend against this for three reasons:

1. The documentation itself calls chapters with metadata "non-normative."
2. Simple JSON is much easier for a developer to parse and support. Look at how many apps already support this, including Apple Podcasts, finally.
3. The "chapters file" is actually not a "chapters file." It's a file containing episode metadata, allowing for much more than only chapters, potentially being able to shift some just-in-time metadata to an external file instead of overloading the RSS feed. And when such features are adopted, it will be extremely easy for developers to support if they already load the chapters, because the other data will be right there in the JSON object they're already processing.

samsethi · 2026-03-08T21:22:22Z

Reading the transcript text - https://github.com/Podcastindex-org/podcast-namespace/blob/main/docs/examples/transcripts/transcripts.md

"Want to support only one format? WebVTT is used by Apple Podcasts for ingest, and also natively supported by web browsers. Because the WebVTT format is the most flexible, it's an ideal choice if you can only support one format."

The JSON representation is a flexible format that accommodates various degrees of fidelity in a concise way. At the most precise, it enables word-by-word highlighting. This format for podcast transcripts should adhere to the following specifications.

Apple uses the VTT format with accurate word highlighting.

1 reply

jamescridland Mar 9, 2026

"Apple uses the VTT format with accurate word highlighting."

There's no accurate word highlighting in VTT format; and I know I'm not supplying word-by-word to Apple. Yet, they are producing word highlighting for all shows.

Here's how I think Apple works:

Apple does its own transcription on a podcast, which includes accurate word-by-word highlighting.
Where a VTT format file is provided by the publisher, Apple appears to a) ingest that VTT format file; b) compare it to the transcription Apple has done itself; c) if the text is above 90% similar, Apple accepts the publisher's VTT format file, but applies timing from its own transcription.
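To make the guessed comparison in steps b) and c) concrete, here is a rough sketch. It is only an illustration of the idea: nothing here is known about Apple's actual method, `difflib` is a stand-in for whatever comparison they use, and the function names and the word-level normalisation are assumptions. The 0.9 threshold is simply the "above 90%" guess from the list above.

```python
import re
from difflib import SequenceMatcher


def normalise(text: str) -> list[str]:
    """Lowercase and split into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def accept_publisher_text(publisher: str, machine: str, threshold: float = 0.9) -> bool:
    """Return True when the two transcripts are more than `threshold` similar.

    Similarity is difflib's ratio over word sequences, an assumed metric.
    """
    matcher = SequenceMatcher(
        None, normalise(publisher), normalise(machine), autojunk=False
    )
    return matcher.ratio() > threshold


publisher_text = "Hello, and welcome to the show!"
machine_text = "hello and welcome to the show"
print(accept_publisher_text(publisher_text, machine_text))
```

In this picture, when the check passes the publisher's wording would be kept and the word timings taken from the machine transcript.
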

In both cases, Apple redacts dynamic advertising from its transcript. I don't know how it does that. It isn't on-device, since the transcription appears before the audio is played. It may be requesting different copies of the audio and comparing them.

I don't think this relates to the files being sent to Apple (indeed, I can confirm it doesn't, given Podnews Daily has no word timing information).

jamescridland · 2026-03-09T00:28:25Z

I think there are three proposals here from @dascritch - let's see if I can help unpack them:

1. "Use WebVTT standard"

I agree with this part of the proposal. I'd like to propose that we retire SRT/TXT/HTML format to simplify the specification.

WebVTT is a standard for browsers, which support VTT files out of the box for video, and support VTT files for audio quite simply as well. Here's an example of an AUDIO player with VTT support.

SRT files are much less well supported by the web.

If you have to have an SRT file for your application, then it's an easy transform from a VTT file. They're almost identical in nature.

Similarly, a TXT or HTML transcription can be built from a VTT file as well.
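As a rough illustration of how small those transforms are, here is a sketch that derives both SRT and plain text from a VTT file. It is not a full WebVTT parser: it assumes well-formed input, skips any block without a timing line (the header, NOTE and STYLE blocks), drops cue settings, and the function names are hypothetical.

```python
import re

# Matches a WebVTT timing line; the hours part is optional in WebVTT.
TIMING = re.compile(
    r"^((?:\d+:)?\d{2}:\d{2})\.(\d{3}) --> ((?:\d+:)?\d{2}:\d{2})\.(\d{3})"
)


def srt_time(clock: str, millis: str) -> str:
    """SRT wants hh:mm:ss,mmm, so add missing hours and use a comma."""
    if clock.count(":") == 1:
        clock = "00:" + clock
    return f"{clock},{millis}"


def parse_cues(vtt: str) -> list[tuple[str, str, str]]:
    """Return (start, end, text) for every cue, with SRT-style timestamps."""
    cues = []
    for block in re.split(r"\n{2,}", vtt.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.match(line)
            if match:
                start = srt_time(match.group(1), match.group(2))
                end = srt_time(match.group(3), match.group(4))
                cues.append((start, end, "\n".join(lines[i + 1:])))
                break
    return cues


def vtt_to_srt(vtt: str) -> str:
    """Number the cues and write them in SRT layout (cue text is kept as-is)."""
    cues = parse_cues(vtt)
    return "\n\n".join(
        f"{n}\n{start} --> {end}\n{text}"
        for n, (start, end, text) in enumerate(cues, 1)
    ) + "\n"


def vtt_to_text(vtt: str) -> str:
    """Keep only the cue text, stripping tags such as <v Speaker>."""
    return "\n".join(re.sub(r"<[^>]+>", "", text) for _, _, text in parse_cues(vtt))


sample = """WEBVTT

1
00:00.000 --> 00:02.500
<v Alice>Hello and welcome.

00:02.500 --> 00:05.000 align:start
Today: chapters.
"""
print(vtt_to_srt(sample))
print(vtt_to_text(sample))
```

Going the other way (SRT to VTT) loses nothing either, which is part of why keeping both formats in the specification buys so little.
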

Removing complexity from the podcast:transcript specification would enable this feature to be more effective, since publishers would clearly be told to produce ONE file format, and consumers would only have to deal with that one format. The specification at the moment is messy and complex and needn't be.

2. "Use WebVTT chapters"

WebVTT has chapter support. However, podcasting uses four chapter formats currently - "chapters in descriptions" (Apple, YouTube, Spotify); "chapters in podlove format" (Spotify?); "JSON chapters" (Apple); ID3 tags (Apple).

I would be keen to avoid adding a fifth chapter format without a clear understanding of the benefits. I don't believe that chapters are in use by browser implementations.

3. "Improve the "cooked-up" JSON file"

I do see the benefit of offering a word-by-word format (which VTT isn't). Is there any prior art in word-by-word format? Should we be aligning with a standard?

Next steps

Do we split out the three parts of this proposal?
