Replies: 1 comment
A few thoughts:
For podcasting, constant bitrate (CBR) is always advised. You absolutely shouldn't be using variable bitrate (VBR), since it causes problems in many podcast apps, not least with accurate seeking. Some podcast hosts will reject VBR audio, or will produce a CBR version of it for these reasons. IMO, nobody cares about audio quality above "sounds good enough". 128kbps MP3 (CBR) seems fine for everyone, and has no issues with seeking.
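To make the seeking point concrete, here is a minimal sketch of why CBR is easy to seek in: the byte offset is a straight multiplication. The function name and the `audio_start` parameter (bytes of ID3 tags or other headers before the audio) are made up for illustration, and a real player would still need to resync to the next MP3 frame boundary after jumping.

```python
def cbr_byte_offset(seconds: float, bitrate_kbps: int = 128, audio_start: int = 0) -> int:
    """Approximate byte position of a timestamp in a constant-bitrate file.

    audio_start is a hypothetical parameter: the number of bytes
    (e.g. ID3 tags) that precede the first audio frame.
    """
    # 128 kbps = 128,000 bits per second = 16,000 bytes per second.
    bytes_per_second = bitrate_kbps * 1000 // 8
    return audio_start + int(seconds * bytes_per_second)


# One minute into a 128 kbps CBR file with no leading tags.
print(cbr_byte_offset(60))
```

With VBR the bytes-per-second figure changes from frame to frame, so no single multiplication like this works.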
MP3 is not patent-encumbered (the patents ran out in 2017). AAC audio, however, is still patent-encumbered, requiring a per-player payment to the patent holders (Fraunhofer among them). Those payments are normally made (by the OS), but that's not a given. I'd really rather point people towards Opus audio, which is not patent-encumbered and sounds much better than AAC at lower bitrates. However, Opus doesn't play on iOS, so... Some numbers, if you're interested, from a Feb Podcast Index dump (based on file extensions)...
With the introduction of timestamped transcripts and chapters, we will see an increasing number of apps allowing listeners to jump precisely to chapter marks or transcript positions.
However, despite the precision of the timestamp metadata itself, in practice jumping to a precise audio position is limited by the characteristics of the MP3 audio format. To support this jumping, or "seeking", a player must be able to predict how many bytes into the file a given timestamp falls. For this, the MP3 needs to be encoded either with a constant bit rate (which is normally not advised) or, if encoded with a variable bit rate, with an embedded seek table. There are various ways encoders can put a seek table into the MP3, but typically the seek table is limited in size and allows for only 100 entries.
For short audio files, these seek points are packed closely enough together that the approximation is less noticeable. But for long-form audio (which podcasts very often are), a 100-minute episode with only 100 seek points has one seek point per minute, so without further heuristics you can only seek accurately to the nearest minute. Heuristics improve this a bit, but seeking can still be off by a significant margin (e.g. 20 seconds), and accuracy can also differ significantly between players on different platforms, SDKs, or, on the web, browsers. This means that seeking to a chapter marker, or jumping to a transcript timestamp from a search result, can land significantly off.
Apps that require accurate seeking have to ignore the seek table embedded in the MP3 file and build their own by (slowly) decoding the entire file, which is not a good use of the battery or the user's time. It's also a technique that does not lend itself well to PWAs; in practice only native apps can apply it.
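As a rough illustration of the 100-entry limit, the sketch below follows the commonly described Xing/Info-style table layout: 100 one-byte entries, where entry i gives the file position (as a fraction of 256) at i percent of the duration, with linear interpolation between entries. The function names and the example table are made up for illustration; real encoders and players differ in the details.

```python
def seek_point_spacing(duration_seconds: float, entries: int = 100) -> float:
    """Seconds of audio covered by each seek-table entry."""
    return duration_seconds / entries


def toc_seek_offset(seconds: float, duration: float, toc: list[int], audio_bytes: int) -> int:
    """Estimate a byte offset from a 100-entry seek table.

    toc[i] is assumed to be a value in 0..255 giving the file position
    (in 1/256ths of audio_bytes) at i percent of the duration.
    """
    if len(toc) != 100:
        raise ValueError("expected exactly 100 seek-table entries")
    percent = min(max(seconds / duration * 100.0, 0.0), 100.0)
    index = min(int(percent), 99)
    lower = toc[index]
    # Past the last entry, interpolate towards the end of the file (256/256).
    upper = toc[index + 1] if index < 99 else 256
    position = lower + (upper - lower) * (percent - index)
    return int(position / 256.0 * audio_bytes)


# A 100-minute episode: each entry covers a full minute of audio.
print(seek_point_spacing(100 * 60))

# Hypothetical table for a file whose bitrate happens to be roughly uniform.
example_toc = [int(i * 256 / 100) for i in range(100)]
print(toc_seek_offset(3000, 6000, example_toc, 1_000_000))
```

Anything that happens between two entries is guessed by interpolation, and each entry can only express a position to within 1/256 of the file, which is where the error on long episodes comes from.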
In general, MP3 is not considered an appropriate audio format for apps that require precise seeking on long-form audio.
The main competitor to MP3 is M4A. It is essentially AAC encoded audio wrapped in an MP4 container, and this allows for storing a complete seek table with perfect precision. Thus, seeking within an M4A is essentially perfect.
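For comparison, here is a simplified sketch of what a complete seek table buys you. It assumes AAC-LC's usual 1024 samples per frame, a 44.1 kHz sample rate, and frames stored back to back from a known starting byte; a real MP4 file spreads this information across several index boxes and chunks, so the function names and the flat list of frame sizes are illustrative only.

```python
from itertools import accumulate


def build_frame_offsets(frame_sizes: list[int], data_start: int = 0) -> list[int]:
    """Byte offset of every frame, assuming frames are stored contiguously."""
    # Running total of sizes gives each frame's start; drop the final end-of-data total.
    return list(accumulate(frame_sizes, initial=data_start))[:-1]


def seek_frame(seconds: float, offsets: list[int],
               sample_rate: int = 44100, samples_per_frame: int = 1024) -> tuple[int, int]:
    """Return (frame index, byte offset) of the frame containing the timestamp."""
    frame = min(int(seconds * sample_rate / samples_per_frame), len(offsets) - 1)
    return frame, offsets[frame]


# Hypothetical frame sizes in bytes; a real file has one entry per frame.
offsets = build_frame_offsets([300, 350, 280, 400], data_start=100)
print(seek_frame(0.05, offsets))
```

Because every frame has its own entry, the lookup is exact to the frame (about 23 ms under these assumptions) regardless of how long the episode is.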
@jamescridland wrote an article in 2020, before the Podcast Index existed, comparing MP3 and M4A. It mainly focuses on audio quality, file size, and the requirements of certain platforms, with the advice to publish both formats if possible, or just MP3 if only one must be chosen. Given the direction in which the spec and application use cases are evolving, though, I think seek precision is another important factor to consider.
It would be nice if the spec could be written to perhaps gently push podcasters and hosts in the direction of M4A, since that's the future I think we would want to create for the types of apps we want to see.