Wolle edited this page Feb 22, 2026 · 59 revisions

audioI2S

This library decodes the most popular audio formats, such as wav, mp3, aac, m4a, opus, flac and vorbis. It runs on the ESP32, ESP32-S3 and ESP32-P4. PSRAM (minimum 2 MB) and an external DAC are required. Tested DACs are PCM5102A, CS4344, PT8211 and MAX98357A. I2C-controlled DACs that also work are AC101, ES8311, ES8388 and ES9083. The I2C DACs require an additional library, which can be found in the "examples" folder. The audio source can be WiFi, SD, SD_MMC, SPIFFS or FFat. Ethernet is possible with an adapter (IP101, W5500, W32-ETH01); templates are also in the "examples" folder. Playlists such as pls, m3u, asx and m3u8 are recognized and played.

The Arduino IDE can be used; the partition scheme Huge APP is required. VS Code with the pioarduino extension is more suitable, because Arduino can then be compiled as a component and settings can be made in menuconfig. Recommended settings are enlarging the RX buffer in lwip/TCP to increase the WiFi data throughput, and the lwip setting for using PSRAM to increase the free heap.

Example in which the audio object is created during runtime with I2S port 1
//        +-----------+      +--------------+
//        |  DAC      |      |     ESP32    |
//        |       DIN +------+ DOUT         |
//        |      BCLK +------+ BCLK         |
//        |       LRC +------+ LRC          |
//        |       GND +------+ GND          |
//        |       Vcc +------+ 5V           |
//        |       SCK +------+ GND          |
//        +-----------+      +--------------+

#include "Arduino.h"
#include "WiFiMulti.h"
#include "Audio.h"

#define I2S_DOUT      4
#define I2S_BCLK      5
#define I2S_LRC       6

String ssid =     "**********";
String password = "**********";

Audio*  audio;
WiFiMulti wifiMulti;

void my_audio_info(Audio::msg_t m) {
    Serial.printf("%s: %s\n", m.s, m.msg);
}

void setup() {
    audio = new Audio(1);
    Serial.begin(115200);
    Audio::audio_info_callback = my_audio_info;
    Serial.print("\n\n");
    wifiMulti.addAP(ssid.c_str(), password.c_str());
    wifiMulti.run(); // if there are multiple access points, use the strongest one
    while (WiFi.status() != WL_CONNECTED) delay(1500);
    audio->setPinout(I2S_BCLK, I2S_LRC, I2S_DOUT);
    audio->setVolume(20); // default 0...21
    audio->connecttohost("http://bcast.vigormultimedia.com:8888/sjcomplflac");
}

void loop() {
    audio->loop();
    vTaskDelay(1);
}
Example with static audio object and SD_MMC
//        +-----------+      +--------------+
//        |  SD_MMC   |      |     ESP32    |
//        |        D0 +------+ D0           |
//        |       CLK +------+ CLK          |
//        |       CMD +------+ CMD          |
//        |       GND +------+ GND          |
//        |       Vcc +------+ 3.3V         |
//        +-----------+      +--------------+

#include "Arduino.h"
#include "Audio.h"

#define I2S_DOUT      9
#define I2S_BCLK      3
#define I2S_LRC       1
#define SD_MMC_D0    11
#define SD_MMC_CLK   13
#define SD_MMC_CMD   14

Audio audio;

void my_audio_info(Audio::msg_t m) {
    Serial.printf("%s: %s\n", m.s, m.msg);
}

void setup() {
    Serial.begin(115200);
    Audio::audio_info_callback = my_audio_info;
    pinMode(SD_MMC_D0, INPUT_PULLUP);
    SD_MMC.setPins(SD_MMC_CLK, SD_MMC_CMD, SD_MMC_D0);
    SD_MMC.begin("/sdcard", true);
    audio.setPinout(I2S_BCLK, I2S_LRC, I2S_DOUT);
    audio.setVolume(20); // default 0...21
    audio.connecttoFS(SD_MMC, "1.opus");
}

void loop() {
    audio.loop();
    vTaskDelay(1);
}

public methods

pauseResume

bool pauseResume();

stops at the current position or continues playback


connecttohost

bool connecttohost(const char* host, const char* user, const char* pwd);

host contains the target URL. Optional: user and pwd for access authorization.


connecttoFS

bool connecttoFS(fs::FS& fs, const char* path, int32_t fileStartTime);

fs is the audio source, e.g. SD, SD_MMC or SPIFFS. fileStartTime is the start time in seconds (this only works if the audio file contains the bitrate, which is the case for all newer files).


connecttospeech

bool connecttospeech(const char* speech, const char* lang);

speech is the text to be read out. lang is the language, e.g. "en" or "de". Google TTS is used for speech output.


openai_speech

bool openai_speech(const String& api_key, const String& model, const String& input, const String& instructions, const String& voice, const String& response_format, const String& speed);

Can only be used with an api_key purchased through a subscription. See https://developers.openai.com/api/docs/guides/text-to-speech


setConnectionTimeout

void setConnectionTimeout(uint16_t timeout_ms, uint16_t timeout_ms_ssl);

Sets the time to wait for the host's response: timeout_ms for unencrypted connections, timeout_ms_ssl for encrypted connections.


setAudioPlayTime

bool setAudioPlayTime(uint16_t sec);

only for audio files, jumps to the time position given in sec


setTimeOffset

bool setTimeOffset(int sec);

Jumps forward or backward from the current position by the amount specified in sec, only for audio files


setPinout

bool setPinout(uint8_t BCLK, uint8_t LRC, uint8_t DOUT, int8_t MCLK = I2S_GPIO_UNUSED);

Specifies the GPIOs to which the DAC is connected. DOUT of the ESP32 is DIN of the DAC. Depending on the DAC used, three wires are required, often BCLK, LRC, DOUT (PCM5102A, PT8211) or MCLK, LRC, DOUT (CS4344).


isRunning

bool isRunning();

true if audio is playing, otherwise false


loop

void loop();

Must be placed in the Arduino loop and supplies the audio object with computing time. There must be at least one vTaskDelay(1) between two loops so that other tasks can be started!


stopSong

uint32_t stopSong();

Stops playback and returns the current audio position in seconds.


forceMono

void forceMono(bool m);

May be useful if there is only one channel. The two channels are averaged into a single mono signal.
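
The averaging described above can be sketched for a single 16-bit stereo frame. This is only an illustration of the arithmetic; mixToMono is a hypothetical helper and not the code the library runs internally.

```cpp
#include <cstdint>

// Hypothetical helper (not part of audioI2S): average one stereo frame
// into a single mono sample. The sum is formed in 32 bits so that two
// full-scale 16-bit samples cannot overflow.
int16_t mixToMono(int16_t left, int16_t right) {
    int32_t sum = static_cast<int32_t>(left) + static_cast<int32_t>(right);
    return static_cast<int16_t>(sum / 2);
}
```

In practice the same value would be written to both output channels.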


setOutput48KHz

void setOutput48KHz(bool f48);

Some devices connected to the I2S output expect a sample rate of 48 kHz. For all other devices, there is no advantage in setting this.


setBalance

void setBalance(float balance);

Valid values are -16.0 ... +16.0. The following applies: -16...0 -> gain of the left channel -16 dB...0 dB, and 0...+16 -> gain of the right channel 0 dB...-16 dB.
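
As a rough sketch, the mapping above can be turned into linear gain factors with the usual relation gain = 10^(dB/20). The sketch assumes a literal reading of the mapping (negative values attenuate the left channel, positive values the right channel); balanceToGain and ChannelGain are hypothetical names and not part of the library.

```cpp
#include <cmath>

// Hypothetical types and helper (not part of audioI2S).
struct ChannelGain {
    float left;   // linear factor, 1.0 = unchanged
    float right;  // linear factor, 1.0 = unchanged
};

// Converts a balance value (-16.0 ... +16.0) into linear channel gains,
// assuming: balance < 0 -> left channel gain of `balance` dB,
//           balance > 0 -> right channel gain of `-balance` dB.
ChannelGain balanceToGain(float balance) {
    if (balance < -16.0f) balance = -16.0f;  // clamp to the valid range
    if (balance > 16.0f) balance = 16.0f;
    float leftDb = (balance < 0.0f) ? balance : 0.0f;
    float rightDb = (balance > 0.0f) ? -balance : 0.0f;
    return {std::pow(10.0f, leftDb / 20.0f), std::pow(10.0f, rightDb / 20.0f)};
}
```

With a balance of -6.0 this gives a left factor of about 0.5 and leaves the right channel at 1.0.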


setVolumeSteps

void setVolumeSteps(uint8_t steps);

The default value is 21. Some libraries for DACs with I2C control require other values, for example 64 or 100. Valid values can be between 21 and 255.


getVolumeSteps

uint8_t getVolumeSteps();

Returns the number of volume steps currently set.


setVolume

void setVolume(uint8_t vol);

The value can be between 0 and the set VolumeSteps.
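
If the user interface works in percent, the value has to be scaled to the configured number of steps first. A minimal sketch, assuming a linear mapping is acceptable; percentToVolume is a hypothetical helper and not part of the library.

```cpp
#include <cstdint>

// Hypothetical helper (not part of audioI2S): map 0...100 % to 0...steps,
// rounded to the nearest step.
uint8_t percentToVolume(uint8_t percent, uint8_t steps) {
    if (percent > 100) percent = 100;  // clamp out-of-range input
    return static_cast<uint8_t>((static_cast<uint16_t>(percent) * steps + 50) / 100);
}
```

It could be used as audio.setVolume(percentToVolume(50, audio.getVolumeSteps()));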


getVolume

uint8_t getVolume();

Returns the value set in setVolume();


setMute

void setMute(bool mute);

Mutes the output, does not change the volume value


getMute

bool getMute();

Returns the current mute state.


getI2sPort

uint8_t getI2sPort();

Returns the I2S port in use. The ESP32 has two I2S ports. Port 0 is used by default; a different port can be selected when creating the audio object, e.g. Audio audio(1); This may be necessary if port 0 is occupied by another device (e.g. a microphone).


getFileSize

uint32_t getFileSize();

Returns the size of the audio file regardless of the source.


getSampleRate

uint32_t getSampleRate();

The return value is the sample rate determined by the decoder. Common values are 22.05 kHz, 44.1 kHz and 48 kHz.


getBitsPerSample

uint8_t getBitsPerSample();

Is always 16 for mp3, vorbis, aac and opus; 16 or 24 for flac; and 8, 16, 24 or 32 for wav.


getChannels

uint8_t getChannels();

can be 1 or 2.
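
Taken together, the sample rate, bits per sample and channel count give the data rate of the decoded PCM stream. A small sketch of that arithmetic; pcmBytesPerSecond is a hypothetical helper and not part of the library.

```cpp
#include <cstdint>

// Hypothetical helper (not part of audioI2S): bytes per second of the
// decoded PCM stream. Assumes bitsPerSample is a multiple of 8.
uint32_t pcmBytesPerSecond(uint32_t sampleRate, uint8_t bitsPerSample, uint8_t channels) {
    return sampleRate * (bitsPerSample / 8u) * channels;
}
```

For example, 48000 Hz, 16 bits and 2 channels give 192000 bytes per second.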


getBitRate

uint32_t getBitRate();

Reads the bit rate from the audio data and returns it. If this is not possible, the bit rate is calculated from the current file. This can take a few seconds.


getAudioFileDuration

uint32_t getAudioFileDuration();

This is the total playing time of the audio file - it is calculated from its length and the bit rate.
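
The relation between length, bit rate and playing time can be sketched as follows. This is a simplified model that assumes a constant bit rate and counts only the audio bytes; the library's own calculation may differ, and estimateDurationSeconds is a hypothetical helper.

```cpp
#include <cstdint>

// Hypothetical helper (not part of audioI2S): playing time in seconds
// from the number of audio bytes and the bit rate in bits per second.
// Returns 0 if the bit rate is unknown (0).
uint32_t estimateDurationSeconds(uint32_t audioBytes, uint32_t bitRate) {
    if (bitRate == 0) return 0;
    // 64-bit intermediate so that large files do not overflow when multiplied by 8
    return static_cast<uint32_t>((static_cast<uint64_t>(audioBytes) * 8u) / bitRate);
}
```

A file with 1600000 audio bytes at 128000 bit/s would play for 100 seconds.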


getAudioFilePosition

uint32_t getAudioFilePosition();

This is the current position of the read pointer in the file, regardless of whether the audio data comes from a local storage medium or from the network.


setAudioFilePosition

bool setAudioFilePosition(uint32_t pos);

Moves the read pointer to the specified position. This is only possible if the file is already playing. Returns true only if pos is within the audio block.


getVUlevel

uint16_t getVUlevel();

Can be used to control a VU meter. The values per channel are between 0...255. The left channel is in the high byte, the right channel in the low byte.

uint16_t vu = audio.getVUlevel();
uint8_t left = vu >> 8;
uint8_t right = vu & 0x00FF;
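
To drive an LED bar, the 0...255 level can be scaled to the number of LEDs. A minimal sketch, assuming a linear scale is good enough for the display; splitVu, VuLevels and vuToLeds are hypothetical helpers and not part of the library.

```cpp
#include <cstdint>

// Hypothetical types and helpers (not part of audioI2S).
struct VuLevels {
    uint8_t left;
    uint8_t right;
};

// Splits the packed value: high byte = left channel, low byte = right channel.
VuLevels splitVu(uint16_t vu) {
    return {static_cast<uint8_t>(vu >> 8), static_cast<uint8_t>(vu & 0x00FF)};
}

// Scales a level of 0...255 to 0...ledCount lit LEDs, rounded to nearest.
uint8_t vuToLeds(uint8_t level, uint8_t ledCount) {
    return static_cast<uint8_t>((static_cast<uint16_t>(level) * ledCount + 127) / 255);
}
```

A logarithmic scale usually looks more natural on a real VU meter; the linear version is only the simplest starting point.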

inBufferFilled

uint32_t inBufferFilled();

Returns the number of bytes stored in the input buffer.


inBufferFree

uint32_t inBufferFree();

Returns the number of free bytes in the input buffer.


getInBufferSize

uint32_t getInBufferSize();

Returns the size of the input buffer in bytes.


inBufferStatus

void inBufferStatus();

Writes the current status of the buffer to the serial terminal.

filled 77322, free 578028
writeSpace 65535, readSpace 65535
writePtr 473976, readPtr 396654
isEmpty 0, isFull 0
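
For a display or a log line, the fill level is often easier to read as a percentage. A small sketch using the filled and free byte counts; bufferFillPercent is a hypothetical helper and not part of the library.

```cpp
#include <cstdint>

// Hypothetical helper (not part of audioI2S): fill level of the input
// buffer in percent (rounded down), from the filled and free byte counts.
uint8_t bufferFillPercent(uint32_t filledBytes, uint32_t freeBytes) {
    uint64_t total = static_cast<uint64_t>(filledBytes) + freeBytes;
    if (total == 0) return 0;  // no buffer allocated
    return static_cast<uint8_t>((static_cast<uint64_t>(filledBytes) * 100u) / total);
}
```

It could be called as bufferFillPercent(audio.inBufferFilled(), audio.inBufferFree()); with the values from the status output above (77322 filled, 578028 free) the result is 11.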

setTone

void setTone(float gainLowPass, float gainBandPass, float gainHighPass);

Equalizer: the values can be between -12 dB and +12 dB. The working frequencies are: LP 500 Hz, BP 1800 Hz, HP 6000 Hz.


setI2SCommFMT_LSB

void setI2SCommFMT_LSB(bool commFMT);

false (default): the I2S communication format is I2S_COMM_FORMAT_I2S_MSB, right->left (AC101, PCM5102A). true: the format is changed to I2S_COMM_FORMAT_I2S_LSB, which some DACs need (PT8211); this is also called Japanese or LSBJ (Least Significant Bit Justified) format.


getCodec

int getCodec();

Return values: 0 NONE, 1 WAV, 2 MP3, 3 AAC, 4 M4A, 5 FLAC, 7 OPUS, 9 VORBIS
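
When application code has to branch on the codec, the numeric values listed above can be wrapped in a small lookup. The sketch below uses only the values from that list; codecLabel is a hypothetical helper, and values that are not listed (such as 6 and 8) are reported as unknown rather than guessed.

```cpp
#include <string>

// Hypothetical helper (not part of audioI2S): text label for the values
// documented for getCodec(). Unlisted values map to "UNKNOWN".
std::string codecLabel(int codec) {
    switch (codec) {
        case 0: return "NONE";
        case 1: return "WAV";
        case 2: return "MP3";
        case 3: return "AAC";
        case 4: return "M4A";
        case 5: return "FLAC";
        case 7: return "OPUS";
        case 9: return "VORBIS";
        default: return "UNKNOWN";
    }
}
```

For plain display purposes getCodecname() is the simpler choice.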


getCodecname

const char* getCodecname();

Example:

printf("%s", audio.getCodecname());

getVersion

const char* getVersion();

Returns the current version of the audioI2S library.


events

void my_audio_info(Audio::msg_t m) {
    switch(m.e){
        case Audio::evt_info:           Serial.printf("info: ....... %s\n", m.msg); break;
        case Audio::evt_eof:            Serial.printf("end of file:  %s\n", m.msg); break;
        case Audio::evt_bitrate:        Serial.printf("bitrate: .... %s\n", m.msg); break; // icy-bitrate or bitrate from metadata
        case Audio::evt_icyurl:         Serial.printf("icy URL: .... %s\n", m.msg); break;
        case Audio::evt_id3data:        Serial.printf("ID3 data: ... %s\n", m.msg); break; // id3-data or metadata
        case Audio::evt_lasthost:       Serial.printf("last URL: ... %s\n", m.msg); break;
        case Audio::evt_name:           Serial.printf("station name: %s\n", m.msg); break; // station name or icy-name
        case Audio::evt_streamtitle:    Serial.printf("stream title: %s\n", m.msg); break;
        case Audio::evt_icylogo:        Serial.printf("icy logo: ... %s\n", m.msg); break;
        case Audio::evt_icydescription: Serial.printf("icy descr: .. %s\n", m.msg); break;
        case Audio::evt_image: for(int i = 0; i < m.vec.size(); i += 2){
                                        Serial.printf("cover image:  segment %02i, pos %07lu, len %05lu\n", i / 2, m.vec[i], m.vec[i + 1]);} break; // APIC
        case Audio::evt_lyrics:         Serial.printf("sync lyrics:  %s\n", m.msg); break;
        case Audio::evt_log   :         Serial.printf("audio_logs:   %s\n", m.msg); break;
        default:                        Serial.printf("message:..... %s\n", m.msg); break;
    }
}

The detailed output could look like this:

info: ....... Reading file: "/1.opus"
audio_logs:   Audio.cpp:6153 rangeStart: 6153, audioFileSize: 2365542, len: 2431078, 65535
info: ....... OPUSDecoder has been initialized
info: ....... stream ready
info: ....... syncword found at pos 0
ID3 data: ... Title: Non, je ne regrette rien
ID3 data: ... Album: Édith Piaf
ID3 data: ... Album: Platinum Collection
ID3 data: ... Date: 2007-08-17
ID3 data: ... Artist: Édith Piaf
ID3 data: ... Track number/Position in set: 7
ID3 data: ... Genre: French
ID3 data: ... Comments: Romantic (Songs-DB_Occasion_album)
ID3 data: ... Comments: Female Vocalist (Songs-DB_Custom2_album)
ID3 data: ... Comments: Paris (Songs-DB_Custom3_album)
stream title: Non, je ne regrette rien - Édith Piaf - Piaf, Édith - Édith Piaf
cover image:  segment 00, pos 0005232, len 03061
cover image:  segment 01, pos 0008336, len 04080
cover image:  segment 02, pos 0012459, len 04080
cover image:  segment 03, pos 0016582, len 04080
cover image:  segment 04, pos 0020705, len 04080
cover image:  segment 05, pos 0024828, len 04080
cover image:  segment 06, pos 0028951, len 02403
info: ....... Opus Mode: SILK_ONLY
info: ....... AudioDataStart: 33031
bitrate: .... 136059
info: ....... Duration (s): 141
info: ....... Bitrate (b/s): 136059
info: ....... Channels: 2
info: ....... SampleRate (Hz): 48000
info: ....... BitsPerSample: 16
info: ....... Opus Mode: CELT_ONLY
info: ....... Closing audio file "1.opus"
info: ....... OPUSDecoder has been destroyed
end of file:  1.opus

ID3 data: comments that are read from the metadata of the file or stream. Artist and Title are commonly used.

cover image: embedded images. These images can be present in one piece or in fragments. Fragments occur when the images are larger than an OGG block. An audio file can contain several images.

evt_lyrics: the metadata can contain “synchronized lyrics”, e.g. for karaoke. This event is triggered at the times specified in the lyrics.

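The events are function calls (callbacks) made by the library, so the callback must not call methods of the audio object directly. Set a flag in the callback and act on it in loop() instead: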

bool eof = false;
void my_audio_info(Audio::msg_t m) {
    switch(m.e){
        case Audio::evt_eof:            Serial.printf("end of file:  %s\n", m.msg);
                                        eof = true;
                                        break;
    }
}

void loop() {
    audio.loop();
    vTaskDelay(1);
    if(eof){
        eof = false;
        // do something, e.g. audio.connecttoFS(SD_MMC, "next_file.mp3");
    }
}
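
Building on the flag pattern above, the next file name can come from a small playlist that wraps around at the end. A minimal sketch; the Playlist struct and the file names are hypothetical and not part of the library.

```cpp
#include <cstddef>

// Hypothetical helper (not part of audioI2S): a fixed list of file paths
// with an index that wraps around after the last entry.
struct Playlist {
    const char* const* tracks;  // array of file paths
    size_t count;               // number of entries in the array
    size_t index;               // entry that is currently playing

    // Advances to the next entry and returns its path,
    // or nullptr if the playlist is empty.
    const char* next() {
        if (count == 0) return nullptr;
        index = (index + 1) % count;
        return tracks[index];
    }
};
```

In loop(), the eof branch could then call audio.connecttoFS(SD_MMC, playlist.next()); after checking that the returned pointer is not nullptr.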