feature/pcmu-to-aac #265

Open
cedricve wants to merge 4 commits into master from feature/pcmu-to-aac

cedricve (Member) commented Mar 19, 2026

Live Environment

Access the Pull Request environment here

Description

Enable PCMU and LPCM Audio Support via Transcoding to AAC

Motivation

IP cameras often stream audio in PCMU (G.711 mu-law) or LPCM formats over RTSP, but our recording pipeline previously skipped these, limiting recordings to video-only or direct AAC streams. This change introduces real-time transcoding of PCMU and LPCM to AAC using FFmpeg, ensuring audio is captured and muxed into MP4 files regardless of the source format. It also adds explicit LPCM detection in RTSP sessions and allows configuration overrides for sample rate/channels in LPCM streams to handle mismatches.

Key Changes

  • RTSP Handling (gortsplib.go): Added FindLPCM to discover LPCM media/formats, set up decoding, and integrate into packet queues with proper metadata (sample rate, channels, bit depth). Extended PCMU streams with default metadata.
  • Recording Pipeline (main.go): Introduced recordingAudioWriter to abstract audio track creation and sample writing. For non-AAC inputs, it transcodes to AAC before muxing. Refactored continuous and motion-detection recording to use packet timestamps for precise rollovers, avoiding frame drops during max-duration limits or post-recording expiration. Added LPCM-specific config validation with warnings.
  • Transcoder Implementation (pcmu_to_aac_transcoder.go): New FFmpeg-based transcoder for PCMU/LPCM → AAC (ADTS output). Supports input validation, non-blocking writes, timed flushes, and error handling. Checks FFmpeg availability at runtime; falls back to skipping audio if unavailable.
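
To make the transcoder bullet concrete, here is a minimal sketch of how the FFmpeg invocation could be assembled. It is not the code in pcmu_to_aac_transcoder.go: `inputFormat` and `transcodeArgs` are hypothetical helper names, and the mapping of LPCM bit depths to the big-endian raw formats assumes RTP network byte order. Only standard FFmpeg options are used (`-f`, `-ar`, `-ac`, `-i`, `-vn`, `-c:a`).

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// inputFormat maps a codec name and bit depth to an FFmpeg raw-audio format name.
// Hypothetical helper: LPCM over RTP is assumed to be big-endian (network order).
func inputFormat(codec string, bitDepth int) (string, error) {
	switch codec {
	case "PCMU":
		return "mulaw", nil
	case "LPCM":
		switch bitDepth {
		case 8:
			return "u8", nil
		case 16:
			return "s16be", nil
		case 24:
			return "s24be", nil
		}
		return "", fmt.Errorf("unsupported LPCM bit depth: %d", bitDepth)
	}
	return "", fmt.Errorf("unsupported codec: %s", codec)
}

// transcodeArgs builds arguments that read raw audio from stdin and write
// AAC in ADTS framing to stdout. Hypothetical helper, not the PR's function.
func transcodeArgs(format string, sampleRate, channels int) []string {
	return []string{
		"-hide_banner", "-loglevel", "error",
		"-f", format,
		"-ar", strconv.Itoa(sampleRate),
		"-ac", strconv.Itoa(channels),
		"-i", "pipe:0",
		"-vn",
		"-c:a", "aac",
		"-f", "adts",
		"pipe:1",
	}
}

func main() {
	// Runtime availability check; when FFmpeg is missing, audio would be skipped.
	if _, err := exec.LookPath("ffmpeg"); err != nil {
		fmt.Println("ffmpeg not found, audio would be skipped")
	}
	format, err := inputFormat("PCMU", 8)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(transcodeArgs(format, 8000, 1))
}
```

Keeping the argument builder separate from process start-up makes the codec-to-format mapping testable without FFmpeg installed.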

Improvements

  • Broader Compatibility: Now records audio from PCMU/LPCM sources (common in ONVIF/RTSP cameras), producing consistent AAC tracks in MP4s for better playback and storage efficiency.
  • Reliability: Transcoding is asynchronous to prevent blocking the recording loop; improved rollover logic aligns audio/video starts on keyframes and drains buffers cleanly on close.
  • Flexibility: Configurable audio params for LPCM; supports 8/16/24-bit depths. Logs detailed diagnostics (e.g., AAC frame params, mismatches) without impacting performance.
  • No Breaking Changes: Existing AAC handling remains unchanged; transcoding is applied only when a PCMU or LPCM codec is detected.
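
The non-blocking behaviour in the Reliability bullet can be illustrated with a bounded channel and a `select` with a `default` branch. This is a sketch under assumptions, not the PR's implementation: `sampleQueue`, `newSampleQueue`, and `TryWrite` are hypothetical names, and dropping a sample when the queue is full is one possible policy.

```go
package main

import "fmt"

// sampleQueue is a hypothetical bounded buffer between the recording loop
// (single writer) and a transcoder goroutine (reader).
type sampleQueue struct {
	ch      chan []byte
	dropped int // samples rejected because the queue was full
}

func newSampleQueue(capacity int) *sampleQueue {
	return &sampleQueue{ch: make(chan []byte, capacity)}
}

// TryWrite enqueues a sample without ever blocking the caller.
// It returns false and counts a drop when the queue is full.
func (q *sampleQueue) TryWrite(sample []byte) bool {
	select {
	case q.ch <- sample:
		return true
	default:
		q.dropped++
		return false
	}
}

func main() {
	q := newSampleQueue(2)
	for i := 0; i < 3; i++ {
		fmt.Println("accepted:", q.TryWrite([]byte{byte(i)}))
	}
	fmt.Println("dropped:", q.dropped)
}
```

With a capacity of 2 and no reader, the third write is rejected immediately instead of stalling the recording loop.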

This enhances recording completeness and robustness, reducing lost audio in diverse camera setups while maintaining low overhead. Tested with sample RTSP streams; requires FFmpeg for full functionality.

Copilot AI review requested due to automatic review settings March 19, 2026 14:59
// called when a MULAW audio RTP packet arrives
if g.AudioLPCMMedia != nil && g.AudioLPCMForma != nil {
g.Client.OnPacketRTP(g.AudioLPCMMedia, g.AudioLPCMForma, func(rtppkt *rtp.Packet) {
pts, ok := g.Client.PacketPTS(g.AudioLPCMMedia, rtppkt)

Copilot AI (Contributor) left a comment

Pull request overview

Adds support for recording non-AAC RTSP audio (PCMU and LPCM) by transcoding to AAC/ADTS, and improves MP4 audio timing/metadata so generated MP4s carry correct AAC sample-rate/channel info.

Changes:

  • Introduce an ffmpeg-backed PCMU/LPCM → AAC (ADTS) transcoder and a recording audio writer that feeds MP4 muxing.
  • Update capture pipeline to use the new audio writer and to roll over recordings on keyframes without dropping frames.
  • Update MP4 muxer to derive AAC frame durations and audio timescale from ADTS headers, and set an AAC-LC descriptor with detected sample-rate/channels.
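
As an illustration of rolling over on keyframes without dropping frames, the sketch below splits a packet sequence by packet timestamps. The `packet` type, `shouldRollover`, and `splitRecordings` are hypothetical and are not taken from main.go. The assumed rule is that a new recording starts only on a video keyframe once the maximum duration has elapsed, and that this keyframe becomes the first packet of the next recording.

```go
package main

import "fmt"

// packet is a hypothetical, simplified stand-in for the pipeline's packet type.
type packet struct {
	TimeMs     int64 // packet timestamp in milliseconds
	IsVideo    bool
	IsKeyFrame bool
}

// shouldRollover reports whether a new recording should begin at pkt.
// Elapsed time is measured from packet timestamps, not the wall clock.
func shouldRollover(startMs, maxDurationMs int64, pkt packet) bool {
	return pkt.IsVideo && pkt.IsKeyFrame && pkt.TimeMs-startMs >= maxDurationMs
}

// splitRecordings groups packets into recordings. The keyframe that triggers
// a rollover opens the next recording, so every packet lands in exactly one.
func splitRecordings(pkts []packet, maxDurationMs int64) [][]packet {
	if len(pkts) == 0 {
		return nil
	}
	var out [][]packet
	var cur []packet
	start := pkts[0].TimeMs
	for _, p := range pkts {
		if len(cur) > 0 && shouldRollover(start, maxDurationMs, p) {
			out = append(out, cur)
			cur = nil
			start = p.TimeMs
		}
		cur = append(cur, p)
	}
	return append(out, cur)
}

func main() {
	pkts := []packet{
		{0, true, true}, {400, true, false}, {800, true, true},
		{1200, true, false}, // past the limit, but not a keyframe: keep recording
		{1600, true, true}, {2000, true, false},
	}
	for i, rec := range splitRecordings(pkts, 1000) {
		fmt.Printf("recording %d: %d packets, starts at %d ms\n", i, len(rec), rec[0].TimeMs)
	}
}
```

The packet at 1200 ms is already past the limit but stays in the first recording, because the split waits for the next keyframe at 1600 ms.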

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

File | Description
--- | ---
machinery/src/video/mp4.go | Switch audio duration math to AAC sample counts, derive audio timescale from ADTS, and set an AAC-LC descriptor with parsed parameters.
machinery/src/packets/stream.go | Add BitDepth to audio stream metadata for PCM/LPCM handling.
machinery/src/capture/pcmu_to_aac_transcoder.go | New ffmpeg-based transcoder + ADTS framing utilities + recordingAudioWriter integration with MP4.
machinery/src/capture/main.go | Route audio recording through recordingAudioWriter, pass bit depth for LPCM, and improve rollover logic.
machinery/src/capture/gortsplib.go | Add LPCM stream discovery/decoding and include default PCMU stream parameters (rate/channels) in stream metadata.
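
For readers unfamiliar with ADTS, the following sketch shows how sample rate, channel configuration, and frame length can be read from the 7-byte ADTS header, and why an audio timescale equal to the sample rate gives 1024 units per AAC-LC frame. It is an independent illustration, not the parsing code in mp4.go or the transcoder. `adtsHeader` and `parseADTSHeader` are hypothetical names, and one raw data block per ADTS frame is assumed.

```go
package main

import (
	"errors"
	"fmt"
)

// adtsHeader holds the fields of interest from an ADTS frame header.
type adtsHeader struct {
	ObjectType    int // MPEG-4 audio object type (2 = AAC-LC)
	SampleRate    int // Hz
	ChannelConfig int // equals the channel count for values 1 to 6
	FrameLength   int // bytes, header included
}

// Sampling frequencies indexed by the 4-bit sampling_frequency_index.
var adtsSampleRates = []int{
	96000, 88200, 64000, 48000, 44100, 32000, 24000,
	22050, 16000, 12000, 11025, 8000, 7350,
}

// samplesPerAACFrame is the AAC-LC frame size, assuming one raw data block.
const samplesPerAACFrame = 1024

func parseADTSHeader(b []byte) (adtsHeader, error) {
	if len(b) < 7 {
		return adtsHeader{}, errors.New("adts: header shorter than 7 bytes")
	}
	if b[0] != 0xFF || b[1]&0xF0 != 0xF0 {
		return adtsHeader{}, errors.New("adts: syncword not found")
	}
	sfi := int(b[2]>>2) & 0x0F
	if sfi >= len(adtsSampleRates) {
		return adtsHeader{}, errors.New("adts: reserved sampling frequency index")
	}
	return adtsHeader{
		ObjectType:    int(b[2]>>6) + 1,
		SampleRate:    adtsSampleRates[sfi],
		ChannelConfig: int(b[2]&0x01)<<2 | int(b[3]>>6),
		FrameLength:   int(b[3]&0x03)<<11 | int(b[4])<<3 | int(b[5]>>5),
	}, nil
}

func main() {
	// Hand-built header: AAC-LC, 8000 Hz, mono, 200-byte frame, no CRC.
	hdr := []byte{0xFF, 0xF1, 0x6C, 0x40, 0x19, 0x1F, 0xFC}
	h, err := parseADTSHeader(hdr)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// With timescale == sample rate, each frame lasts 1024 timescale units.
	ms := samplesPerAACFrame * 1000 / h.SampleRate
	fmt.Printf("%+v, frame duration %d ms\n", h, ms)
}
```

At 8000 Hz a 1024-sample frame lasts 128 ms, so a fixed per-frame duration tuned for another sample rate would drift against video.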


Comment on lines +565 to +570
pts, ok := g.Client.PacketPTS(g.AudioLPCMMedia, rtppkt)
pts2, ok := g.Client.PacketPTS2(g.AudioLPCMMedia, rtppkt)
if !ok {
	log.Log.Debug("capture.golibrtsp.Start(): " + "unable to get PTS")
	return
}
func (g *Golibrtsp) Start(ctx context.Context, streamType string, queue *packets.Queue, configuration *models.Configuration, communication *models.Communication) (err error) {
log.Log.Debug("capture.golibrtsp.Start(): started")

// called when a MULAW audio RTP packet arrives
Comment on lines +645 to 648
// Set the audio descriptor to match the AAC-LC stream actually produced by ffmpeg/camera.
err := setAACLCDescriptor(init.Moov.Traks[1], audioSampleRate, audioChannels)
if err != nil {
}
Comment on lines +61 to +67
"-f", inputFormat,
"-ar", intToString(sampleRate),
"-ac", intToString(channels),
"-i", "pipe:0",
"-vn",
"-ac", "1",
"-ar", intToString(sampleRate),