Build a TypeScript/Node worker for batch processing HLS stream files into PCM / WAV #1

@adrmac

Description

Prerequisite: read #6 description
Pairs with #2

Project summary

Build a small TypeScript/Node worker that concatenates HLS segments (.m3u8 / .ts), decodes them to PCM, and writes downloadable WAV clips (plus sidecar metadata) to a shared folder (e.g. Dropbox).

This shares the same HLS->PCM core process as #2, but with a different output.

Immediate use case

Generate text-WAV pairs from Orcasound stream buckets for CLAP model training

Basic steps

  • Resolve [start, end] to a segment list from the .m3u8 playlist.
  • Spawn ffmpeg (child process) to decode HLS to PCM (mono 48 kHz; single pass).
  • Wrap PCM as WAV and stream to disk; write a .meta.json sidecar with metadata.
  • Write a summary JSON pairing detection comment text with file URLs.
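
The decode step above can be sketched as a child-process spawn. This is a minimal sketch, not a settled interface: `buildFfmpegArgs` and `decodeToPcm` are illustrative names, and the args assume the mono 48 kHz single-pass decode described in the list.

```typescript
import { spawn } from "node:child_process";

// Build the argument list for a single-pass HLS -> PCM decode
// (mono, 48 kHz, 16-bit signed little-endian, streamed to stdout).
function buildFfmpegArgs(m3u8Url: string): string[] {
  return [
    "-hide_banner",
    "-loglevel", "error",
    "-i", m3u8Url,   // HLS playlist; ffmpeg follows the .ts segments
    "-ac", "1",      // downmix to mono
    "-ar", "48000",  // resample to 48 kHz
    "-f", "s16le",   // raw PCM, 16-bit signed little-endian
    "pipe:1",        // write PCM to stdout
  ];
}

// Spawn ffmpeg and collect the decoded PCM from stdout.
function decodeToPcm(m3u8Url: string): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const ff = spawn("ffmpeg", buildFfmpegArgs(m3u8Url));
    const chunks: Buffer[] = [];
    ff.stdout.on("data", (c: Buffer) => chunks.push(c));
    ff.on("error", reject); // e.g. ffmpeg binary not found
    ff.on("close", (code) =>
      code === 0
        ? resolve(Buffer.concat(chunks))
        : reject(new Error(`ffmpeg exited with code ${code}`))
    );
  });
}
```

For real clips the worker would stream stdout to disk rather than buffering the whole window in memory.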

Inputs

  • Single job: { feedSlug, start, end }
  • Batch: CSV manifest with columns { feedSlug, start, end }
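
The two input shapes can share one job type; a minimal manifest parser might look like this (no quoted-field handling — a real worker would likely use a CSV library):

```typescript
// One clip-export job, matching the single-job input shape.
interface ClipJob {
  feedSlug: string;
  start: string; // ISO 8601, e.g. "2025-09-12T10:00:00Z"
  end: string;
}

// Parse a CSV manifest with header "feedSlug,start,end" into jobs.
function parseManifest(csv: string): ClipJob[] {
  const [header, ...rows] = csv.trim().split(/\r?\n/);
  const cols = header.split(",").map((c) => c.trim());
  return rows
    .filter((r) => r.trim().length > 0)
    .map((row) => {
      const cells = row.split(",").map((c) => c.trim());
      const rec = Object.fromEntries(cols.map((c, i) => [c, cells[i]]));
      return { feedSlug: rec.feedSlug, start: rec.start, end: rec.end };
    });
}
```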

Outputs

clips/<feedSlug>/<YYYY>/<MM>/<DD>/<feedSlug>_<YYYYMMDDTHHMMSS>_<dur>s.wav
clips/.../<same>.meta.json
clips/.../pairs.json

Sidecar fields: feedId, feedSlug, start/end ISO, duration_s, samplerate_hz, channels, m3u8 URL, segments[], ffmpeg args, sha256 hash.
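
The sidecar field list translates into a type along these lines; the property names are illustrative and may shift as the schema settles:

```typescript
// Sketch of the .meta.json sidecar shape from the field list above.
interface ClipMeta {
  feedId: string;
  feedSlug: string;
  startIso: string;      // clip start, ISO 8601
  endIso: string;        // clip end, ISO 8601
  duration_s: number;
  samplerate_hz: number; // 48000 for these clips
  channels: number;      // 1 (mono)
  m3u8Url: string;
  segments: string[];    // segment URIs actually decoded
  ffmpegArgs: string[];
  sha256: string;        // hash of the WAV payload
}
```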

CLI (example)

npm run clip -- \
  --feedSlug orcasound_lab \
  --start 2025-09-12T10:00:00Z \
  --end   2025-09-12T10:05:00Z \
  --out   s3   # or local
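
The flags above can be parsed with Node's built-in `parseArgs` (available since Node 18.3, so fine on Node 20); `parseClipArgs` and the `local` default are assumptions, not decided behavior:

```typescript
import { parseArgs } from "node:util";

// Parse the CLI flags from the example invocation above.
function parseClipArgs(argv: string[]) {
  const { values } = parseArgs({
    args: argv,
    options: {
      feedSlug: { type: "string" },
      start: { type: "string" },
      end: { type: "string" },
      out: { type: "string", default: "local" }, // "s3" or "local"
    },
  });
  return values;
}
```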

Dependencies

  • ffmpeg (system binary in Docker or ffmpeg-static in dev).
  • Node 20, TypeScript.

Error handling & ops

  • Clear errors for missing segments / ffmpeg exit codes.
  • Logs include station/time window and segment URIs.
  • Idempotency: output path uses a hash of {feedSlug,startIso,endIso,params} (reruns don’t duplicate).
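
One way to sketch the idempotent path: hash the `{feedSlug, startIso, endIso, params}` tuple and embed a prefix of it in the filename, so reruns of the same window resolve to the same file. How the hash combines with the timestamped filename from the Outputs section is still an open design choice; this version substitutes the hash for the timestamp.

```typescript
import { createHash } from "node:crypto";

// Deterministic output path from the job parameters.
function outputPath(job: {
  feedSlug: string;
  startIso: string;
  endIso: string;
  params: string; // e.g. "mono-48000-s16le" (hypothetical encoding)
}): string {
  const hash = createHash("sha256")
    .update(JSON.stringify([job.feedSlug, job.startIso, job.endIso, job.params]))
    .digest("hex")
    .slice(0, 12);
  const d = new Date(job.startIso);
  const yyyy = d.getUTCFullYear();
  const mm = String(d.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(d.getUTCDate()).padStart(2, "0");
  return `clips/${job.feedSlug}/${yyyy}/${mm}/${dd}/${job.feedSlug}_${hash}.wav`;
}
```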

Acceptance criteria

  • Export a 5-min window to WAV + .meta.json deterministically (same path/hash on rerun).
  • Batch export from a CSV succeeds; failures are reported and don’t block other rows.
  • Runs locally and in the provided Docker image (no global ffmpeg required).

Later scope

  • HTTP endpoints (local)
  • Azure container or other cloud service

Background Context

What is PCM?
PCM = Pulse Code Modulation: the raw sequence of audio samples (e.g., mono, 48 kHz, 16-bit). It’s just numbers in time order—no filename, no timestamps, no metadata. Common raw encodings: s16le (16-bit signed little-endian), f32le (32-bit float).
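
Concretely, reading s16le PCM is just decoding two little-endian bytes per sample:

```typescript
// Three mono s16le samples, decoded back into numbers.
// 0x0000 -> 0, 0x7fff -> 32767 (max), 0x8000 -> -32768 (min).
const pcm = Buffer.from([0x00, 0x00, 0xff, 0x7f, 0x00, 0x80]);
const samples: number[] = [];
for (let i = 0; i < pcm.length; i += 2) {
  samples.push(pcm.readInt16LE(i));
}
```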

What gets added to make a WAV?
WAV is a container (RIFF). It wraps those same PCM samples with a small header that says:

  • format (PCM), channels, sample rate, bits per sample
  • byte rate / block align (derived)
  • the data size (how many bytes of audio)

So: WAV = header (≈44 bytes) + PCM data (usually).
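
That 44-byte header can be built directly; this is a minimal sketch of the standard RIFF/WAVE layout for plain PCM, with the field order matching the list above:

```typescript
// Minimal 44-byte RIFF/WAVE header for raw PCM data.
function wavHeader(dataBytes: number, channels = 1, sampleRate = 48000, bits = 16): Buffer {
  const blockAlign = channels * (bits / 8);    // bytes per sample frame
  const byteRate = sampleRate * blockAlign;    // bytes per second
  const h = Buffer.alloc(44);
  h.write("RIFF", 0, "ascii");
  h.writeUInt32LE(36 + dataBytes, 4);          // RIFF chunk size
  h.write("WAVE", 8, "ascii");
  h.write("fmt ", 12, "ascii");
  h.writeUInt32LE(16, 16);                     // fmt subchunk size (PCM)
  h.writeUInt16LE(1, 20);                      // audio format 1 = PCM
  h.writeUInt16LE(channels, 22);
  h.writeUInt32LE(sampleRate, 24);
  h.writeUInt32LE(byteRate, 28);
  h.writeUInt16LE(blockAlign, 32);
  h.writeUInt16LE(bits, 34);
  h.write("data", 36, "ascii");
  h.writeUInt32LE(dataBytes, 40);              // size of the PCM payload
  return h;
}
```

Writing `wavHeader(pcm.length)` followed by the PCM buffer yields a playable WAV file.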

How does converting to PCM first save backend processing costs?

  • WAV and PCM files are virtually the same size, and both are too costly to persist for the full live stream; short windows, however, can be decoded cheaply for on-demand requests that save to a local disk.
  • For noise analytics we can run the PCM conversion on the live stream, extract only the PSD metrics (which persist in the database), and discard the PCM data.
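
To illustrate the extract-then-discard pattern (using per-window RMS as a stand-in for the PSD metrics — a real PSD estimate would use an FFT-based method like Welch's):

```typescript
// One RMS value per fixed-size window of s16le samples: keep the
// tiny metric array, let the bulky PCM buffer be garbage-collected.
function rmsPerWindow(samples: Int16Array, windowSize: number): number[] {
  const out: number[] = [];
  for (let start = 0; start + windowSize <= samples.length; start += windowSize) {
    let sumSq = 0;
    for (let i = start; i < start + windowSize; i++) {
      const s = samples[i] / 32768; // normalize to [-1, 1)
      sumSq += s * s;
    }
    out.push(Math.sqrt(sumSq / windowSize));
  }
  return out;
}
```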
