Prerequisite: read #6 description
Pairs with #2
Project summary
Build a small TypeScript/Node worker that concatenates HLS segments (.m3u8 / .ts) and decodes them to PCM, then writes downloadable WAV clips (plus sidecar metadata) to a shared folder (e.g., Dropbox).
This shares the same HLS→PCM core pipeline as #2, with a different output.
Immediate use case
Generate text-WAV pairs from Orcasound stream buckets for CLAP model training
Basic steps
- Resolve the [start, end] window to a segment list from the .m3u8 playlist.
- Spawn ffmpeg (child process) to decode HLS to PCM (mono, 48 kHz, single pass).
- Wrap the PCM as WAV and stream it to disk; write a .meta.json metadata sidecar.
- Write a summary JSON pairing detection comment text with the file URLs.
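The playlist-resolution step above can be sketched as a pure function over the playlist text. This is a minimal sketch, assuming the window is expressed in seconds from playlist start; names like `resolveSegments` are illustrative, not from the actual worker:

```typescript
// Map an HLS media playlist to timed segments, then pick those
// overlapping a [start, end] window (seconds from playlist start).
interface Segment { uri: string; offset: number; duration: number; }

function parsePlaylist(m3u8: string): Segment[] {
  const segments: Segment[] = [];
  let offset = 0;
  let duration = 0;
  for (const raw of m3u8.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("#EXTINF:")) {
      // "#EXTINF:10.0," -> 10.0 (parseFloat stops at the comma)
      duration = parseFloat(line.slice("#EXTINF:".length));
    } else if (line && !line.startsWith("#")) {
      segments.push({ uri: line, offset, duration });
      offset += duration;
    }
  }
  return segments;
}

function resolveSegments(m3u8: string, start: number, end: number): Segment[] {
  // Keep any segment whose [offset, offset + duration) overlaps the window.
  return parsePlaylist(m3u8).filter(
    (s) => s.offset < end && s.offset + s.duration > start
  );
}
```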
Inputs
- Single job: { feedSlug, start, end }
- Batch: CSV manifest with columns { feedSlug, start, end }
Outputs
clips/<feedSlug>/<YYYY>/<MM>/<DD>/<feedSlug>_<YYYYMMDDTHHMMSS>_<dur>s.wav
clips/.../<same>.meta.json
clips/.../pairs.json
Sidecar fields: feedId, feedSlug, start/end ISO, duration_s, samplerate_hz, channels, m3u8 URL, segments[], ffmpeg args, sha256 hash.
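The sidecar shape can be written down as a TypeScript interface. A sketch only: the exact field names (e.g., `m3u8_url`, `ffmpeg_args`) and the sample values are assumptions, not a fixed schema.

```typescript
// Sketch of the .meta.json sidecar fields listed above.
interface ClipSidecar {
  feedId: string;
  feedSlug: string;
  start: string;          // ISO 8601
  end: string;            // ISO 8601
  duration_s: number;
  samplerate_hz: number;
  channels: number;
  m3u8_url: string;
  segments: string[];     // segment URIs used for this clip
  ffmpeg_args: string[];
  sha256: string;         // hash of the WAV payload
}

// Hypothetical example values for illustration only.
const example: ClipSidecar = {
  feedId: "feed-001",
  feedSlug: "orcasound_lab",
  start: "2025-09-12T10:00:00Z",
  end: "2025-09-12T10:05:00Z",
  duration_s: 300,
  samplerate_hz: 48000,
  channels: 1,
  m3u8_url: "https://example.org/live.m3u8",
  segments: ["live000.ts", "live001.ts"],
  ffmpeg_args: ["-i", "live.m3u8", "-ac", "1", "-ar", "48000"],
  sha256: "<hex digest>",
};
```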
CLI (example)
npm run clip -- \
--feedSlug orcasound_lab \
--start 2025-09-12T10:00:00Z \
--end 2025-09-12T10:05:00Z \
--out s3 # or local
Dependencies
- ffmpeg (system binary in Docker or ffmpeg-static in dev).
- Node 20, TypeScript.
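The single-pass decode can be expressed as an ffmpeg argument list (mono, 48 kHz, raw s16le to stdout, ready to pipe into the WAV writer). The flag choices below are assumptions; the real worker may differ:

```typescript
// Build ffmpeg args for HLS -> raw PCM in one pass.
function ffmpegArgs(m3u8Url: string): string[] {
  return [
    "-hide_banner", "-nostdin",
    "-i", m3u8Url,      // HLS playlist input; ffmpeg follows the segment list
    "-vn",              // audio only
    "-ac", "1",         // downmix to mono
    "-ar", "48000",     // resample to 48 kHz
    "-f", "s16le",      // raw 16-bit signed little-endian PCM
    "pipe:1",           // write to stdout for streaming
  ];
}
```

The args would be handed to `child_process.spawn("ffmpeg", ffmpegArgs(url))`, with stdout streamed into the WAV wrapper.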
Error handling & ops
- Clear errors for missing segments / ffmpeg exit codes.
- Logs include station/time window and segment URIs.
- Idempotency: output path uses a hash of {feedSlug,startIso,endIso,params} (reruns don’t duplicate).
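The idempotent output path can be sketched as hashing the job identity, so a rerun of the same {feedSlug, startIso, endIso, params} lands on the same file. The function name and the 12-character hash prefix are assumptions:

```typescript
import { createHash } from "node:crypto";

interface ClipJob {
  feedSlug: string;
  startIso: string;
  endIso: string;
  params: Record<string, string | number>;
}

function clipPath(job: ClipJob): string {
  // Sort param keys so logically equal jobs serialize identically.
  const params = Object.keys(job.params).sort()
    .map((k) => `${k}=${job.params[k]}`).join("&");
  const key = [job.feedSlug, job.startIso, job.endIso, params].join("|");
  const hash = createHash("sha256").update(key).digest("hex").slice(0, 12);
  // "2025-09-12T..." -> ["2025", "09", "12"] for the date directories.
  const [yyyy, mm, dd] = job.startIso.slice(0, 10).split("-");
  return `clips/${job.feedSlug}/${yyyy}/${mm}/${dd}/${job.feedSlug}_${hash}.wav`;
}
```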
Acceptance criteria
- Export a 5-min window to WAV + .meta.json deterministically (same path/hash on rerun).
- Batch export from a CSV succeeds; failures are reported and don’t block other rows.
- Runs locally and in the provided Docker image (no global ffmpeg required).
Later scope
- HTTP endpoints (local)
- Azure container or other cloud service
Background Context
What is PCM?
PCM = Pulse Code Modulation: the raw sequence of audio samples (e.g., mono, 48 kHz, 16-bit). It’s just numbers in time order: no filename, no timestamps, no metadata. Common raw encodings: s16le (16-bit signed little-endian), f32le (32-bit float).
What gets added to make a WAV?
WAV is a container (RIFF). It wraps those same PCM samples with a small header that says:
- format (PCM), channels, sample rate, bits per sample
- byte rate / block align (derived)
- the data size (how many bytes of audio)
So, in the common case: WAV = header (≈44 bytes) + PCM data. (Extra chunks are possible but uncommon.)
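The header fields above map directly onto 44 bytes. A minimal sketch of a canonical PCM WAV header, assuming the data size is known up front (for streaming, these size fields would be patched after the fact):

```typescript
// Build a canonical 44-byte RIFF/WAVE header for raw PCM data.
function wavHeader(dataBytes: number, sampleRate = 48000, channels = 1, bits = 16): Buffer {
  const blockAlign = (channels * bits) / 8;   // bytes per sample frame
  const byteRate = sampleRate * blockAlign;   // bytes of audio per second
  const b = Buffer.alloc(44);
  b.write("RIFF", 0, "ascii");
  b.writeUInt32LE(36 + dataBytes, 4);         // RIFF chunk size (file size - 8)
  b.write("WAVE", 8, "ascii");
  b.write("fmt ", 12, "ascii");
  b.writeUInt32LE(16, 16);                    // fmt chunk size for PCM
  b.writeUInt16LE(1, 20);                     // audio format 1 = PCM
  b.writeUInt16LE(channels, 22);
  b.writeUInt32LE(sampleRate, 24);
  b.writeUInt32LE(byteRate, 28);
  b.writeUInt16LE(blockAlign, 32);
  b.writeUInt16LE(bits, 34);
  b.write("data", 36, "ascii");
  b.writeUInt32LE(dataBytes, 40);             // byte count of the PCM payload
  return b;
}
```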
How does converting to PCM first save backend processing costs?
- WAV and PCM files are virtually the same size; persisting either for the full live stream is costly, but on-demand requests can be decoded cheaply and saved to local disk.
- For noise analytics, we can convert the live stream to PCM on the fly, persist only the extracted PSD metrics in the database, and discard the PCM data.