Skip to content

Commit 3d96e33

Browse files
AndrewAltimitAI Agent BotclaudeAI Review Agent
authored
perf: TV Guide streaming — ffmpeg backend, decode pipeline optimizations (#85)
* perf: TV Guide streaming — ffmpeg backend, decode pipeline optimizations Switch default desktop video backend from openh264 (Baseline-only) to system ffmpeg (Main/High profile, SIMD-optimized). Profiling showed openh264 caused repeated reinit cycles (~5s gaps) on archive.org Main-profile content, yielding only 2 fps. With ffmpeg: 27-32 fps decode, 15-30 fps display, zero audio drops. Streaming buffer optimizations: - Lock-free reads for cached header/moov data (skip mutex on immutable regions) - Buffer eviction uses copy_within+truncate instead of drain - RETAIN_BEHIND increased to 8MB for tolerant-decode seek-back coverage Video decode pipeline: - Two-phase H.264 decode strategy for openh264 path: reinit only for initial sync, then tolerate errors during steady-state playback - Move frame scaling from decode thread to main thread (unblock decode) - Audio channel capacity 64→256 (eliminates 41-56% drop rate) - AAC SampleBuffer reused across decode calls (avoid per-frame alloc) YUV-to-RGBA conversion: - Process pixel pairs sharing chroma samples (halves UV lookups) - Unchecked indexing with row-level bounds verification - Odd-width frame support Measured improvement (CH5 / CH13): - Decode: 2.3→27.7 fps / 2.1→32.5 fps - Display: 2.0→15-19 fps / 1.6→30.1 fps - Audio drops: 41%→0% / 56%→0% - H.264 reinits: 4 per 16s → 0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address AI review feedback (iteration 1) Automated fix by Claude in response to Gemini/Codex review. Iteration: 1/5 Co-Authored-By: AI Review Agent <noreply@anthropic.com> * fix: revert default feature to video-decode (openh264) for CI compat CI Docker containers don't have ffmpeg dev libraries installed. Keep openh264 as the default (works in CI), document ffmpeg as the recommended desktop build path in Cargo.toml feature comments. To build with ffmpeg locally: cargo build -p oasis-app --no-default-features \ --features video-decode-ffmpeg,javascript Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: YUV bounds checks use assert! not debug_assert! (Gemini review) The bounds checks guarding unsafe get_unchecked calls in yuv420_to_rgba must use assert! (not debug_assert!) to maintain soundness in release builds. debug_assert! is stripped in release, which would allow invalid strides from callers to cause UB in this public safe function. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: AI Agent Bot <ai-agent@localhost> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: AI Review Agent <ai-review-agent@localhost>
1 parent 10443de commit 3d96e33

File tree

8 files changed

+318
-160
lines changed

8 files changed

+318
-160
lines changed

crates/oasis-app/Cargo.toml

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -30,10 +30,12 @@ default = ["javascript", "video-decode"]
3030
javascript = ["oasis-core/javascript"]
3131
# Internal: oasis-video is available. Activated by video-decode or video-decode-ffmpeg.
3232
_video = ["dep:oasis-video"]
33-
# Software decode via openh264 (Baseline profile). Default for CI/dev.
33+
# Software decode via openh264 (Baseline profile). Default for CI where
34+
# ffmpeg dev libraries are not installed.
3435
video-decode = ["_video", "oasis-video/h264"]
35-
# Full decode via statically-linked ffmpeg (Main/High profile, SIMD).
36-
# For shipping desktop apps and UE5 games. Mutually exclusive with video-decode.
36+
# Full decode via system ffmpeg (Main/High profile, SIMD-optimized).
37+
# Recommended for desktop use — requires libavcodec-dev etc. on the system.
38+
# Build: cargo build -p oasis-app --no-default-features --features video-decode-ffmpeg,javascript
3739
video-decode-ffmpeg = ["_video", "oasis-video/ffmpeg"]
3840

3941
[dependencies]

crates/oasis-app/src/tv_controller/streaming_buffer.rs

Lines changed: 56 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -2,9 +2,11 @@
22
//! fed by a background download thread.
33
44
/// How much data to retain behind the decoder's read cursor.
5-
/// Allows minor backward seeks without re-downloading.
5+
/// Allows minor backward seeks without re-downloading. Sized to cover
6+
/// symphonia's internal seek-back distance when the H.264 decoder skips
7+
/// many packets in tolerant-decode mode (openh264 Baseline-only gaps).
68
#[cfg(feature = "_video")]
7-
pub(crate) const RETAIN_BEHIND: usize = 4 * 1024 * 1024; // 4 MB
9+
pub(crate) const RETAIN_BEHIND: usize = 8 * 1024 * 1024; // 8 MB
810

911
/// Maximum bytes the download thread may be ahead of the decoder's read
1012
/// cursor before it pauses. Keeps memory bounded to ~16 MB lookahead.
@@ -511,6 +513,12 @@ pub(crate) struct StreamingBuffer {
511513
pub(crate) eviction_enabled: std::sync::Arc<std::sync::atomic::AtomicBool>,
512514
/// Whether we've logged a "waiting for data" message (avoids log spam).
513515
logged_wait: bool,
516+
/// Cached header (ftyp) data — once set it never changes, so reads from
517+
/// the header region can be served without acquiring the state mutex.
518+
cached_header: Option<Vec<u8>>,
519+
/// Cached moov atom — once set it never changes. Reads from the moov
520+
/// region can be served without acquiring the state mutex.
521+
cached_moov: Option<(u64, std::sync::Arc<Vec<u8>>)>,
514522
}
515523

516524
#[cfg(feature = "_video")]
@@ -521,6 +529,8 @@ impl StreamingBuffer {
521529
pos: 0,
522530
eviction_enabled: std::sync::Arc::new(std::sync::atomic::AtomicBool::new(false)),
523531
logged_wait: false,
532+
cached_header: None,
533+
cached_moov: None,
524534
}
525535
}
526536

@@ -543,7 +553,11 @@ impl StreamingBuffer {
543553
let evict = cursor_in_buf - RETAIN_BEHIND;
544554
let evict = evict.min(s.buf.len());
545555
if evict > 0 {
546-
s.buf.drain(..evict);
556+
// In-place shift: copy_within avoids the Drain iterator
557+
// overhead and keeps the Vec's allocation stable.
558+
let new_len = s.buf.len() - evict;
559+
s.buf.copy_within(evict.., 0);
560+
s.buf.truncate(new_len);
547561
s.base_offset += evict as u64;
548562
log::debug!(
549563
"TV: evicted {:.1}MB, window now {:.1}MB ({}-{})",
@@ -573,10 +587,46 @@ impl std::io::Read for StreamingBuffer {
573587
return Err(std::io::Error::other(e.clone()));
574588
}
575589

590+
// Try serving from cached header/moov without acquiring the lock.
591+
// These are set once and never modified, so lock-free access is safe.
592+
if let Some(ref hdr) = self.cached_header
593+
&& (self.pos as usize) < hdr.len()
594+
{
595+
let local = self.pos as usize;
596+
let n = buf.len().min(hdr.len() - local);
597+
buf[..n].copy_from_slice(&hdr[local..local + n]);
598+
self.pos += n as u64;
599+
break n;
600+
}
601+
if let Some((moov_off, ref moov_data)) = self.cached_moov {
602+
let moov_end = moov_off + moov_data.len() as u64;
603+
if self.pos >= moov_off && self.pos < moov_end {
604+
let local = (self.pos - moov_off) as usize;
605+
let n = buf.len().min(moov_data.len() - local);
606+
buf[..n].copy_from_slice(&moov_data[local..local + n]);
607+
self.pos += n as u64;
608+
break n;
609+
}
610+
}
611+
576612
let s = self.inner.state.lock().unwrap_or_else(|e| e.into_inner());
577613

578-
// Try serving from the retained file header (ftyp).
579-
if let Some(ref hdr) = s.header
614+
// Populate caches from state if not yet set (one-time copy).
615+
if self.cached_header.is_none()
616+
&& let Some(ref hdr) = s.header
617+
{
618+
self.cached_header = Some(hdr.clone());
619+
}
620+
if self.cached_moov.is_none()
621+
&& let Some((off, ref arc)) = s.moov
622+
{
623+
self.cached_moov = Some((off, std::sync::Arc::clone(arc)));
624+
}
625+
626+
// Retry header/moov from newly-populated caches before hitting
627+
// the sliding buffer. This handles the first read after moov
628+
// becomes available.
629+
if let Some(ref hdr) = self.cached_header
580630
&& (self.pos as usize) < hdr.len()
581631
{
582632
let local = self.pos as usize;
@@ -585,9 +635,7 @@ impl std::io::Read for StreamingBuffer {
585635
self.pos += n as u64;
586636
break n;
587637
}
588-
589-
// Try serving from the retained moov atom.
590-
if let Some((moov_off, ref moov_data)) = s.moov {
638+
if let Some((moov_off, ref moov_data)) = self.cached_moov {
591639
let moov_end = moov_off + moov_data.len() as u64;
592640
if self.pos >= moov_off && self.pos < moov_end {
593641
let local = (self.pos - moov_off) as usize;

crates/oasis-app/src/video_player.rs

Lines changed: 49 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -377,7 +377,7 @@ impl VideoPlayer {
377377
self.frame_height = height;
378378

379379
let (video_tx, video_rx) = mpsc::sync_channel::<VideoFrame>(4);
380-
let (audio_tx, audio_rx) = mpsc::sync_channel::<SoftwareAudio>(64);
380+
let (audio_tx, audio_rx) = mpsc::sync_channel::<SoftwareAudio>(256);
381381
let (stop_tx, stop_rx) = mpsc::channel::<()>();
382382

383383
let target_w = width;
@@ -398,12 +398,20 @@ impl VideoPlayer {
398398
},
399399
);
400400
let t0 = std::time::Instant::now();
401+
// With the symphonia backend, extract avcC from moov to skip
402+
// the full-file scan. With ffmpeg, it handles avcC internally.
403+
#[cfg(not(feature = "video-decode-ffmpeg"))]
401404
let open_result = if let Some(ref moov) = moov_data {
402405
let avcc = oasis_video::find_avcc_in_mp4(moov);
403406
oasis_video::SoftwareVideoDecoder::open_stream_with_avcc(source, avcc)
404407
} else {
405408
oasis_video::SoftwareVideoDecoder::open_stream(source)
406409
};
410+
#[cfg(feature = "video-decode-ffmpeg")]
411+
let open_result = {
412+
let _ = &moov_data; // suppress unused warning
413+
oasis_video::SoftwareVideoDecoder::open_stream(source)
414+
};
407415
let mut decoder = match open_result {
408416
Ok(d) => {
409417
log::info!(
@@ -466,7 +474,7 @@ impl VideoPlayer {
466474
height: u32,
467475
) {
468476
let (video_tx, video_rx) = mpsc::sync_channel::<VideoFrame>(4);
469-
let (audio_tx, audio_rx) = mpsc::sync_channel::<SoftwareAudio>(64);
477+
let (audio_tx, audio_rx) = mpsc::sync_channel::<SoftwareAudio>(256);
470478
let (stop_tx, stop_rx) = mpsc::channel::<()>();
471479

472480
std::thread::spawn(move || {
@@ -522,8 +530,8 @@ impl VideoPlayer {
522530
video_tx: mpsc::SyncSender<VideoFrame>,
523531
audio_tx: mpsc::SyncSender<SoftwareAudio>,
524532
stop_rx: mpsc::Receiver<()>,
525-
target_w: u32,
526-
target_h: u32,
533+
_target_w: u32,
534+
_target_h: u32,
527535
) {
528536
log::info!("VideoPlayer: software decode thread started");
529537

@@ -567,26 +575,13 @@ impl VideoPlayer {
567575
);
568576
}
569577
let ts = frame.timestamp_secs;
570-
let (data, w, h) = if frame.width == target_w && frame.height == target_h {
571-
(frame.rgba, frame.width, frame.height)
572-
} else {
573-
(
574-
simple_scale(
575-
&frame.rgba,
576-
frame.width,
577-
frame.height,
578-
target_w,
579-
target_h,
580-
),
581-
target_w,
582-
target_h,
583-
)
584-
};
578+
// Send native-resolution frames — scaling is done on
579+
// the main thread to keep the decode thread unblocked.
585580
if video_tx
586581
.send(VideoFrame {
587-
data,
588-
width: w,
589-
height: h,
582+
data: frame.rgba,
583+
width: frame.width,
584+
height: frame.height,
590585
timestamp_secs: ts,
591586
})
592587
.is_err()
@@ -805,7 +800,18 @@ impl VideoPlayer {
805800
#[cfg(feature = "_video")]
806801
let frame_ts = frame.timestamp_secs;
807802

808-
match backend.load_texture(fw, fh, &frame.data) {
803+
// Scale on the main thread (moved from decode thread to avoid
804+
// blocking decode with CPU-intensive scaling).
805+
let needs_scale = fw != self.frame_width || fh != self.frame_height;
806+
let scaled;
807+
let (tex_data, upload_w, upload_h) = if needs_scale {
808+
scaled = simple_scale(&frame.data, fw, fh, self.frame_width, self.frame_height);
809+
(scaled.as_slice(), self.frame_width, self.frame_height)
810+
} else {
811+
(frame.data.as_slice(), fw, fh)
812+
};
813+
814+
match backend.load_texture(upload_w, upload_h, tex_data) {
809815
Ok(tex) => {
810816
self.current_texture = Some(tex);
811817
self.last_frame_time = Some(Instant::now());
@@ -950,14 +956,26 @@ impl VideoPlayer {
950956
/// Nearest-neighbor RGBA scale.
951957
#[cfg(feature = "_video")]
952958
fn simple_scale(src: &[u8], src_w: u32, src_h: u32, dst_w: u32, dst_h: u32) -> Vec<u8> {
953-
let mut dst = vec![0u8; (dst_w * dst_h * 4) as usize];
954-
for y in 0..dst_h {
955-
let sy = (y * src_h / dst_h).min(src_h - 1);
956-
for x in 0..dst_w {
957-
let sx = (x * src_w / dst_w).min(src_w - 1);
958-
let si = (sy * src_w + sx) as usize * 4;
959-
let di = (y * dst_w + x) as usize * 4;
960-
dst[di..di + 4].copy_from_slice(&src[si..si + 4]);
959+
let sw = src_w as usize;
960+
let sh = src_h as usize;
961+
let dw = dst_w as usize;
962+
let dh = dst_h as usize;
963+
let mut dst = vec![0u8; dw * dh * 4];
964+
965+
for y in 0..dh {
966+
let sy = (y * sh / dh).min(sh - 1);
967+
let src_row = sy * sw * 4;
968+
let dst_row = y * dw * 4;
969+
970+
for x in 0..dw {
971+
let sx = (x * sw / dw).min(sw - 1);
972+
let si = src_row + sx * 4;
973+
let di = dst_row + x * 4;
974+
// SAFETY: si + 4 <= src.len() because sy < src_h and sx < src_w,
975+
// di + 4 <= dst.len() because y < dst_h and x < dst_w.
976+
unsafe {
977+
std::ptr::copy_nonoverlapping(src.as_ptr().add(si), dst.as_mut_ptr().add(di), 4);
978+
}
961979
}
962980
}
963981
dst

crates/oasis-video/Cargo.toml

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -32,11 +32,10 @@ symphonia = { version = "0.5", default-features = false, features = [
3232
# H.264 decoding (Cisco OpenH264 C source compiled via cc)
3333
openh264 = { version = "0.9", optional = true }
3434

35-
# FFmpeg via Rust bindings — static build from source, no runtime deps.
35+
# FFmpeg via Rust bindings — links against system ffmpeg (pkg-config).
36+
# For static builds (CI/shipping), add "build" + "static" features.
3637
# Only H.264 + AAC codecs and MP4 demuxer are needed.
3738
ffmpeg-next = { version = "7", optional = true, default-features = false, features = [
38-
"build",
39-
"static",
4039
"codec",
4140
"format",
4241
"software-resampling",

crates/oasis-video/src/aac.rs

Lines changed: 20 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
//! AAC audio decoder wrapping symphonia's AAC codec.
22
3-
use symphonia::core::audio::SampleBuffer;
3+
use symphonia::core::audio::{SampleBuffer, SignalSpec};
44
use symphonia::core::codecs::{CODEC_TYPE_AAC, CodecParameters, Decoder, DecoderOptions};
55
use symphonia::core::formats::Packet;
66

@@ -18,6 +18,12 @@ pub struct AacDecoder {
1818
decoder: Box<dyn Decoder>,
1919
sample_rate: u32,
2020
channels: u16,
21+
/// Reusable sample buffer — avoids allocating a new `SampleBuffer` on
22+
/// every `decode()` call. Recreated only when the audio spec changes
23+
/// (rare for AAC-LC; can happen with HE-AAC SBR).
24+
sample_buf: Option<SampleBuffer<f32>>,
25+
/// Tracks the last audio spec to detect mid-stream spec changes (e.g. HE-AAC SBR).
26+
last_spec: Option<SignalSpec>,
2127
}
2228

2329
impl AacDecoder {
@@ -42,6 +48,8 @@ impl AacDecoder {
4248
decoder,
4349
sample_rate,
4450
channels,
51+
sample_buf: None,
52+
last_spec: None,
4553
})
4654
}
4755

@@ -62,7 +70,17 @@ impl AacDecoder {
6270
return Ok(None);
6371
}
6472

65-
let mut sample_buf = SampleBuffer::<f32>::new(duration as u64, spec);
73+
// Reuse the sample buffer if its capacity and spec still match.
74+
// Recreate when the audio spec changes (rare for AAC-LC) or capacity is insufficient.
75+
let need_new = self
76+
.sample_buf
77+
.as_ref()
78+
.is_none_or(|sb| sb.capacity() < duration || self.last_spec.as_ref() != Some(&spec));
79+
if need_new {
80+
self.sample_buf = Some(SampleBuffer::<f32>::new(duration as u64, spec));
81+
self.last_spec = Some(spec);
82+
}
83+
let sample_buf = self.sample_buf.as_mut().expect("just created above");
6684
sample_buf.copy_interleaved_ref(decoded);
6785

6886
Ok(Some(DecodedAudio {

crates/oasis-video/src/demux.rs

Lines changed: 16 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -153,6 +153,21 @@ pub fn find_avcc_in_mp4(mp4_data: &[u8]) -> Option<AvccConfig> {
153153
/// with start codes.
154154
fn avcc_to_annex_b(data: &[u8], nal_length_size: usize) -> Result<Vec<u8>, VideoError> {
155155
let mut out = Vec::with_capacity(data.len() + 32);
156+
avcc_to_annex_b_into(data, nal_length_size, &mut out)?;
157+
Ok(out)
158+
}
159+
160+
/// Convert AVCC to Annex B, reusing the provided output buffer.
161+
///
162+
/// Clears `out` before writing. The caller retains the buffer across calls
163+
/// so its capacity grows to the largest packet and stays stable, eliminating
164+
/// per-packet allocation.
165+
fn avcc_to_annex_b_into(
166+
data: &[u8],
167+
nal_length_size: usize,
168+
out: &mut Vec<u8>,
169+
) -> Result<(), VideoError> {
170+
out.clear();
156171
let mut offset = 0;
157172

158173
while offset + nal_length_size <= data.len() {
@@ -186,7 +201,7 @@ fn avcc_to_annex_b(data: &[u8], nal_length_size: usize) -> Result<Vec<u8>, Video
186201
offset += nal_len;
187202
}
188203

189-
Ok(out)
204+
Ok(())
190205
}
191206

192207
/// MP4 demuxer backed by symphonia.

0 commit comments

Comments
 (0)