Switch to fmp4 intead of mgepg by zoriya · Pull Request #542 · zoriya/Kyoo

zoriya · 2024-06-17T21:57:44Z

Closes #534

Refs:

Currently, this PR is not working, I guess timestamps or mp4 headers are invalid /shrug

zoriya · 2024-06-17T21:58:37Z

 func (ts *Stream) run(start int32) error {
 	// Start the transcode up to the 100th segment (or less)
 	length, is_done := ts.file.Keyframes.Length()
-	end := min(start+100, length)


Remember to revert this after testing

zoriya · 2024-06-19T10:21:53Z

Maybe it would be wisest to switch to -f hls instead of -f segment to use their built-in fmp4 handling. Would need to check if it's possible to use segments from multiples transcode process this way tho.

zoriya · 2024-06-30T17:25:53Z

I'm realising supporting this would allow av1 streaming.

This would be huge to reduce bandwidth usage.

Definitely want to work on this on the following days

It seems presentation time and durations are not consistent with previous segments so the hls renderer can't keep up

For now, disabled all audios variants since it's handling will be entierly different. Found out that audio and video segments don't need to lineup. (same number/duration). As long as the whole file stays long enough it's fine. Video handling now fails when there are too many keyfranmes close enough (like 0.01, 0.3, 0.4, 2, 4). It would only output 3 segments instead of the 5 we would want. We might get arround using fragments containing more than 1 keyframe if we handle things right

zoriya · 2024-07-03T17:35:18Z

Idea seem to work, need to:

fix presentation times when restarting encode (see https://github.com/Dusk-Labs/nightfall/blob/master/src/profiles/video.rs for reference)
fix keyframes count/duration for audio (used in segment computation)

zoriya · 2024-07-03T17:51:06Z

We would need to fix the assumption that keyframes are a file level things. This should move to a stream level thing.

It's legal to have audio/video with different segments. It's also legal to have two videos w/ different segments. The only constraint is that variant should have the same segments (aka video1/720p & video1/1080p needs same segments while video1/720p & video2/1080p can have different segments).

Doing this would also unlock video selection when the file contains multiple video streams.

zoriya · 2024-07-03T17:51:38Z

I'm thinking of first fixing this & data store in another PR before continuing this.

JeWe37 · 2024-08-03T15:43:10Z

I'm realising supporting this would allow av1 streaming.

This would be huge to reduce bandwidth usage.

Definitely want to work on this on the following days

Given their similar encoding complexity and performance plus currently still wider support shouldn't HEVC be preferred here?

zoriya · 2024-08-03T23:40:23Z

hevc would be supported too

this would allow avc1 & hevc transmuxing. we could also add a avc1 or hevc transcode mode but i don't know how this could be negotiated. maybe an option on the server and make them available in the manifest for the client to pick

JeWe37 · 2024-08-03T23:58:08Z

hevc would be supported too

this would allow avc1 & hevc transmuxing. we could also add a avc1 or hevc transcode mode but i don't know how this could be negotiated. maybe an option on the server and make them available in the manifest for the client to pick

From what I understand currently Kyoo doesn't offer any API to tell it what codecs the client supports. Jellyfin has a whole system for this, see the DeviceProfile key in this API. Presumably in the future something similar will be required as more devices are supported which have different capabilities. This would probably entail some changes to the transcoding API too though, a session concept would likely make sense in that case too.

zoriya · 2024-08-04T00:36:10Z

The server is not the one responsible for knowing if a media file can be played on the client. The client itself does the check and requests a format it can support. This is done either using the mimeCodec field of the /info endpoint for direct streaming (pristine quality) or using the hls spec with multi-variants.

As long as all formats can be served under a single manifest there won't be a need for a custom protocol to negotiate formats: the hls spec already handles this.

I heard that some devices play badely with fmp4 (but works well with mpegts) so we will probably need to at least support those two versions in 2 different manifests just for those devices.

PS: There is already a session concept, it's used to know if a user is going to need a segment or to kill old ffmpeg processes.

zoriya · 2026-04-30T00:58:25Z

Nothing too important, just trying to remember the state of this. I tried to use opencode for that, not sure it was really useful but whatever here it is:

llm summary

fMP4 support for lazy HLS transcoder

Session ID: ses_224422841ffe2gPU3SJwXdL7y7
Created: 4/30/2026, 2:15:45 AM
Updated: 4/30/2026, 2:47:46 AM

User

The transcoder in @transcoder/src/stream.go is able to encode an arbitrary file to hls lazily. This means any hls player can play the file in any quality as if everything was pre-encoded on the server but when requested a ffmpeg process is spawned to serve just the chunk requested (and prepare the next ones).

You can find a small writeup about it here: https://zoriya.dev/blogs/transcoder/

We are currently using .ts files for segments and we would like to switch to fmp4 to support hevc or av1.

Can you explore the options for this and how feasable that looks? Can we continue to use the -f segment or would -f hls be better suited? Is there room for simplification in the code if we do switch?

There was a previous attemp to switch to fmp4 that failed, some ideas from those attemp got implemented in the meantime.

Here are the draft PR of the previous attemps:

Take those previous attemps in mind when proposing a solution. Summarize all findings (from previous attemps, internet search and your knowledge)

Assistant

Great question — I dug through current code, both draft PRs, FFmpeg docs, and related notes.

What matters most: your lazy architecture depends on deterministic segment numbering/timing across independent ffmpeg runs. That is the core constraint.

Current implementation is built around explicit split points (-segment_times) in transcoder/src/stream.go:295, manual playlist generation in transcoder/src/stream.go:391, and .ts parsing in transcoder/src/api/streams.go:308.
FFmpeg docs confirm segment is deterministic-oriented (explicit split points), while hls is HLS-native but hls_time is a target, not strict per-cut control.

From docs (key excerpts):

hls_time: “Set the target segment length… Segment will be cut on the next key frame after this time has passed.”
segment: “Specify a list of split points…” (segment_times) and “Every segment starts with a keyframe…”
Also docs call hls “a more specific implementation for HLS segmentation.”

What previous attempts tell us

PR #542 (fMP4 + -f hls) found real issues:

timestamp/presentation discontinuity between restarted encodes,
need for init segment handling (init.mp4 / #EXT-X-MAP),
more complexity around parsing ffmpeg HLS output.

PR #592 confirmed the big blocker:

-hls_time behavior is not strict slicing per requested keyframes (you observed 5.9s/3.9s alternation).
Workaround attempt was keyframe coarsening (e.g. ~5s cadence), which changes your current “every keyframe segment” model.

Can you keep `-f segment` and just switch to fMP4?

Technically possible, but not ideal for true HLS-fMP4 semantics:

segment can write MP4 segments (segment_format + segment_format_options), but docs do not provide native EXT-X-MAP / init-fragment flow like hls.
You’ll likely re-implement HLS-fMP4 semantics yourself (init lifecycle, map tag correctness, fragment flags, continuity), which is fragile.
So: feasible for “MP4 chunk files”, weaker for “proper HLS-fMP4 behavior at scale”.

Is `-f hls` better suited?

For fMP4/HLS correctness: yes.
For your lazy deterministic model: not automatically.

It’s better for:

-hls_segment_type fmp4
-hls_fmp4_init_filename
native playlist semantics

But it conflicts with:

exact per-keyframe deterministic slicing you currently rely on with segment_times.

What got better since those attempts (important)

A few earlier blockers are now improved in main code:

per-stream keyframe handling is now stream-specific (GetKeyframes(... isVideo, idx) in transcoder/src/keyframes.go:105), matching your PR discussion.
audio drift handling improved (output_ts_offset path in transcoder/src/stream.go:275).
packet-aligned audio keyframe extraction exists now (transcoder/src/keyframes.go:277), better than old fixed-grid approaches.

So feasibility today is better than during #542/#592.

Practical recommendation

I’d recommend:

Keep current TS/segment path as stable fallback.
Add a new fMP4 path using -f hls (not segment) behind a flag.
Accept/define a coarser deterministic cadence (e.g. 4–6s segment model), instead of “every source keyframe.”
Add init-segment plumbing (init.mp4 route + #EXT-X-MAP) and strict timestamp continuity checks.
Validate on target players (Safari/tvOS especially; AV1 support still uneven across clients).

So the answer to your core question:

If your goal is proper HLS fMP4 for HEVC/AV1, -f hls is the right muxer.
If your goal is preserving your current deterministic per-request segmentation model unchanged, -f segment is easier — but weaker/awkward for true HLS-fMP4 semantics.

If you want, I can draft a concrete migration plan with exact ffmpeg arg sets and code touchpoints (stream.go, videostream.go, audiostream.go, api/streams.go) for a low-risk phased rollout.

User

Some questions i'd like to clarify:

what's in the init.mp4? would it be the same init.mp4 if we transcode segments [0, 100[ and segments [200, 300[? if not, which should we save?
would pts be valid agross different transcodes? i remember playing with hls_segment_options to try to mitigate this.
could we predict deterministically at which times the -f hls muxer will cut segments? if it's always the next keyframe after hls_time * [the number of segments written so far] we should be able to as long as -ss is reliable but my last attemps talks about a +/- an initial time that i dont know of.
could we have a reliable -ss when doing -c:v copy? i think combining input & output seeking was not working, can you confirm?

Assistant

Great set of questions. Short answers first:

init.mp4 is codec/track metadata, not media samples.
Across independent ffmpeg runs, init.mp4 is not guaranteed identical.
-f hls cut points are mostly predictable, but there is indeed a start-offset term.
-c copy + seeking is only reliable at keyframe granularity; mixed input/output seek is not robust for exact boundaries.

1) What is in init.mp4, and which one to keep?

It contains ftyp + moov (track definitions, codec extradata like avcC/hvcC, timescales, track IDs, defaults), no media payload.
.m4s segments carry moof+mdat (actual fragment metadata + samples).
If you encode [0,100[ and [200,300[ in separate ffmpeg processes, init can be:
- byte-different but compatible, or
- genuinely incompatible (track IDs/extradata differences).
Safest policy: treat each independent run as potentially having its own init and pair it with its own #EXT-X-MAP.
Reusing one global init is safe only if stream layout + codec config are strictly identical across runs.

2) Are PTS valid across different transcodes?

Not automatically. Independent runs often rebase differently.
Your prior hls_segment_options experiments make sense: frag_discont helps signaling fragmentation discontinuity, but it does not magically enforce full timeline continuity.
For reliable cross-run timeline:
- keep one timestamp policy consistently (-copyts, -copytb, -avoid_negative_ts choice fixed),
- and if needed apply explicit offset (-output_ts_offset) to place each run on one absolute timeline.
If continuity is not strict, use #EXT-X-DISCONTINUITY at run boundaries.

3) Can -f hls cut times be predicted deterministically?

From hlsenc.c, split condition is effectively cumulative:
- cut when (pkt_pts - start_pts) >= target.
- target is recording_time * segment_index (with special first-phase behavior if hls_init_time is used).
So yes, there is an implicit initial offset via start_pts (this is likely your “unknown +/- initial time”).
Also, cut is gated by keyframe (unless split_by_time), so actual boundary is “first eligible keyframe after threshold”.
Bottom line: predictable if keyframes and start_pts are stable across runs; otherwise drift appears.

4) Reliable -ss with -c:v copy (and mixed input/output seek)?

Confirming your memory: exact boundary behavior is unreliable.
-c copy seeks at random access points / keyframes; not frame-accurate.
Input seek (-ss before -i) is fast/coarse; output seek (-ss after -i) can still be problematic with copy at boundaries.
Combining both usually increases edge-case weirdness rather than fixing it for copy mode.
So for lazy independent windows, -c:v copy is reliable only if you accept keyframe-snapped starts.

If you want, next I can propose a concrete “safe fMP4 lazy mode” contract (init rotation rules + discontinuity rules + exact ffmpeg flags) that fits your current architecture.

User

Some confirmations for each points:

Reusing one global init is safe only if stream layout + codec config are strictly identical across runs

I'm encoding the same file in a different ffmpeg process, would this change?

what would the start_pts refer to? the pts of the first keyframe after the input seek? the output_ts_offset flag? you might need to make a table with different flag combinations

Confirm this behavior by reading the source code or searching the internet, from my understanding/experimentations:
-ss time -i file -c:v copy seeks to the keyframe after OR before time (with no good way to predict which one)

I don't really remember how it works if we combine with output seek tho

Assistant

Perfect, these are exactly the right things to pin down.

Same file, different ffmpeg process: can init.mp4 change?

Usually stable, if you use same stream mapping/codec settings and the source stream config does not change mid-file.
Can still differ across runs in edge cases:
- stream has in-band parameter changes (new SPS/PPS/VPS, resolution/profile change),
- different selected stream/order/track-id behavior,
- slightly different muxer metadata/box layout (byte-different but possibly still compatible).
Practical rule:
- same file + same variant settings => often reusable,
- but safest is to verify compatibility (codec extradata + track layout), not assume.

What exactly is start_pts in -f hls splitting?
From hlsenc.c, split check is effectively:

pkt_pts - start_pts >= end_pts (plus keyframe gating unless split_by_time)

So start_pts is the muxer’s baseline PTS for that run (first packet timeline anchor for that variant stream), not output_ts_offset itself. output_ts_offset can shift packet PTS and therefore indirectly shift that baseline.

Quick table (same input, independent runs):

Flags	Baseline / cut reference effect
none	baseline depends on demuxer timestamps after seek; can vary run-to-run
`-copyts`	keeps original input ts (less rebasing), but muxer may still shift in some cases
`-copyts -start_at_zero`	preserves relative timeline but rebases origin consistently
`-output_ts_offset X`	shifts output timeline by X, therefore shifts effective split thresholds too
`-avoid_negative_ts ...`	muxer may shift timestamps, can break naive assumptions even with `-copyts`

And yes, your “unknown initial +/-” intuition matches the source: the cumulative threshold is anchored to start_pts, not absolute media time.

-ss time -i file -c:v copy: before or after keyframe?

Docs say: input -ss seeks to closest seek point before requested time.
In practice, observed first emitted packet can look before/after requested time depending on container indexing, timestamp quirks, stream start offsets, and copy-mode packet handling.
So your statement “before OR after with no good prediction” is fair from ops perspective, even if nominal behavior is “before”.
For combined input+output seek with -c copy: not robust for exact boundaries. Output seek discards by timestamp, but copy-mode still lacks frame-accurate trimming guarantees.

So net: your previous conclusions are directionally correct, and they align with both docs and why your earlier attempts hit timeline/discontinuity issues.

zoriya added enhancement Feature part of changelog transcoder labels Jun 17, 2024

zoriya added this to the v4.7.0 milestone Jun 17, 2024

zoriya commented Jun 17, 2024

View reviewed changes

wip: Switch to fmp4 intead of mgepg

2808da8

zoriya force-pushed the feat/fmp4 branch from 08d81c2 to 2808da8 Compare July 1, 2024 07:49

zoriya added 3 commits July 1, 2024 08:08

Better connection loss handling on android

61f800d

Try custom fmp4 movflags

e85d2d4

It seems presentation time and durations are not consistent with previous segments so the hls renderer can't keep up

Add output ts offset

59264bd

zoriya commented Jul 1, 2024

View reviewed changes

Comment thread transcoder/src/stream.go Outdated

zoriya added 3 commits July 1, 2024 15:39

Make a happy path for keyframes.Slices

0579afe

Make fragments of ~6 seconds instead of a fragment per keyframe

f6dab80

zoriya force-pushed the feat/fmp4 branch from 612cf89 to f6dab80 Compare July 1, 2024 16:43

zoriya added 3 commits July 1, 2024 17:20

Create different index.m3u8 for audio and video

9fad5da

Fix extraction directory perms mode

a99317c

Fix file collisions

2ae26d1

zoriya mentioned this pull request Jul 14, 2024

Add "Choose subtitle" button #563

Open

wip: Try to fix disconituity segment

f03464a

This was referenced Aug 7, 2024

Use a database for the transcoder #586

Merged

Switch to -f hls instead of -f segment #592

Draft

zoriya removed this from the v4.7.0 milestone Sep 5, 2024

zoriya mentioned this pull request May 2, 2025

Added distributed transcoding and playback design doc #914

Open

zoriya mentioned this pull request Jun 23, 2025

MPEG Dash / Plex's patched FFmpeg / EAE informations #1005

Open

Conversation

zoriya commented Jun 17, 2024

Uh oh!

zoriya Jun 17, 2024

Choose a reason for hiding this comment

Uh oh!

zoriya commented Jun 19, 2024

Uh oh!

zoriya commented Jun 30, 2024

Uh oh!

Uh oh!

zoriya commented Jul 3, 2024

Uh oh!

zoriya commented Jul 3, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zoriya commented Jul 3, 2024

Uh oh!

JeWe37 commented Aug 3, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zoriya commented Aug 3, 2024

Uh oh!

JeWe37 commented Aug 3, 2024

Uh oh!

zoriya commented Aug 4, 2024

Uh oh!

zoriya commented Apr 30, 2026

fMP4 support for lazy HLS transcoder

User

Assistant

What previous attempts tell us

Can you keep -f segment and just switch to fMP4?

Is -f hls better suited?

What got better since those attempts (important)

Practical recommendation

User

Assistant

User

Assistant

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

zoriya commented Jul 3, 2024 •

edited

Loading

JeWe37 commented Aug 3, 2024 •

edited

Loading

Can you keep `-f segment` and just switch to fMP4?

Is `-f hls` better suited?