Simplify 500 error handling: no immediate retry, let background scheduler recover; raise SERVER_FAILURE_THRESHOLD to 5; serve stale playlist on fetch failure; adapt refresh timing to FFmpeg HLS heuristic; re-queue failed segments for fast retry #74
…sion jars Co-authored-by: paul-1 <6473457+paul-1@users.noreply.github.com>
There is an issue from something that caused a missing channel token. See the log |
The log in your comment was truncated — the comment system stopped parsing at what it interpreted as an XML closing tag in the log content (the message cuts off right after A few things that would help most:
In the meantime I've audited the new cookie-routing code and the path for |
@copilot Here is the continued log |
... The log is still being truncated at the same spot — the line after Could you share the log one of these ways instead?
The exact content of that DEBUG line (and any WARN/ERROR lines nearby) is what I need to identify the root cause. |
…e header and returned by server Co-authored-by: paul-1 <6473457+paul-1@users.noreply.github.com>
Thanks for the gist. The log made the issue clear. Root cause: Fix (commit
@copilot Still more errors in cookie refreshes. See the log here: https://gist.github.com/paul-1/d46eff6d699bd80d60c6e5a6692ee49d#file-sxm-proxy-log
…cookies Co-authored-by: paul-1 <6473457+paul-1@users.noreply.github.com>
Three separate bugs identified and fixed in commit Bug 1 — Renewal feedback loop ( Bug 2 — SXMAKTOKEN permanently lost: The SiriusXM server never returns a new Fixes applied:
@copilot When trying to play different channels at the same time, something is getting crossed in the cookies, forcing authentication to renew back and forth. See the log: https://gist.github.com/paul-1/8429540d8ecfa32e7f81c6c802f82730#file-sxm-log
…3 loops Co-authored-by: paul-1 <6473457+paul-1@users.noreply.github.com>
Root cause identified and fixed in commit Root cause —
Fix: Moved
@copilot still issues with playlist handling. In the below log, it says it got 3 total failures, yet somehow after it reported 3 failures, it had a playlist to process. |
…reshold

After failure #3 triggers switch_channel_server(), the immediate retry in get_playlist/get_segment still used the stale primary-server URL (already embedded in $request). If that retry succeeded, record_channel_success() reset the failure counter to 0 while channel_server was 'secondary', bypassing the 3-failure threshold entirely.

Fix: capture get_channel_server() before and after record_channel_failure(). If the server changed (i.e. the switch fired), return undef immediately so the background refresh reschedules a 2-second retry and get_playlist_url() fetches a fresh URL from the new server. Same fix applied to get_segment().

Agent-Logs-Url: https://github.com/paul-1/plugin-SiriusXM/sessions/72ad1a01-cf50-410d-a154-f6f804f59074
Co-authored-by: paul-1 <6473457+paul-1@users.noreply.github.com>
... Fixed in The 4th retry bypassed the threshold because Fix: capture |
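The before/after server comparison described in the commit message above can be sketched roughly as follows. This is a Python illustration only, not the plugin's code: `ChannelState`, `handle_500`, and `retry_fetch` are hypothetical names, and the threshold of 3 is the value in effect at the time of that commit.

```python
SERVER_FAILURE_THRESHOLD = 3  # value at the time of this commit (assumption: later raised to 5)


class ChannelState:
    """Hypothetical per-channel state: which CDN server is active and how many failures it has seen."""

    def __init__(self):
        self.server = "primary"
        self.failures = 0

    def record_failure(self):
        # Count the failure and flip servers once the threshold is reached.
        self.failures += 1
        if self.failures >= SERVER_FAILURE_THRESHOLD:
            self.server = "secondary" if self.server == "primary" else "primary"
            self.failures = 0


def handle_500(state, retry_fetch):
    """Record a failure; retry only if the active server did not change."""
    server_before = state.server
    state.record_failure()
    if state.server != server_before:
        # The switch fired, so the URL in hand points at the old server.
        # Return None and let the background refresh fetch a fresh URL.
        return None
    return retry_fetch()
```

Returning `None` when the server changes keeps a successful retry against the old server from resetting the failure counter for the new one.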
…R_FAILURE_THRESHOLD to 5 Agent-Logs-Url: https://github.com/paul-1/plugin-SiriusXM/sessions/44e479d4-5648-47df-8eea-326f783f5fcf Co-authored-by: paul-1 <6473457+paul-1@users.noreply.github.com>
@copilot See the attached log. When the client requests a playlist that has expired and the fetch returns an error, we cannot return that error to the client; just return the stale playlist. We also need to look at playlist refresh timing, as we should never really have an expired playlist when the client asks. We need to change our refresh time to match what FFmpeg is going to do: when we get a new playlist and it contains a new segment, the refresh time is equal to the EXTINF; if there is no new segment in the playlist, we need to refresh the playlist in EXTINF / 2.0. In short, we need to adapt to FFmpeg's refresh logic.
…ent error) Agent-Logs-Url: https://github.com/paul-1/plugin-SiriusXM/sessions/3bb1a287-ba86-45b3-aa7b-352355aad52f Co-authored-by: paul-1 <6473457+paul-1@users.noreply.github.com>
Fixed in |
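The stale-playlist fallback requested above could look roughly like this. It is a minimal Python sketch under stated assumptions: the cache is a plain dict, `fetch_fresh` is a hypothetical callable that returns the playlist text or `None`/raises on failure, and `use_cache` stands in for the flag that distinguishes client requests from background refreshes.

```python
def get_playlist(cache, channel_id, fetch_fresh, use_cache=True):
    """Return a fresh playlist if possible; on failure fall back to the stale copy for clients."""
    # Capture the stale copy before attempting the fetch (client requests only).
    stale = cache.get(channel_id) if use_cache else None

    try:
        fresh = fetch_fresh(channel_id)
    except Exception:
        # Any fetch error is treated the same as a non-success status.
        fresh = None

    if fresh is not None:
        cache[channel_id] = fresh
        return fresh

    # Client request: serve the stale playlist instead of an error.
    # Background refresh (use_cache=False): stale is None, so the scheduler retries.
    return stale
```

A client call with a cached entry therefore never sees the error, while a background call still gets `None` and can reschedule.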
@copilot You missed the second requested change. We need to look at playlist refresh timing, as we should never really have an expired playlist when the client asks. We need to change our refresh time to match what FFmpeg is going to do: when we get a new playlist and it contains a new segment, the refresh time is equal to the EXTINF; if there is no new segment in the playlist, we need to refresh the playlist in EXTINF / 2.0. In short, we need to adapt to FFmpeg's refresh logic.
…0.5) with hold-count safety escalation Agent-Logs-Url: https://github.com/paul-1/plugin-SiriusXM/sessions/072946b2-a1d4-4692-8de6-1e45ee87bad8 Co-authored-by: paul-1 <6473457+paul-1@users.noreply.github.com>
…o playlist refresh logic Agent-Logs-Url: https://github.com/paul-1/plugin-SiriusXM/sessions/072946b2-a1d4-4692-8de6-1e45ee87bad8 Co-authored-by: paul-1 <6473457+paul-1@users.noreply.github.com>
Done in FFmpeg's FFmpeg's reload loop (
What's now implemented:
@copilot We need to look at the segment retry logic. The segment ending in 172_v3 failed with a 500, but it was not retried until 11s later, after the playlist was updated again. See log
…p after MAX_SEGMENT_RETRIES=3 Agent-Logs-Url: https://github.com/paul-1/plugin-SiriusXM/sessions/e83b7376-d42a-4865-8b32-eaa82897ccb8 Co-authored-by: paul-1 <6473457+paul-1@users.noreply.github.com>
... Fixed in When The fix: on failure, the segment path is |
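The re-queue behaviour discussed above can be illustrated with a short Python sketch. The queue type, the `retry_count` dict, and the `fetch_segment` callable are assumptions made for illustration; only the names `cache_next_segment` and `MAX_SEGMENT_RETRIES` come from the PR.

```python
from collections import deque

MAX_SEGMENT_RETRIES = 3


def cache_next_segment(queue, retry_count, fetch_segment):
    """Fetch the next queued segment; on failure put it back at the front for a fast retry."""
    if not queue:
        return None

    segment = queue.popleft()
    data = fetch_segment(segment)  # hypothetical: returns bytes, or None on failure

    if data is not None:
        retry_count.pop(segment, None)  # success clears any retry history
        return data

    retry_count[segment] = retry_count.get(segment, 0) + 1
    if retry_count[segment] >= MAX_SEGMENT_RETRIES:
        # Give up on this segment and clean up its counter.
        del retry_count[segment]
    else:
        # Front of the queue, so the next scheduler tick retries it first.
        queue.appendleft(segment)
    return None
```

Putting the segment back at the front rather than the back preserves playback order, so a later segment is not cached ahead of the one that failed.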
Reworks how transient CDN/server errors (HTTP 500) are handled in `get_playlist` and `get_segment`, improves client resilience when a fresh playlist fetch fails, aligns the background playlist refresh cadence with FFmpeg's HLS polling logic, and ensures failed segment fetches are retried quickly instead of waiting for the next playlist refresh.

Changes Made

- `get_playlist` and `get_segment` no longer retry the CDN request immediately on a 500. Instead, the failure is recorded and `undef` is returned so the background scheduler handles the next attempt. Only 403 (genuine auth rejection) triggers `authenticate()`.
- `SERVER_FAILURE_THRESHOLD` raised from 3 to 5: allows more transient failures before switching to the secondary CDN server, accommodating service interruptions that outlast a small retry window.
- `get_playlist` now captures the stale cached content before attempting a fresh fetch. If the fetch fails for any reason (500, 403 after exhausted re-auth, URL lookup failure, missing token, or any other non-success status), the stale playlist is returned to the client instead of an error. Background refresh calls (`use_cache=0`) are unaffected and still return `undef` so the scheduler retries normally.
- `calculate_playlist_update_delay` now follows FFmpeg's HLS reload heuristic: when a new segment is present, the next refresh is scheduled after one EXTINF interval; when no new segment is found, the refresh fires after EXTINF / 2. This keeps the proxy in sync with FFmpeg's own polling cadence, minimising the chance of serving an expired playlist to the client.
- A `playlist_hold_count` counter tracks consecutive fetches that return no new segments. Every `MAX_HOLD_COUNT` (3) such fetches (approximately 1.5 × EXTINF of stalled content), `record_channel_failure()` is called and the counter resets. After `SERVER_FAILURE_THRESHOLD` escalation events the proxy switches to the secondary CDN server, mirroring FFmpeg's `m3u8_hold_counters` safety limit.
- When `cache_next_segment` fails to fetch a segment (e.g. transient 500), the segment is placed back at the front of the queue so `process_segment_queues()` retries it on the next scheduler tick (~1 second) rather than waiting up to 11 seconds for the next playlist refresh. A per-channel `segment_retry_count` tracks consecutive failures per segment; after `MAX_SEGMENT_RETRIES` (3) failed attempts the segment is dropped and the retry counter is cleaned up. The retry counter is also cleared when a channel goes idle via `clear_channel_cache()`.
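The refresh-timing rule and the hold-count escalation described above can be sketched in Python as follows. This is an illustration under assumptions, not the plugin's code: `HoldTracker` and `on_escalate` are hypothetical names, and resetting the hold count when a new segment arrives is inferred from the word "consecutive".

```python
MAX_HOLD_COUNT = 3


def calculate_playlist_update_delay(extinf, has_new_segment):
    """Next refresh delay in seconds: one EXTINF after new content, half an EXTINF otherwise."""
    return extinf if has_new_segment else extinf / 2.0


class HoldTracker:
    """Counts consecutive refreshes with no new segment and escalates every MAX_HOLD_COUNT."""

    def __init__(self, on_escalate):
        self.hold_count = 0
        self.on_escalate = on_escalate  # stands in for recording a channel failure

    def observe(self, has_new_segment):
        if has_new_segment:
            self.hold_count = 0  # assumption: new content ends the stall
            return
        self.hold_count += 1
        if self.hold_count >= MAX_HOLD_COUNT:
            self.on_escalate()
            self.hold_count = 0
```

With a 10-second EXTINF, three stalled refreshes are spaced 5 seconds apart, which is where the figure of roughly 1.5 × EXTINF per escalation comes from.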
This pull request was created from Copilot chat.