Description
Describe the bug
When the forward backend API returns an HTTP 500 error (or any other error), the stream becomes permanently stuck in a "busy" state and cannot be republished until SRS is restarted. This creates a "ghost busy" stream: it does not appear in the SRS API, yet it blocks every new publish attempt for that stream.
Version
- SRS Version: v6.0-r0 (also tested on v6.0.155)
- Deployment: Kubernetes cluster
- Feature: Dynamic Forward with Backend API
To Reproduce
Steps to reproduce the behavior:
- Configure SRS with a dynamic forward backend:
  vhost __defaultVhost__ {
      forward {
          enabled on;
          backend http://backend-service/api/NewStream;
      }
  }
- Create a backend service that returns an HTTP 500 error.
- Push a stream using OBS:
  - Configure OBS with the RTMP URL: rtmp://srs-server/live/test
  - Start streaming.
- The backend returns HTTP 500, and the stream publish fails.
- Try to push the same stream again using OBS:
  - Stop and restart streaming in OBS.
- See the error: "stream is busy".
- Check the SRS API; the stream does not appear:
  curl http://srs-server:1985/api/v1/streams
- The stream is permanently stuck, and only an SRS restart can recover it.
Expected behavior
When the backend API fails, one of the following should happen:
- on_unpublish() should be called to reset the publish state.
- There should be a timeout mechanism that automatically clears the busy state.
- Backend API errors should not prevent the stream from being published (forwarding is simply skipped).
Additional context
Root Cause Analysis (Based on Source Code)
The Issue Flow:
- The client publishes a stream to SRS.
- SRS calls SrsLiveSource::on_publish().
- on_publish() sets can_publish_ = false before calling hub_->on_publish().
- hub_->on_publish() → create_backend_forwarders() → on_forward_backend(), which calls the backend API.
- The backend API returns HTTP 500, and the error propagates back.
- on_publish() returns an error, but can_publish_ is already false.
- Critical: if on_unpublish() is not called to reset can_publish_ = true, the stream is stuck.
Source Code Evidence:
File: trunk/src/app/srs_app_rtmp_source.cpp
srs_error_t SrsLiveSource::on_publish()
{
    ...
    can_publish_ = false; // ← Set to false BEFORE calling hub

    // Notify the hub about the publish event.
    if (hub_ && (err = hub_->on_publish()) != srs_success) {
        return srs_error_wrap(err, "hub publish"); // ← Returns error if backend fails
    }
    ...
}

void SrsLiveSource::on_unpublish()
{
    ...
    can_publish_ = true; // ← Only place where it's reset to true
}

bool SrsLiveSource::can_publish(bool is_edge)
{
    ...
    return can_publish_; // ← Checked by acquire_publish()
}

File: trunk/src/app/srs_app_rtmp_conn.cpp
// Check whether RTMP stream is busy.
if (!source->can_publish(info_->edge_)) {
    return srs_error_new(ERROR_SYSTEM_STREAM_BUSY, "rtmp: stream %s is busy", req->get_stream_url().c_str());
}

Why Auto-Cleanup Doesn't Work:
bool SrsLiveSource::stream_is_dead()
{
    // still publishing?
    if (!can_publish_ || !publish_edge_->can_publish()) {
        return false; // ← If can_publish_ is false, source is NEVER cleaned up!
    }
    ...
}

Workaround
Ensure the backend API never returns errors:
[HttpPost]
public async Task<IResult> HandleNewStream(...)
{
    try
    {
        var response = await srsService.HandleNewStream(request);
        return Results.Ok(response);
    }
    catch (Exception ex)
    {
        logger.Error(ex, "Error processing request");
        // Always return HTTP 200 + Code 0, even on error
        return Results.Ok(new SrsForwardResponse
        {
            Code = 0,
            Data = new SrsForwardData { Urls = [] } // Empty URLs = no forwarding
        });
    }
}

Proposed Fix
Option 1: Ensure on_unpublish() is Always Called
In SrsRtmpConn::publishing(), ensure release_publish() properly calls on_unpublish() even when on_publish() fails.
Option 2: Add Timeout Mechanism
Add a timestamp to track when can_publish_ was set to false, and auto-reset after a timeout (e.g., 30 seconds).
Option 3: Change Error Handling Strategy
Don't fail the entire publish when only the forward backend fails. Log the error and continue publishing without forwarding.
Related Issues
- SOURCE: Possible race condition in SrsSource publish/unpublish #742 - Race condition in publish/unpublish
- Forward: Fetch the URL to forward from a backend, dynamic forwarding, on-demand forwarding, forwarding Hook. #1342 - Dynamic forward feature
Production Impact
This issue affects production systems where:
- Backend services may experience temporary failures
- Streams need to be quickly republished after disconnection
- Automatic recovery is essential for reliability
The current behavior requires manual intervention (a pod restart), which is not acceptable for high-availability systems.