Stream Stuck in "Busy" State When Forward Backend API Fails #4631

@jackboy1006

Description

Describe the bug

When the forward backend API returns an HTTP 500 error (or any error), the stream becomes permanently stuck in a "busy" state and cannot be republished until SRS is restarted. This creates a "ghost busy" stream that is not visible via the SRS API but blocks any new publish attempt for that stream.

Version

  • SRS Version: v6.0-r0 (also tested on v6.0.155)
  • Deployment: Kubernetes cluster
  • Feature: Dynamic Forward with Backend API

To Reproduce

Steps to reproduce the behavior:

  1. Configure SRS with dynamic forward backend:

vhost __defaultVhost__ {
    forward {
        enabled on;
        backend http://backend-service/api/NewStream;
    }
}

  2. Create a backend service that returns an HTTP 500 error.

  3. Push a stream using OBS:

    • Configure OBS with RTMP URL: rtmp://srs-server/live/test
    • Start streaming

  4. The backend returns HTTP 500, so the stream publish fails.

  5. Try to push the same stream again using OBS:

    • Stop and restart streaming in OBS

  6. See the error: "stream is busy".

  7. Check the SRS API - the stream doesn't appear:

curl http://srs-server:1985/api/v1/streams

  8. The stream is permanently stuck and only an SRS restart can recover it.

Expected behavior

When backend API fails, one of these should happen:

  • on_unpublish() should be called to reset the publish state
  • There should be a timeout mechanism to auto-clear the busy state
  • Backend API errors should not prevent the stream from being published (just skip forwarding)

Additional context

Root Cause Analysis (Based on Source Code)

The Issue Flow:

  1. Client publishes stream to SRS
  2. SRS calls SrsLiveSource::on_publish()
  3. on_publish() sets can_publish_ = false before calling hub_->on_publish()
  4. hub_->on_publish() → create_backend_forwarders() → on_forward_backend() calls the backend API
  5. Backend API returns HTTP 500 → error propagates back
  6. on_publish() returns error, but can_publish_ is already false
  7. Critical: If on_unpublish() is not called to reset can_publish_ = true, the stream is stuck

Source Code Evidence:

File: trunk/src/app/srs_app_rtmp_source.cpp

srs_error_t SrsLiveSource::on_publish()
{
    ...
    can_publish_ = false;  // ← Set to false BEFORE calling hub
    
    // Notify the hub about the publish event.
    if (hub_ && (err = hub_->on_publish()) != srs_success) {
        return srs_error_wrap(err, "hub publish");  // ← Returns error if backend fails
    }
    ...
}

void SrsLiveSource::on_unpublish()
{
    ...
    can_publish_ = true;  // ← Only place where it's reset to true
}

bool SrsLiveSource::can_publish(bool is_edge)
{
    ...
    return can_publish_;  // ← Checked by acquire_publish()
}

File: trunk/src/app/srs_app_rtmp_conn.cpp

// Check whether RTMP stream is busy.
if (!source->can_publish(info_->edge_)) {
    return srs_error_new(ERROR_SYSTEM_STREAM_BUSY, "rtmp: stream %s is busy", req->get_stream_url().c_str());
}

Why Auto-Cleanup Doesn't Work:

bool SrsLiveSource::stream_is_dead()
{
    // still publishing?
    if (!can_publish_ || !publish_edge_->can_publish()) {
        return false;  // ← If can_publish_ is false, source is NEVER cleaned up!
    }
    ...
}

Workaround

Ensure the backend API never returns errors:

[HttpPost]
public async Task<IResult> HandleNewStream(...)
{
    try
    {
        var response = await srsService.HandleNewStream(request);
        return Results.Ok(response);
    }
    catch (Exception ex)
    {
        logger.Error(ex, "Error processing request");
        
        // Always return HTTP 200 + Code 0, even on error
        return Results.Ok(new SrsForwardResponse
        {
            Code = 0,
            Data = new SrsForwardData { Urls = [] }  // Empty URLs = no forwarding
        });
    }
}
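
With this handler the backend always replies with code 0 and an empty url list (with ASP.NET Core's default JSON serialization, roughly {"code":0,"data":{"urls":[]}}), so SRS accepts the publish and simply creates no forwarders. The stuck-busy state never occurs, but genuine backend failures are hidden, so this is only a stopgap.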

Proposed Fix

Option 1: Ensure on_unpublish() is Always Called

In SrsRtmpConn::publishing(), ensure release_publish() properly calls on_unpublish() even when on_publish() fails.
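
A minimal sketch of this idea, assuming the rollback is done directly inside SrsLiveSource::on_publish() (calling on_unpublish() from the same error path would achieve the same reset):

srs_error_t SrsLiveSource::on_publish()
{
    ...
    can_publish_ = false;

    // Notify the hub about the publish event.
    if (hub_ && (err = hub_->on_publish()) != srs_success) {
        // Hypothetical fix: roll back the publish state before propagating
        // the error, so the next publish attempt is not rejected as "busy".
        can_publish_ = true;
        return srs_error_wrap(err, "hub publish");
    }
    ...
}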

Option 2: Add Timeout Mechanism

Add a timestamp to track when can_publish_ was set to false, and auto-reset after a timeout (e.g., 30 seconds).
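
A rough sketch of that mechanism, using SRS's existing time helpers (the field publish_started_at_ is hypothetical, and a real fix would also have to verify there is no healthy, active publisher before resetting the flag):

// In SrsLiveSource: remember when the last publish attempt started.
srs_utime_t publish_started_at_ = 0;

srs_error_t SrsLiveSource::on_publish()
{
    ...
    can_publish_ = false;
    publish_started_at_ = srs_get_system_time();
    ...
}

bool SrsLiveSource::can_publish(bool is_edge)
{
    // Hypothetical auto-recovery: if the source has been "busy" longer than
    // the timeout, assume the earlier publish attempt failed and allow a
    // new attempt.
    if (!can_publish_ && srs_get_system_time() - publish_started_at_ > 30 * SRS_UTIME_SECONDS) {
        can_publish_ = true;
    }
    return can_publish_;
}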

Option 3: Change Error Handling Strategy

Don't fail the entire publish when only the forward backend fails. Log the error and continue publishing without forwarding.
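
A sketch of what this could look like at the point where the hub creates the backend forwarders (the exact call site and the applied variable are assumptions for illustration; srs_warn, srs_freep and srs_error_desc are existing SRS helpers):

// Wherever hub_->on_publish() ends up invoking create_backend_forwarders():
// downgrade a backend failure from fatal to a warning, so the publish
// itself still succeeds, just without forwarding.
if ((err = create_backend_forwarders(applied)) != srs_success) {
    srs_warn("forward backend failed, continue without forwarding: %s",
        srs_error_desc(err).c_str());
    srs_freep(err);
}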

Production Impact

This issue affects production systems where:

  • Backend services may experience temporary failures
  • Streams need to be quickly republished after disconnection
  • Automatic recovery is essential for reliability

The current behavior requires manual intervention (a pod restart), which is not acceptable for high-availability systems.
