TRSS Instability Issues #1146

@steelhead31

Here are my notes on the periodic instability issues affecting TRSS.

trss_log_excerpt.txt

I've attached the log excerpt I used for the investigation.

Around 18/12/2025 05:37, failures started occurring when accessing data-backed pages in the AQA Test Tools service. The site itself remained up, but no data was returned.

Symptoms:

  • Some pages (for example /output/test, /deepHistory, /testPerPlatform) initially loaded.
  • Data-heavy sections failed shortly after page load.
  • Browsers received 502 Bad Gateway errors for multiple API calls.
  • The site appeared unstable or partially broken rather than fully offline.

What the Logs Show

Nginx (Frontend / Reverse Proxy)

  • Nginx was running normally and accepting client requests.
  • Requests to frontend routes returned HTTP 200 with very low latency (0–1 ms).
  • Requests to backend API endpoints (/api/*) consistently failed.

Typical error:

connect() failed (111: Connection refused) while connecting to upstream

Nginx returned:

502 Bad Gateway

The upstream configured for these requests was:

http://172.21.0.5:3001/

Root Cause

The backend API service listening on port 3001 was not running or not accepting connections.

Key points:

  • The IP address was reachable, but the TCP connection was refused.
  • This almost always means nothing was listening on port 3001 at the time (the other possibility is a firewall rule that actively rejects the connection).
  • Nginx itself was healthy and behaving correctly.
  • The issue was not caused by nginx, the browser, or general networking.
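
The distinction drawn in the key points above (host reachable, connection refused) can be checked with a plain TCP probe. The following is a minimal Python sketch, not part of TRSS; `probe_tcp` is a hypothetical helper name introduced here for illustration.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout' or 'error'."""
    try:
        # create_connection performs the full TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # errno 111 on Linux: the host answered, but rejected the connection,
        # usually because no process is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer at all: more typical of a routing or firewall problem.
        return "timeout"
    except OSError:
        # Anything else (for example, no route to host).
        return "error"
```

Run from the same network as nginx, `probe_tcp("172.21.0.5", 3001)` returning `"refused"` would match the nginx error shown earlier, whereas `"timeout"` would point towards a networking problem instead.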

This strongly indicates that the API container or process:

  • Crashed
  • Exited due to an unhandled error
  • Was killed (for example OOM kill)
  • Restarted and failed to come back up
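
To narrow down which of these applies, the container's recorded state can be inspected. The sketch below assumes the API runs as a Docker container and that the Docker CLI is available on the host; the function names are hypothetical, and the container name has to be supplied by whoever runs it.

```python
import json
import subprocess


def classify_exit(state: dict) -> str:
    """Map a Docker container State object to a coarse cause label."""
    if state.get("Restarting"):
        return "restarting"
    if state.get("Running"):
        return "running"
    if state.get("OOMKilled"):
        return "oom-killed"
    exit_code = state.get("ExitCode", 0)
    if exit_code == 137:
        # 128 + 9: the process was terminated by SIGKILL.
        return "killed (SIGKILL)"
    if exit_code != 0:
        return f"exited with error (code {exit_code})"
    return "exited cleanly"


def container_state(name: str) -> dict:
    """Read the State object via `docker inspect` (requires the Docker CLI)."""
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{json .State}}", name],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

Calling `classify_exit(container_state(name))` with the API container's name gives a first hint. Note that the state only reflects the most recent run, so it needs to be captured before the container is restarted.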

Likely Contributing Factors

Based on surrounding context and earlier MongoDB errors:

  • The API service likely depends on MongoDB.
  • A MongoDB connectivity issue may have caused the API process to exit.
  • The API did not recover automatically, leaving nginx pointing at a dead upstream.
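
If a lost MongoDB connection does turn out to be what takes the API down, one common mitigation is to retry the connection with backoff instead of exiting on the first failure. The sketch below illustrates that pattern in Python; it is not TRSS code, and `connect` is a stand-in for whatever opens the database connection.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(connect: Callable[[], T], attempts: int = 5,
                       base_delay: float = 1.0, max_delay: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call connect() until it succeeds, doubling the wait after each failure."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                # Out of attempts: re-raise so a supervisor can restart the process.
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)
    raise ValueError("attempts must be at least 1")
```

Retrying inside the process covers short outages only. A restart policy on the container would be a complementary safeguard for the case where the process exits anyway.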

Impact

  • All /api/* endpoints were unavailable.
  • Pages relying on live API data failed.
  • Users experienced intermittent or broken functionality.
  • MongoDB logged client disconnects as a downstream effect.
