Here are my notes on the periodic instability affecting TRSS.
I've attached the log excerpt I used for the investigation.
At around 05:37 on 18/12/2025, requests to data-backed pages in the AQA Test Tools service started failing. The pages themselves stayed up, but no data was returned.
Symptoms:
- Some pages (for example /output/test, /deepHistory, /testPerPlatform) initially loaded.
- Data-heavy sections failed shortly after page load.
- Browsers received 502 Bad Gateway errors for multiple API calls.
- The site appeared unstable or partially broken rather than fully offline.
What the Logs Show
Nginx (Frontend / Reverse Proxy)
- Nginx was running normally and accepting client requests.
- Requests to frontend routes returned HTTP 200 with very low latency (0–1 ms).
- Requests to backend API endpoints (/api/*) consistently failed.
Typical error:
connect() failed (111: Connection refused) while connecting to upstream
Nginx returned:
502 Bad Gateway
The upstream configured for these requests was:
http://172.21.0.5:3001/
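To gauge how often and against which upstream the error occurs, the nginx error log can be scanned for the message quoted above. The following is a minimal sketch, not the tooling used for this investigation. It assumes each error line carries an `upstream: "<url>"` field; the function name `count_refused_by_upstream` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Message text copied from the error quoted above.
REFUSED = "connect() failed (111: Connection refused) while connecting to upstream"

# Assumption: the error line includes an `upstream: "<url>"` field.
UPSTREAM_FIELD = re.compile(r'upstream: "([^"]+)"')


def count_refused_by_upstream(lines: Iterable[str]) -> Counter:
    """Count connection-refused errors per upstream URL."""
    counts: Counter = Counter()
    for line in lines:
        if REFUSED not in line:
            continue  # unrelated log line
        match = UPSTREAM_FIELD.search(line)
        # Lines without a recognisable upstream field are grouped as "unknown".
        counts[match.group(1) if match else "unknown"] += 1
    return counts
```

Passing an open log file (for example `count_refused_by_upstream(open(path))`, with `path` pointing at the error log) returns a `Counter` keyed by upstream URL.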
Root Cause
The backend API service listening on port 3001 was not running or not accepting connections.
Key points:
- The IP address was reachable, but the TCP connection was refused (errno 111, ECONNREFUSED).
- This means nothing was accepting connections on port 3001 at that time.
- Nginx itself was healthy and behaving correctly.
- The issue was not caused by nginx, the browser, or general networking.
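The distinction drawn in the key points (host reachable, connection refused) can be checked directly from a machine on the same network as nginx. This is a rough sketch using only the Python standard library; `probe_tcp` is a hypothetical helper name, and the two-second timeout is an arbitrary choice.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout', or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"  # something accepted the connection
    except ConnectionRefusedError:
        return "refused"  # host answered, but nothing is listening on the port
    except socket.timeout:
        return "timeout"  # no answer at all, which points at routing or firewalling
    except OSError:
        return "error"  # any other failure, e.g. no route to host
```

For the upstream in this issue the call would be `probe_tcp("172.21.0.5", 3001)`. A result of `"refused"` matches what nginx logged, whereas `"timeout"` would point towards a networking problem instead.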
This strongly indicates that the API container or process:
- Crashed
- Exited due to an unhandled error
- Was killed (for example OOM kill)
- Restarted and failed to come back up
Likely Contributing Factors
Based on surrounding context and earlier MongoDB errors:
- The API service likely depends on MongoDB.
- A MongoDB connectivity issue may have caused the API process to exit.
- The API did not recover automatically, leaving nginx pointing at a dead upstream.
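If a lost MongoDB connection did cause the API process to exit, one common mitigation is to retry the connection with exponential backoff rather than exiting on the first failure. The sketch below illustrates the pattern in Python only; the notes do not show how the API service actually connects to MongoDB, so `retry_with_backoff`, its parameters, and the `connect` callable are all hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    connect: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the caller
            # Wait base_delay, 2*base_delay, 4*base_delay, ... capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```

Retrying inside the process only covers transient outages. A container restart policy or supervisor would still be needed for the case where the process is killed outright, for example by the OOM killer.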
Impact
- All /api/* endpoints were unavailable.
- Pages relying on live API data failed.
- Users experienced intermittent or broken functionality.
- MongoDB logged client disconnects as a downstream effect.