Suddenly 504 Gateway Time-out after half a year without any problems #14
Description
We have a setup with about 30 users, calendar integration, and a few gigabytes of data that has been running for about half a year without any problems, but suddenly the container is very slow and responds with a 504 Gateway Time-out about 95% of the time.
For some reason it seems to be completely busy with something, as if it were under a DDoS attack (which it is not; I blocked all requests at the load balancer to verify). Sometimes ECS even restarts the task because the health checks also run into timeouts.
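In the meantime I can relax the target group health check so ECS does not keep replacing the task while I debug. A rough sketch with the AWS CLI (the target group ARN is a placeholder and the thresholds are just an example):

```sh
# Temporarily loosen the health check so slow responses don't trigger task replacement
# (placeholder ARN; the timeout must stay below the interval)
aws elbv2 modify-target-group \
  --target-group-arn arn:aws:elasticloadbalancing:eu-central-1:123456789012:targetgroup/nextcloud/abc123 \
  --health-check-interval-seconds 60 \
  --health-check-timeout-seconds 30 \
  --unhealthy-threshold-count 5
```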
The only noticeable thing in the logs is a lot of entries like this:
{"reqId":"R88qYRLYFO3NKB6IneRT","level":3,"time":"2022-03-04T15:53:23+00:00","remoteAddr":"10.192.10.76","user":"xxxx","app":"PHP","method":"PUT","url":"/remote.php/dav/files/xxxx/abcd.csv","message":{"Exception":"Error","Message":"fclose(): supplied resource is not a valid stream resource at /var/www/html/3rdparty/icewind/streams/src/Wrapper.php#96","Code":0,"Trace":[{"function":"onError","class":"OC\\Log\\ErrorHandler","type":"::","args":[2,"fclose(): supplied resource is not a valid stream resource","/var/www/html/3rdparty/icewind/streams/src/Wrapper.php",96,[]]},{"file":"/var/www/html/3rdparty/icewind/streams/src/Wrapper.php","line":96,"function":"fclose","args":[null]},{"file":"/var/www/html/3rdparty/icewind/streams/src/CallbackWrapper.php","line":117,"function":"stream_close","class":"Icewind\\Streams\\Wrapper","type":"->","args":[]},{"function":"stream_close","class":"Icewind\\Streams\\CallbackWrapper","type":"->","args":[]},{"file":"/var/www/html/3rdparty/guzzlehttp/psr7/src/Stream.php","line":108,"function":"fclose","args":[null]},{"file":"/var/www/html/3rdparty/guzzlehttp/psr7/src/Stream.php","line":74,"function":"close","class":"GuzzleHttp\\Psr7\\Stream","type":"->","args":[]},{"function":"__destruct","class":"GuzzleHttp\\Psr7\\Stream","type":"->","args":[]}],"File":"/var/www/html/lib/private/Log/ErrorHandler.php","Line":92,"CustomMessage":"--"},"userAgent":"Mozilla/5.0 (Macintosh) mirall/3.3.2git (build 7106) (Nextcloud, osx-21.3.0 ClientArchitecture: x86_64 OsArchitecture: x86_64)","version":"21.0.1.1"}
What I already tried:
- Increasing the task's resources to 8 GB memory and 4 vCPU (previously 2 GB / 1 vCPU)
- Running 8 tasks in parallel (normally only 1 is running); see the sketch after this list
- Blocking all requests at the LB
- Deleting 100k small files
- Restarting Redis
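For reference, the resizing and scale-out above were done roughly like this (cluster, service, and task definition revision are placeholders for the real names):

```sh
# Placeholder cluster/service names; nextcloud:42 stands for the task definition
# revision that carries the larger CPU/memory settings
aws ecs update-service \
  --cluster nextcloud-cluster \
  --service nextcloud \
  --task-definition nextcloud:42 \
  --desired-count 8
```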
Database load seems normal.
I don't know what else to try. Is it somehow possible to connect to the Nextcloud instance via SSH? Or do you have any other suggestions? What could be the problem here?
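Or, if SSH is not an option, would ECS Exec be the intended way to get a shell inside the running container? Something like this, assuming it can be enabled on the service (cluster, task ID, and container name are placeholders):

```sh
# Assumes ECS Exec is enabled on the service (--enable-execute-command)
# and the Session Manager plugin is installed locally
aws ecs execute-command \
  --cluster nextcloud-cluster \
  --task 0123456789abcdef0 \
  --container app \
  --interactive \
  --command "/bin/sh"
```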