Description
I am running Janus with basic auth and Cassandra as the persistence backend. Our automated, scripted deployment of Janus sporadically comes up in a bad state: connections to the admin port are accepted, but they block until the client times out, because no response is ever sent. Requests through the API gateway port appear to work properly.
When the system comes up in this state, it never recovers. The only way I can get it working again is to undeploy Janus and redeploy it.
Without the admin port, I cannot load the basic auth user credentials.
Frequency: I have no hard numbers, but I estimate it fails about once every five to six deployments.
Possible cause: the logs show a timeout when accessing Cassandra, and Janus does not appear to ever retry the request.
Janus log when in the bad state:
➜ kubectl logs janus-deployment-6bfccd676-v7qgd -c janus
time="2021-04-08T14:09:44Z" level=info msg="Janus starting..." version=dev-9fa15f6
[StatsGo] 2021/04/08 14:09:44 Stats counter incremented metric=app.init.janus-deployment-6bfccd676-v7qgd.janus
[StatsGo] 2021/04/08 14:09:44 Stats counter incremented metric=total.app
[StatsGo] 2021/04/08 14:09:52 Stats counter incremented metric=error-log.error.-.-
[StatsGo] 2021/04/08 14:09:52 Stats counter incremented metric=total.error-log
{"level":"error","msg":"error getting all definitions: gocql: no response received from cassandra within timeout period","time":"2021-04-08T14:09:52Z"}
Janus log when the admin port works correctly:
➜ kubectl logs janus-deployment-6bfccd676-qssw9 -c janus
time="2021-04-08T14:57:24Z" level=info msg="Janus starting..." version=dev-9fa15f6
[StatsGo] 2021/04/08 14:57:24 Stats counter incremented metric=app.init.janus-deployment-6bfccd676-qssw9.janus
[StatsGo] 2021/04/08 14:57:24 Stats counter incremented metric=total.app