ETCD is down and nothing is clear from logs what happened for 15-20 seconds which is failing startup probes #16141
Unanswered
rahulbapumore asked this question in Q&A
Replies: 3 comments 1 reply
-
One more thing to mention: debug logs are enabled to show what happens during those 20 seconds, but we still cannot tell what is going on. Thanks
-
Hi Team, |
-
Hi @rahulbapumore - I would suggest some detailed debugging following: https://etcd.io/docs/v3.5/op-guide/monitoring/#debug-endpoint You could use |
-
Hi Team,
I am installing an etcd cluster as a StatefulSet with one replica. During an upgrade, when etcd starts up again, it hangs or is unavailable for about 20 seconds: it does not serve clients and prints nothing to the logs. After those 20 seconds it starts serving clients again. We want to understand why this happens, because it causes startup probes in other services to fail.
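To put a number on the unavailable window after a restart, a small polling loop can help. The following is only a minimal sketch: the helper names `etcdctl_healthy` and `time_until_ready` are made up for illustration, and it assumes `etcdctl` is on the PATH and picks up its endpoint and certificate settings from the `ETCDCTL_*` environment variables.

```python
import subprocess
import time
from typing import Callable, Optional


def etcdctl_healthy(timeout_s: float = 5.0) -> bool:
    """Hypothetical probe: True if `etcdctl endpoint health` exits with 0.

    Assumes etcdctl reads endpoints and TLS files from ETCDCTL_* variables.
    """
    try:
        result = subprocess.run(
            ["etcdctl", "endpoint", "health"],
            capture_output=True,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hung call or a missing binary both count as "not healthy".
        return False
    return result.returncode == 0


def time_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll `probe` until it succeeds.

    Returns the elapsed seconds at the first success, or None if the
    probe never succeeded within `timeout_s`.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)
```

Calling `time_until_ready(etcdctl_healthy)` right after the pod restarts would return how long etcd took to answer a health check, which can then be compared with the startup probe timeouts of the dependent services.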
We are using the configuration below:
Git SHA: 215b53cf3
Go Version: go1.17.13
Go OS/Arch: linux/amd64
bash-4.4$ etcdctl version
etcdctl version: 3.5.7
API version: 3.5
bash-4.4$
Environment variables:
bash-4.4$ env
BOOTSTRAP_ENABLED=false
E_SEC_KEY_MANAGEMENT_PORT_8210_TCP=tcp://10.104.152.217:8210
E_SEC_SIP_TLS_SERVICE_PORT_HTTP_METRIC_TLS=8889
VALID_PARAMETERS=valid
ETCD_INITIAL_CLUSTER_TOKEN=dced
TLS_ENABLED=true
ETCD_MAX_SNAPSHOTS=3
CLIENT_PORTS=2379
E_SEC_SIP_TLS_SERVICE_PORT=8889
TZ=UTC
HOSTNAME=dced-0
E_SEC_SIP_TLS_PORT_8889_TCP_PORT=8889
COMPONENT_VERSION=v3.5.7
HTTP_PROBE_CMD_DIR=/usr/local/bin/health
HTTP_PROBE_READINESS_CMD_TIMEOUT_SEC=15
ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379
ETCD_HEARTBEAT_INTERVAL=100
ETCD_AUTO_COMPACTION_RETENTION=100
DISARM_ALARM_PEER_INTERVAL=6
NAMESPACE=zmorrah
ETCD_TRUSTED_CA_FILE=/data/combinedca/cacertbundle.pem
DB_THRESHOLD_PERCENTAGE=70
MONITOR_ALARM_INTERVAL=5
PEER_CERT_AUTH_ENABLED=true
E_SEC_KEY_MANAGEMENT_SERVICE_HOST=10.104.152.217
E_SEC_SIP_TLS_PORT_8889_TCP_PROTO=tcp
E_SEC_KEY_MANAGEMENT_PORT_8200_TCP_PROTO=tcp
TRUSTED_CA=/data/combinedca/cacertbundle.pem
PEER_CLIENTS_CERTS=/run/sec/certs/peer/srvcert.pem
FIFO_DIR=/fifo
KUBERNETES_PORT_443_TCP_PROTO=tcp
ENTRYPOINT_RESTART_ETCD=true
HTTP_PROBE_NAMESPACE=zmorrah
KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
E_SEC_KEY_MANAGEMENT_SERVICE_PORT_HTTPS_SHELTER=8210
ETCDCTL_CERT=/run/sec/certs/client/clicert.pem
DEFRAGMENT_ENABLE=true
ENTRYPOINT_DCED_PROCESS_INTERVAL=5
E_DATA_DISTRIBUTED_COORDINATOR_ED_SERVICE_HOST=10.98.158.122
ETCD_LOG_LEVEL=info
ENTRYPOINT_CHECKSNUMBER=60
E_SEC_SIP_TLS_PORT=tcp://10.107.132.182:8889
KUBERNETES_PORT=tcp://10.96.0.1:443
POD_NAME=dced-0
E_DATA_DISTRIBUTED_COORDINATOR_ED_SERVICE_PORT=2379
E_SEC_SIP_TLS_SERVICE_HOST=10.107.132.182
PWD=/
ETCD_LISTEN_PEER_URLS=https://0.0.0.0:2380
HOME=/home/dced
E_DATA_DISTRIBUTED_COORDINATOR_ED_SERVICE_PORT_CLIENT_PORT_TLS=2379
ETCD_AUTO_COMPACTION_MODE=revision
KUBERNETES_SERVICE_PORT_HTTPS=443
E_SEC_KEY_MANAGEMENT_PORT_8210_TCP_ADDR=10.104.152.217
E_DATA_DISTRIBUTED_COORDINATOR_ED_PORT_2379_TCP_ADDR=10.98.158.122
KUBERNETES_PORT_443_TCP_PORT=443
ETCD_LOGGER=zap
PEER_AUTO_TLS_ENABLED=true
E_SEC_KEY_MANAGEMENT_SERVICE_PORT_HTTPS_KMS=8200
ETCD_CERT_FILE=/run/sec/certs/server/srvcert.pem
ETCD_PEER_AUTO_TLS=true
E_DATA_DISTRIBUTED_COORDINATOR_ED_PORT_2379_TCP_PORT=2379
KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
E_DATA_DISTRIBUTED_COORDINATOR_ED_PORT_2379_TCP=tcp://10.98.158.122:2379
LISTEN_PEER_URLS=https://0.0.0.0:2380
DEFRAGMENT_PERIODIC_INTERVAL=60
CONTAINER_NAME=dced
COMPONENT=etcd
ETCD_DATA_DIR=/data
ETCD_CLIENT_CERT_AUTH=true
TERM=xterm
E_SEC_KEY_MANAGEMENT_PORT_8210_TCP_PROTO=tcp
E_SEC_KEY_MANAGEMENT_PORT=tcp://10.104.152.217:8200
ETCDCTL_ENDPOINTS=dced.zmorrah:2379
HTTP_PROBE_LIVENESS_CMD_TIMEOUT_SEC=15
ETCD_METRICS=basic
PEER_CLIENT_KEY_FILE=/run/sec/certs/peer/srvprivkey.pem
HTTP_PROBE_CONTAINER_NAME=dced
E_SEC_SIP_TLS_PORT_8889_TCP_ADDR=10.107.132.182
GODEBUG=tls13=1
ETCDCTL_API=3
E_DATA_DISTRIBUTED_COORDINATOR_ED_PORT=tcp://10.98.158.122:2379
ETCD_SNAPSHOT_COUNT=5000
ETCD_MAX_WALS=3
SHLVL=1
E_SEC_KEY_MANAGEMENT_PORT_8200_TCP_ADDR=10.104.152.217
HTTP_PROBE_POD_NAME=dced-0
KUBERNETES_SERVICE_PORT=443
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://dced-0.dced-peer.zmorrah.svc.cluster.local:2380
HTTP_PROBE_STARTUP_CMD_TIMEOUT_SEC=15
E_SEC_KEY_MANAGEMENT_PORT_8210_TCP_PORT=8210
ETCD_KEY_FILE=/run/sec/certs/server/srvprivkey.pem
ETCD_ELECTION_TIMEOUT=1000
HTTP_PROBE_SERVICE_NAME=dced
ETCDCTL_CACERT=/data/combinedca/cacertbundle.pem
ETCD_NAME=dced-0
ETCD_QUOTA_BACKEND_BYTES=268435456
E_SEC_SIP_TLS_PORT_8889_TCP=tcp://10.107.132.182:8889
ENTRYPOINT_PIPE_TIMEOUT=5
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
ETCD_ADVERTISE_CLIENT_URLS=https://dced-0.dced.zmorrah:2379
DCED_PORT=2379
E_SEC_KEY_MANAGEMENT_SERVICE_PORT=8200
KUBERNETES_SERVICE_HOST=10.96.0.1
FLAVOUR=etcd-v3.5.7-linux-amd64
E_SEC_KEY_MANAGEMENT_PORT_8200_TCP=tcp://10.104.152.217:8200
E_DATA_DISTRIBUTED_COORDINATOR_ED_PORT_2379_TCP_PROTO=tcp
E_SEC_KEY_MANAGEMENT_PORT_8200_TCP_PORT=8200
ETCDCTL_KEY=/run/sec/certs/client/cliprivkey.pem
_=/usr/bin/env
bash-4.4$
Logs are attached below -
logs_311.txt
In the logs, nothing is printed between timestamps 2023-06-20T03:00:45.770+00:00 and 2023-06-20T03:01:04.420+00:00; this gap corresponds to the roughly 20-second delay. Can you help us understand this scenario?
{"message":"Error: context deadline exceeded","metadata":{"container_name":"dced","namespace":"beets","pod_name":"dced-0"},"service_id":"dced","severity":"error","timestamp":"2023-06-20T03:00:45.770+00:00","version":"1.2.0"}
{"caller":"v3rpc/interceptor.go:182","message":"request stats","metadata":{"container_name":"dced","namespace":"beets","pod_name":"dced-0"},"remote":"10.1.0.247:54582","request content":"","request count":-1,"request size":-1,"response count":-1,"response size":-1,"response type":"/etcdserverpb.Maintenance/Status","service_id":"dced","severity":"debug","start time":"2023-06-20T03:01:04.420Z","time spent":"61.157µs","timestamp":"2023-06-20T03:01:04.420+00:00","version":"1.2.0"}
And after 2023-06-20T03:01:04.420+00:00, everything is working fine.
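For reference, the silent window between the two log lines above works out to about 18.65 seconds. Below is a minimal sketch of how such gaps could be located automatically, assuming the log file has one JSON object per line with a `timestamp` field in the format shown; `find_gaps` and `SAMPLE` are illustrative names, and the sample lines are trimmed copies of the two entries above.

```python
import json
from datetime import datetime

# Timestamp layout as seen in the log lines, e.g. 2023-06-20T03:00:45.770+00:00
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"

# Trimmed copies of the two log entries quoted above (illustrative only).
SAMPLE = [
    '{"message":"Error: context deadline exceeded","severity":"error",'
    '"timestamp":"2023-06-20T03:00:45.770+00:00"}',
    '{"message":"request stats","severity":"debug",'
    '"timestamp":"2023-06-20T03:01:04.420+00:00"}',
]


def find_gaps(lines, threshold_s=5.0):
    """Return (previous_ts, next_ts, gap_seconds) for each pair of
    consecutive log entries that are at least `threshold_s` apart."""
    gaps = []
    prev = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            ts = datetime.strptime(json.loads(line)["timestamp"], TS_FORMAT)
        except (ValueError, KeyError, TypeError):
            # Skip lines that are not JSON or carry no usable timestamp.
            continue
        if prev is not None:
            delta = (ts - prev).total_seconds()
            if delta >= threshold_s:
                gaps.append((prev.isoformat(), ts.isoformat(), delta))
        prev = ts
    return gaps


for start, end, seconds in find_gaps(SAMPLE):
    print(f"no log output for {seconds:.2f}s between {start} and {end}")
```

If the attached file follows the same layout, passing an open file handle for it to `find_gaps` should list every silent window longer than the threshold.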
Please help us in understanding this behavior.
Thanks