-
Notifications
You must be signed in to change notification settings - Fork 16
Description
Description
In the process of debugging the memory leaks, it was noticed that tests were repeatedly failing between the mock core service and the authz library. There were multiple symptoms but the failing behavior was sporadic. What made this clear that it was not a networking problem was that all communication was located on localhost.
ZeroMQ handling retries in the application logic, but we were finding that this was not helping, rather the exact same errors were occuring on retries, We then thought that it might have to do with the ZeroMQ context because creating a new context on failure slightly decreased the failure rate.
On increasing the log output it was discovered that the one of the inproc zeromq sockets was creating clients before binding the address for the server. This was identified as being the problem. The documentation states that servers should be bound first, it is confusing that sometimes it was working when binding the clients first.
According to this thread it shouldn't matter if connect is run first. https://stackoverflow.com/questions/30883859/how-does-zeromq-connect-and-bind-work-internally, However, I think the difference is that we are using INPROC. This is noted elsewhere: zeromq/netmq#165
This is explicit in the documentation here:
The inter-thread transport, inproc, is a connected signaling transport. It is much faster than tcp or ipc. This transport has a specific limitation compared to tcp and ipc: the server must issue a bind before any client issues a connect. This was fixed in ZeroMQ v4.0 and later versions.
on this page. https://zguide.zeromq.org/docs/chapter2/
Below you see tcp dump output
For successful test
17:43:31.079481 IP localhost.55276 > localhost.9998: Flags [S], seq 404506562, win 65495, options [mss 65495,sackOK,TS val 172639509 ecr 0,nop,wscale 7], length 0
17:43:31.079497 IP localhost.9998 > localhost.55276: Flags [S.], seq 1269143289, ack 404506563, win 65483, options [mss 65495,sackOK,TS val 172639509 ecr 172639509,nop,wscale 7], length 0
17:43:31.079511 IP localhost.55276 > localhost.9998: Flags [.], ack 1, win 512, options [nop,nop,TS val 172639509 ecr 172639509], length 0
17:43:31.079649 IP localhost.55276 > localhost.9998: Flags [P.], seq 1:11, ack 1, win 512, options [nop,nop,TS val 172639509 ecr 172639509], length 10
17:43:31.079656 IP localhost.9998 > localhost.55276: Flags [P.], seq 1:11, ack 1, win 512, options [nop,nop,TS val 172639509 ecr 172639509], length 10
17:43:31.079716 IP localhost.55276 > localhost.9998: Flags [.], ack 11, win 512, options [nop,nop,TS val 172639509 ecr 172639509], length 0
17:43:31.079721 IP localhost.9998 > localhost.55276: Flags [.], ack 11, win 512, options [nop,nop,TS val 172639509 ecr 172639509], length 0
17:43:31.079727 IP localhost.55276 > localhost.9998: Flags [P.], seq 11:12, ack 11, win 512, options [nop,nop,TS val 172639509 ecr 172639509], length 1
17:43:31.079731 IP localhost.9998 > localhost.55276: Flags [.], ack 12, win 512, options [nop,nop,TS val 172639509 ecr 172639509], length 0
17:43:31.079738 IP localhost.9998 > localhost.55276: Flags [P.], seq 11:65, ack 12, win 512, options [nop,nop,TS val 172639509 ecr 172639509], length 54
17:43:31.079912 IP localhost.55276 > localhost.9998: Flags [P.], seq 12:65, ack 65, win 512, options [nop,nop,TS val 172639510 ecr 172639509], length 53
17:43:31.080029 IP localhost.55276 > localhost.9998: Flags [P.], seq 65:267, ack 65, win 512, options [nop,nop,TS val 172639510 ecr 172639509], length 202
17:43:31.080047 IP localhost.9998 > localhost.55276: Flags [.], ack 267, win 512, options [nop,nop,TS val 172639510 ecr 172639510], length 0
17:43:31.080153 IP localhost.9998 > localhost.55276: Flags [P.], seq 65:235, ack 267, win 512, options [nop,nop,TS val 172639510 ecr 172639510], length 170
17:43:31.080386 IP localhost.55276 > localhost.9998: Flags [P.], seq 267:597, ack 235, win 512, options [nop,nop,TS val 172639510 ecr 172639510], length 330
17:43:31.080854 IP localhost.9998 > localhost.55276: Flags [P.], seq 235:337, ack 597, win 512, options [nop,nop,TS val 172639511 ecr 172639510], length 102
17:43:31.080977 IP localhost.55276 > localhost.9998: Flags [P.], seq 597:1023, ack 337, win 512, options [nop,nop,TS val 172639511 ecr 172639511], length 426
17:43:31.128996 IP localhost.9998 > localhost.55276: Flags [.], ack 1023, win 512, options [nop,nop,TS val 172639559 ecr 172639511], length 0
17:43:31.278917 IP localhost.9998 > localhost.55276: Flags [P.], seq 337:747, ack 1023, win 512, options [nop,nop,TS val 172639709 ecr 172639511], length 410
17:43:31.280267 IP localhost.55276 > localhost.9998: Flags [F.], seq 1023, ack 747, win 512, options [nop,nop,TS val 172639710 ecr 172639709], length 0
17:43:31.280369 IP localhost.9998 > localhost.55276: Flags [F.], seq 747, ack 1024, win 512, options [nop,nop,TS val 172639710 ecr 172639710], length 0
17:43:31.280388 IP localhost.55276 > localhost.9998: Flags [.], ack 748, win 512, options [nop,nop,TS val 172639710 ecr 172639710], length 0
For failing test seeing this output from tcpdump
17:45:12.220887 IP localhost.57054 > localhost.9998: Flags [P.], seq 12:65, ack 12, win 512, options [nop,nop,TS val 172740657 ecr 172740657], length 53
17:45:12.220888 IP localhost.9998 > localhost.57054: Flags [P.], seq 12:65, ack 12, win 512, options [nop,nop,TS val 172740657 ecr 172740657], length 53
17:45:12.221014 IP localhost.57054 > localhost.9998: Flags [P.], seq 65:267, ack 65, win 512, options [nop,nop,TS val 172740657 ecr 172740657], length 202
17:45:12.221035 IP localhost.9998 > localhost.57054: Flags [.], ack 267, win 512, options [nop,nop,TS val 172740657 ecr 172740657], length 0
17:45:12.221215 IP localhost.9998 > localhost.57054: Flags [P.], seq 65:235, ack 267, win 512, options [nop,nop,TS val 172740658 ecr 172740657], length 170
17:45:12.221706 IP localhost.57054 > localhost.9998: Flags [P.], seq 267:597, ack 235, win 512, options [nop,nop,TS val 172740658 ecr 172740658], length 330
17:45:12.221904 IP localhost.9998 > localhost.57054: Flags [P.], seq 235:337, ack 597, win 512, options [nop,nop,TS val 172740658 ecr 172740658], length 102
17:45:12.222019 IP localhost.57054 > localhost.9998: Flags [P.], seq 597:1023, ack 337, win 512, options [nop,nop,TS val 172740658 ecr 172740658], length 426
17:45:12.262188 IP localhost.9998 > localhost.57054: Flags [.], ack 1023, win 512, options [nop,nop,TS val 172740699 ecr 172740658], length 0
17:45:16.224033 IP localhost.57054 > localhost.9998: Flags [F.], seq 1023, ack 337, win 512, options [nop,nop,TS val 172744660 ecr 172740699], length 0
17:45:16.224124 IP localhost.9998 > localhost.57054: Flags [F.], seq 337, ack 1024, win 512, options [nop,nop,TS val 172744661 ecr 172744660], length 0
17:45:16.224139 IP localhost.57054 > localhost.9998: Flags [.], ack 338, win 512, options [nop,nop,TS val 172744661 ecr 172744661], length 0
Notice that there is a missing message
Steps to Replicate
Additional context
Page occured: