I am running a 40-AS Mini-Internet here: four regions with 10 ASes each, of which 2 are tier-1 and 2 are stub ASes (fully configured) and 6 are tier-2 ASes (managed by the students). So a "classic" setup, I suppose. All config files are attached:
At this point the whole intra-domain configuration is done, the eBGP sessions are configured and running, and the business relationships and IXPs are set up as well. The connection matrix shows full connectivity; some paths are still invalid due to route leaks caused by mishandled business relationships. The RPKI part has not been done yet.
Now I have run into serious trouble: I am observing heavy and steadily rising load on the virtual machine (VM) running the Mini-Internet. The VM has 16 CPU cores, all of them between 95 and 100 % utilization, and the load average is between 55 and 70. Memory consumption is around 44 GB of 64 GB in total. Here is the current output of htop on this VM:
Some deeper analysis shows that a large part of this load seems to originate from the routinator processes in the 40 ASes (see the TIME column in the ps output):
The processes with the most accumulated CPU time are those in the fully configured tier-1 and stub ASes. Running ps in one of the affected containers (group 12, tier-1, routinator running on the host at router GRZ) shows the following:
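For the numbers above I summed the per-process CPU time roughly like this (a minimal sketch, assuming a procps-ng ps that supports the `cputimes` output column; run on the host, it covers the routinator instances of all containers at once):

```shell
# Sum the accumulated CPU time of every routinator process on the host.
# cputimes prints cumulative CPU time in seconds (procps-ng only).
ps -eo comm,cputimes --no-headers \
  | awk '$1 ~ /routinator/ { total += $2; n++ }
         END { printf "%d routinator processes, %d s total CPU time\n", n, total }'
```

The same pipeline with `--sort=-cputimes` on ps also makes it easy to spot which individual instances are the worst offenders.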
Running strace on one of the routinator processes on the VM suggests, if I read it correctly, that the routinator process is spawning a lot of new processes "doing things". I attach a file with the strace output here:
g18_grz_host_routinator_trace.txt
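For reference, the trace was captured roughly like this (a sketch: 1234 stands in for the actual routinator PID, and the output filename is arbitrary):

```shell
# Attach to a running routinator process and follow every child it spawns:
# -f follows forks, -tt adds timestamps, and -e trace=process restricts the
# log to process-management syscalls (fork/clone/execve/wait4/exit).
sudo strace -f -tt -e trace=process -p 1234 -o routinator_trace.txt
```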
Any ideas what is going wrong here?
Thanks for your help in advance!
Markus


