I have three machines:
- "New" is a new heavy compute server; it is the fastest machine, with oodles of memory and cores and SSD in a fast setup
- "Old" is an aging compute server; it is very fast, with cores and memory, but has filesystem i/o issues (long story)
- "Lap" is an old laptop; it has little memory and not much CPU, but at least has an SSD
I recently tested filesystem i/o on these three machines. "New" was fastest, with "Lap" just behind; "Old" was a factor of four slower than both.
With this context, when running this command:
java -Xmx64G -jar ../minerva/minerva-cli/bin/minerva-cli.jar --import-owl-models -j /tmp/blazegraph.jnl -f ~/local/src/git/noctua-models/models/
I get something like this for the times:
- "Lap": 30min (using only 8G, as it is a small machine)
- "New": 60min
- "Old": 120min
That is unexpected. "New" should be smoking all comers, and the filesystem testing indicates that i/o shouldn't be the problem. Looking at the process on "New", I noticed that almost all cores (~100) were in use. I'm wondering if there is degenerate utilization of CPUs or memory.
I figured out how to "hide" CPUs from java (e.g. taskset -a -c 0,1,2,3,4,5,6,7 COMMAND) and started trying to find if I could get better numbers by tuning. The answer is a baffling "yes".
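A minimal sketch of the pinning setup, for anyone who wants to reproduce it. The helper names `cpu_list` and `run_pinned` are made up for illustration, and the sketch uses plain `taskset -c` rather than the exact `-a -c` form above. It assumes Linux with util-linux `taskset` and coreutils `nproc` installed.

```shell
#!/bin/sh
# Build a comma-separated CPU list "0,1,...,N-1" suitable for `taskset -c`.
cpu_list() {
    n=$1
    i=0
    list=""
    while [ "$i" -lt "$n" ]; do
        # Append a comma only when the list is already non-empty.
        list="${list:+$list,}$i"
        i=$((i + 1))
    done
    printf '%s\n' "$list"
}

# Run a command restricted to the first N CPUs (hypothetical helper).
run_pinned() {
    n=$1
    shift
    taskset -c "$(cpu_list "$n")" "$@"
}

# Example (commented out so the sketch has no side effects):
# nproc honors the affinity mask, so this shows how many CPUs a child sees.
# run_pinned 8 nproc
```

Running `run_pinned 8 nproc` on a machine with at least 8 CPUs should report 8, which is a quick way to confirm the mask is applied before starting the long java job.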
| CPUs | Mem  | Runtime (mm:ss) |
|------|------|-----------------|
| 8    | 8G   | 57:04.77        |
| 4    | 16G  | 52:24.90        |
| 2    | 32G  | 38:09.33        |
| 2    | 64G  | 36:15.91        |
| 1    | 64G  | 26:15.49        |
| 1    | 128G | 26:31.99        |
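To put the table in numbers, here is a small sketch that converts two of the runtimes to seconds and computes their ratio. The function names `to_seconds` and `ratio` are illustrative only; the inputs are the 8-CPU and 1-CPU/64G rows from the table.

```shell
#!/bin/sh
# Convert an elapsed time in MM:SS.ss form to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ printf "%.2f\n", $1 * 60 + $2 }'
}

# Ratio of two elapsed times (first divided by second).
ratio() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}

to_seconds 57:04.77        # 8 CPUs, 8G   -> 3424.77
to_seconds 26:15.49        # 1 CPU, 64G   -> 1575.49
ratio 57:04.77 26:15.49    # -> 2.17
```

So the 8-CPU run takes roughly 2.2 times as long as the single-CPU run.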
It seems that the job is single-CPU bound, and adding more CPUs makes things worse. More memory does nothing once there is enough.
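One possible follow-up experiment, offered as a guess rather than a diagnosis: if the slowdown comes from the JVM sizing its internal thread pools from the visible CPU count, then capping that count inside the JVM, without `taskset`, might show a similar effect. The sketch below only assembles and prints such a command line (a dry run). The `jvm_cmd` helper is hypothetical, the shortened jar path is a placeholder, and it assumes JDK 10 or later, where `-XX:ActiveProcessorCount` is available.

```shell
#!/bin/sh
# Assemble a java command line that caps the CPU count the JVM reports,
# leaving OS-level affinity untouched. Dry run: prints the command only.
# Assumption: a JVM that supports -XX:ActiveProcessorCount (JDK 10 or later).
jvm_cmd() {
    cpus=$1
    heap=$2
    shift 2
    echo java "-XX:ActiveProcessorCount=$cpus" "-Xmx$heap" "$@"
}

# Placeholder arguments; substitute the real jar path and options.
jvm_cmd 1 64G -jar minerva-cli.jar --import-owl-models
```

If a run with the capped count is about as fast as the `taskset` run with one CPU, that would point at thread oversubscription inside the JVM rather than at the OS scheduler.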
Tagging @balhoff