-
Version: 2.4.0
Installation Method: Security Onion ISO image
Description: other (please provide detail below)
Installation Type: Standalone
Location: on-prem with Internet access
Hardware Specs: Meets minimum requirements
CPU: 4
RAM: 16GB
Storage for /: 163GB
Storage for /nsm: 327GB
Network Traffic Collection: span port
Network Traffic Speeds: 1Gbps to 10Gbps
Status: Yes, all services on all nodes are running OK
Salt Status: No, there are no failures
Logs: No, there are no additional clues

Detail

I've been running Security Onion 2.4 for several months now and everything has been fine until a few weeks ago, when whatever job manages the PCAP space stopped cleaning up old files. Prior to this issue, the free space on the /nsm partition would stay at 60%. Now PCAPs will completely fill up the disk space, necessitating a manual purge of PCAPs. I changed the diskfreepercentage from 10 to 25 under Administration –> Configuration –> pcap –> config –> diskfreepercentage and synced the grid, but the free disk space will drop all the way to 0%. I've looked through the stenographer log and don't see any errors. I've tried looking through some of the other logs in /opt/so/log but nothing stands out as a problem.

Currently /nsm is at 88% used:
Filesystem Size Used Avail Use% Mounted on

Here's the output of sudo du -csh * from /nsm:
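For anyone following along who wants to watch the free space from the command line while debugging, here is a rough sketch. It only parses `df -P` output and compares the result to a threshold; the function names `free_pct_from_df` and `free_below` are made up for illustration, and this is not necessarily how stenographer evaluates diskfreepercentage internally.

```shell
# Hypothetical helper (not part of Security Onion): read `df -P` output on
# stdin and print the approximate free-space percentage of the first
# filesystem listed. `df -P` forces the POSIX one-line-per-filesystem layout,
# where column 5 is the used capacity, e.g. "88%".
free_pct_from_df() {
    awk 'NR == 2 { sub(/%/, "", $5); print 100 - $5 }'
}

# Hypothetical helper: succeed when the free percentage ($1) is below the
# threshold ($2), i.e. when cleanup would be expected to have started.
free_below() {
    [ "$1" -lt "$2" ]
}
```

Possible usage: `df -P /nsm | free_pct_from_df` prints the free percentage (12 for a partition that df reports as 88% used), and `free_below "$(df -P /nsm | free_pct_from_df)" 25 && echo "below threshold"` flags the case where free space has dropped under the configured 25.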
Replies: 4 comments 11 replies
-
What are the latest contents of /opt/so/logs/so-sensor-clean, and does its timestamp match the current date/time? Cron should run that every minute for zeek, strelka, and suricata logs, but pcap (stenographer) and elasticsearch (curator) should be cleaning themselves separately.
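A quick way to answer the timestamp question is to compare the file's modification time with the current time. The sketch below assumes GNU `stat` and `date` (as found on a typical Linux install); `log_age_seconds` and `is_stale` are hypothetical helper names, not Security Onion tools.

```shell
# Hypothetical helper: print how many seconds ago the file was last modified.
# Relies on GNU stat (`-c %Y` prints the mtime as seconds since the epoch).
log_age_seconds() {
    local now mtime
    now=$(date +%s)
    mtime=$(stat -c %Y "$1") || return 1
    echo $((now - mtime))
}

# Hypothetical helper: succeed when the file ($1) has not been modified
# within the last $2 seconds. Returns 2 if the file cannot be inspected.
is_stale() {
    local age
    age=$(log_age_seconds "$1") || return 2
    [ "$age" -gt "$2" ]
}
```

For example, using the path given in the reply above, `is_stale /opt/so/logs/so-sensor-clean 120 && echo "cleanup log has not been touched in over 2 minutes"` would hint that the every-minute cron job is not running.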
-
@petiepooo - Thanks for taking the time to respond and trying to help. I ended up rebooting my Security Onion instance this morning to see if that would fix anything, and within a few minutes of the reboot something kicked in and removed 85GB of PCAP files. So the problem is clearly with PCAP cleanup, and it appears that something causes the stenographer cleanup job to fail. Looking through the stenographer log at the time of the reboot, I can see it clean up all the pcap index files in /nsm/pcapindex for the PCAPs I manually deleted, but I don't see any log entries for the deletion of the 85GB of PCAP files in /nsm/pcap. I will continue to monitor the situation. If this happens again, is there anything I can check within stenographer to locate the problem, or to check the health of stenographer beyond running so-status?
-
Yes, but not with the -9. Find the pid of runuser and run

FWIW, the -9 option for kill tells the kernel to terminate the process immediately, without giving it any chance to clean up. It's the "nuke from high orbit" option, and I don't recommend it except as a last-ditch effort. It's more polite to send a normal -TERM (-15, the default) signal to the process first and "ask" it to clean up and terminate itself rather than yank the rug out from under it with a -KILL (-9)...

The -STOP (-19) sent to runuser, BTW, tells the kernel to stop scheduling the process, which is why it hangs around in the ps output. If it were to run again, say by sending -CONT (-18), it would see that its child, the (defunct) shell, has exited and would exit itself, eventually stopping the entire container. We don't want that in this case, as we want to manually restart the processes in the existing container.

I was testing in the 2.3.280 so-steno image. Killall is there but must not be in your newer 2.4 image... oh well. Thanks for sticking with the debugging effort.
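The "ask politely first" advice can be sketched as a small shell function: send -TERM, give the process a few seconds to exit, and only fall back to -KILL if it is still there. This is a generic illustration, not the command from this thread; `stop_gracefully` is a hypothetical name, and the pid and timeout are whatever you pass in.

```shell
# Hypothetical helper: terminate a process gracefully, escalating only if
# it ignores the request.
#   $1 = pid, $2 = seconds to wait before escalating (default 10)
# Returns 0 if the process exited after -TERM (or was already gone),
# 1 if -KILL had to be sent.
stop_gracefully() {
    local pid timeout
    pid=$1
    timeout=${2:-10}

    # -TERM (15) can be caught, so the process gets a chance to clean up.
    kill -TERM "$pid" 2>/dev/null || return 0

    while [ "$timeout" -gt 0 ]; do
        # `kill -0` sends no signal; it only tests whether the pid exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done

    # Last resort: -KILL (9) cannot be caught or ignored.
    kill -KILL "$pid" 2>/dev/null
    return 1
}
```

Calling `stop_gracefully "$pid" 5` waits up to five seconds before escalating; a non-zero return tells you the process had to be killed outright, which is worth noting when debugging.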
-
I thought I'd post an update to this. I was never able to figure out what was causing the problem and ended up reinstalling Security Onion (2.4.50) rather than have /nsm fill up every 3 - 4 days. It's been running for over a week now and PCAP cleanup is working as expected. Sometimes it's just easier to nuke the site from orbit ;). I wanted to thank @petiepooo for all his help.