Description
Describe the bug
xtb (the underlying library) does not close file descriptors after writing logs to them, which causes several problems with logging. If you perform a large number of calculations in one process, you get an OS error once the number of calculations exceeds the limit reported by ulimit -n, which is the maximum number of file descriptors a process may have open.
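To check the limit from inside Python before starting a batch, the standard library's resource module (Unix only) reports the same value as ulimit -n. This is a minimal sketch; the helper name descriptor_limits is illustrative and not part of xtb:

```python
import resource


def descriptor_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = descriptor_limits()
# The soft limit is the value that `ulimit -n` prints in the launching shell.
print(f"soft limit: {soft}, hard limit: {hard}")
```

The soft limit is the one that matters here: it is the number of descriptors the process can hold before further opens start failing.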
Additionally, depending on how xtb is run, it appears to have some internal logic that tries to work around this bug, with strange results. xtb writes log files only for roughly the first ulimit -n minus a few calculations; after that it stops opening file descriptors, even if calculator.set_output(f"logfile-{i}") is called, and dumps output to the console instead of writing it to disk. Also, if you run a calculation, call time.sleep(10) afterwards, and check the log file, you will see that it is empty. In fact, a whole series of log files stays empty until the program exits and the OS flushes the logs to disk.
xtb needs to close the file it opens here. Calling calculator.release_output() does not fix this problem.
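One way to confirm a descriptor leak is to count the descriptors the process holds before and after the suspect call. The sketch below assumes Linux (it reads /proc/self/fd) and uses plain open() calls as a stand-in for the leaking library call; count_open_fds is a hypothetical helper, not an xtb function:

```python
import os


def count_open_fds():
    """Count descriptors currently open in this process (Linux only)."""
    return len(os.listdir("/proc/self/fd"))


before = count_open_fds()
# Stand-in for a library call that opens files and never closes them.
leaked = [open(os.devnull) for _ in range(5)]
after = count_open_fds()
print(f"descriptors gained: {after - before}")

# Closing the handles returns the count to its starting value.
for handle in leaked:
    handle.close()
```

In the reproduction script, calling count_open_fds() before and after each calculate_xtb(i) call should show whether every calculation leaves a descriptor behind.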
To Reproduce
To show logs being dumped to the console instead of written to disk, and to show that logs are not written after a calculation (only flushed when the whole process exits):
Most systems set ulimit -n 1024 by default. Save this script as script.py and run it.
import time
from pathlib import Path

import numpy as np
from xtb.interface import Calculator, Param
from xtb.libxtb import VERBOSITY_FULL

output_dir = Path("xtb-data")
output_dir.mkdir(exist_ok=True)

numbers = np.array([1, 1])
positions = np.array(
    [
        [0.0, 0.0, 0.0],
        [0.0, 0.0, 1.4],
    ]
)


def calculate_xtb(i):
    calc = Calculator(Param.GFN2xTB, numbers, positions)
    calc.set_verbosity(VERBOSITY_FULL)
    calc.set_output(str(output_dir / f"logs-{i}.txt"))
    calc.singlepoint()
    calc.release_output()
    # time.sleep(10)


for i in range(2000):
    calculate_xtb(i)
    if i % 100 == 0:
        print(i)

python script.py

Now look at the number of log files. There should be 2000, but there are only 1021. Notice that toward the end of the run the output was dumped to the terminal instead of being written to log files. This should not happen.
ls -1 xtb-data | wc -l

Remove all the files.

rm -r xtb-data/*.txt

Set a new ulimit and run the script again.

ulimit -n 75
python script.py

Count the log files. Much more output is dumped to the console instead of being written to log files: there should be 2000 log files, but there are only 71.

ls -1 xtb-data | wc -l

xtb appears to be introspecting the ulimit setting and limiting how many file handles it opens so that it does not get terminated by the operating system. However, it should simply close each file handle after it opens it and writes logs to it, and it should flush the log to disk after a calculation. You can see that the log flushing/writing is not happening properly by uncommenting the time.sleep(10) in script.py and then looking at logs-0.txt after the first calculation. It is empty. It remains empty until all calculations are complete and the OS flushes data to the log files. This is not good and is a result of xtb not closing its file handles properly after writing logs.
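The empty-log symptom is what buffered writes look like when a stream is never flushed or closed. The pure-Python sketch below is only an analogy (xtb does its I/O through its own runtime, not through Python file objects): a small write stays in the userspace buffer and the file on disk stays empty until flush() or close() is called.

```python
import os
import tempfile

message = "calculation output\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "log.txt")
    handle = open(path, "w")  # default (block) buffering for regular files
    handle.write(message)
    # The data is still in the buffer, so the file on disk is empty.
    size_before_flush = os.path.getsize(path)
    handle.flush()
    # After flushing, the bytes are visible to other readers of the file.
    size_after_flush = os.path.getsize(path)
    handle.close()

print(size_before_flush, size_after_flush)
```

If the handle were never flushed or closed, the data would reach the file only when the process exits, which matches the behaviour described above.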
If you are running xtb calculations inside a process that opens temporary directories using Python, you'll get an OSError: [Errno 24] Too many open files. Here is an alternative script that shows how xtb causes this issue:
from pathlib import Path
from tempfile import TemporaryDirectory

import numpy as np
from xtb.interface import Calculator, Param
from xtb.libxtb import VERBOSITY_FULL

output_dir = Path("xtb-data")
output_dir.mkdir(exist_ok=True)

numbers = np.array([1, 1])
positions = np.array(
    [
        [0.0, 0.0, 0.0],
        [0.0, 0.0, 1.4],
    ]
)


def calculate_xtb(filepath):
    calc = Calculator(Param.GFN2xTB, numbers, positions)
    calc.set_verbosity(VERBOSITY_FULL)
    calc.set_output(filepath)
    calc.singlepoint()
    calc.release_output()
    # time.sleep(10)


outputs = []
for i in range(2000):
    # Each iteration writes its log into a fresh temporary directory.
    with TemporaryDirectory() as tmpdir:
        logs = f"{tmpdir}/logs-{i}.txt"
        calculate_xtb(logs)
    if i % 100 == 0:
        print(i)

All of these issues go away if you do not call .set_output(filepath), because xtb then just writes its output to the console.
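The Errno 24 failure can be reproduced without xtb, which helps show that it is a generic consequence of leaked descriptors and not something specific to temporary directories. The sketch below (Unix only) temporarily lowers the soft limit, leaks descriptors until the OS refuses another one, and then restores the limit; exhaust_descriptors and the limit of 64 are illustrative choices, not taken from the report:

```python
import errno
import os
import resource


def exhaust_descriptors(limit=64):
    """Leak descriptors under a lowered soft limit; return (errno, number opened)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit may not exceed the hard limit.
    new_soft = limit if hard == resource.RLIM_INFINITY else min(limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    leaked = []
    try:
        while True:
            # Never closed inside the loop: mimics a library leaking handles.
            leaked.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        return exc.errno, len(leaked)
    finally:
        for fd in leaked:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


code, opened = exhaust_descriptors()
print(errno.errorcode[code], opened)
```

Once the limit is reached, any code in the same process that needs a new descriptor fails in the same way, no matter which component leaked the earlier ones.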
Expected behaviour
xtb properly opens, writes to, and closes the file passed into .set_output()