-
Notifications
You must be signed in to change notification settings - Fork 112
Description
As described in some JIRA issues, the use of only the -Xms argument to the Java commands cause crashes, especially in the HaplotypeCaller_GATK4_VCF task in GermlineVariantDiscovery.wdl. This has been fixed using in #197 by supplying the -Xmx argument as well, which should be 1 GB smaller than the available memory on the machine. The available memory of the machine can differ from the value specified in the inputs though because of Cromwell's Retry with more memory feature. Therefore, we need to query the available memory. In the PR mentioned above, this has been done using the free command, however, the output format of this command can differ between container images, therefore, it is not very robust. Cromwell supplies the bash environment variables ${MEM_SIZE} and ${MEM_UNIT} for this purpose, unfortunately they are only available in the scope of the command section and not in the WDL section of the task. If we want to rely on these variables for determining the Java memory size, this code could be used (in this example, in the beginning of the command section of the HaplotypeCaller task):
# We need at least 1 GB of available memory outside of the Java heap in order to execute native code, thus, limit
# Java's memory by the total memory minus 1 GB. We need to obtain the machine memory from the MEM_SIZE and MEM_UNIT
# environment variables as it might differ from the input variable because of Cromwell's retry with more memory feature.
case ${MEM_UNIT} in
TB)
memory_exponent=2
;;
GB)
memory_exponent=1
;;
MB)
memory_exponent=0
;;
KB)
memory_exponent=-1
;;
B|Bytes)
memory_exponent=-2
;;
*)
echo Error: The MEM_UNIT environment variable has an unexpected value. >&2
exit 1
esac
available_memory_mb=$(echo "print(int(${MEM_SIZE}*(1024**(${memory_exponent}))))" | python)
let java_memory_size_mb=available_memory_mb-1024
echo Total available memory: ${available_memory_mb} MB >&2
echo Memory reserved for Java: ${java_memory_size_mb} MB >&2
gatk --java-options "-Xmx${java_memory_size_mb}m -Xms${java_memory_size_mb}m -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10" \
...
This code relies on Python 3 being installed in the container image, but it can probably be written in a way that only relies on bash (or very common Linux commands). It is not very elegant, but this is the most robust implementation.
If we want to leverage the MEM_SIZE and MEM_UNIT variables, this code should be included before every (GATK/Picard) Java command in the pipeline.