
Specify Xmx and Xms values according to Cromwell's ${MEM_SIZE} and ${MEM_UNIT} variables #481

@michaelgatzen


As described in some JIRA issues, using only the -Xms argument in the Java commands causes crashes, especially in the HaplotypeCaller_GATK4_VCF task in GermlineVariantDiscovery.wdl. This was fixed in #197 by also supplying the -Xmx argument, which should be 1 GB smaller than the memory available on the machine. However, the available memory can differ from the value specified in the inputs because of Cromwell's Retry with more memory feature, so we need to query it at runtime.

In the PR mentioned above, this was done using the free command. However, the output format of free can differ between container images, so this approach is not very robust. Cromwell supplies the bash environment variables ${MEM_SIZE} and ${MEM_UNIT} for this purpose; unfortunately, they are only available in the scope of the command section, not in the WDL section of the task. If we want to rely on these variables to determine the Java memory size, the following code could be used (in this example, at the beginning of the command section of the HaplotypeCaller task):

    # We need at least 1 GB of available memory outside of the Java heap in order to execute native code, thus, limit
    # Java's memory by the total memory minus 1 GB. We need to obtain the machine memory from the MEM_SIZE and MEM_UNIT
    # environment variables as it might differ from the input variable because of Cromwell's retry with more memory feature.
    case "${MEM_UNIT}" in
      TB)
        memory_exponent=2
        ;;
      GB)
        memory_exponent=1
        ;;
      MB)
        memory_exponent=0
        ;;
      KB)
        memory_exponent=-1
        ;;
      B|Bytes)
        memory_exponent=-2
        ;;
      *)
        echo Error: The MEM_UNIT environment variable has an unexpected value. >&2
        exit 1
    esac

    available_memory_mb=$(echo "print(int(${MEM_SIZE}*(1024**(${memory_exponent}))))" | python)
    let java_memory_size_mb=available_memory_mb-1024
    echo Total available memory: ${available_memory_mb} MB >&2
    echo Memory reserved for Java: ${java_memory_size_mb} MB >&2

    gatk --java-options "-Xmx${java_memory_size_mb}m -Xms${java_memory_size_mb}m -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10" \
...

This code relies on Python 3 being installed in the container image (it is invoked as python), although it could probably be rewritten to rely only on bash or very common Linux commands. It is not very elegant, but it is more robust than parsing the output of free.
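As a rough sketch of the Python-free variant mentioned above, the unit conversion could be done with awk, which handles fractional values such as 7.5 and is part of POSIX (and BusyBox), so it is likely present in most images. This assumes the same MEM_UNIT values as the case statement above; the helper name `mem_to_mb` is hypothetical and not part of Cromwell or GATK.

```shell
#!/usr/bin/env bash
# Sketch (assumption): convert Cromwell's MEM_SIZE/MEM_UNIT to whole megabytes
# with awk instead of Python. mem_to_mb is a hypothetical helper name.

mem_to_mb() {
  local size="$1" unit="$2" power
  # Same unit-to-exponent mapping as the case statement in the issue:
  # the result is size * 1024^power megabytes.
  case "${unit}" in
    TB) power=2 ;;
    GB) power=1 ;;
    MB) power=0 ;;
    KB) power=-1 ;;
    B|Bytes) power=-2 ;;
    *)
      echo "Error: unexpected memory unit '${unit}'." >&2
      return 1
      ;;
  esac
  # awk does floating-point arithmetic, so fractional sizes work;
  # int() truncates toward zero, like Python's int() in the original snippet.
  awk -v size="${size}" -v power="${power}" 'BEGIN { print int(size * 1024 ^ power) }'
}

# Example use in the command section, only when Cromwell has set the variables.
if [[ -n "${MEM_SIZE:-}" && -n "${MEM_UNIT:-}" ]]; then
  available_memory_mb=$(mem_to_mb "${MEM_SIZE}" "${MEM_UNIT}") || exit 1
  java_memory_size_mb=$((available_memory_mb - 1024))
  echo "Total available memory: ${available_memory_mb} MB" >&2
  echo "Memory reserved for Java: ${java_memory_size_mb} MB" >&2
fi
```

Because the result of awk is already an integer, the subtraction can use plain bash arithmetic instead of let.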

If we want to leverage the MEM_SIZE and MEM_UNIT variables, this code should be included before every (GATK/Picard) Java command in the pipeline.
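To avoid repeating the heap calculation before each Java command, one option is to define a small helper once at the top of the command section and reuse it. The sketch below assumes the available memory in MB has already been computed (for example, as available_memory_mb in the snippet above); the function name `java_memory_options` and its optional overhead argument are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (assumption): hypothetical helper that turns the available memory in MB
# into matching -Xmx/-Xms options, keeping overhead_mb (default 1024 MB)
# outside the Java heap for native code.

java_memory_options() {
  local available_mb="$1" overhead_mb="${2:-1024}"
  local heap_mb=$(( available_mb - overhead_mb ))
  if (( heap_mb <= 0 )); then
    echo "Error: ${available_mb} MB is not enough to keep ${overhead_mb} MB outside the Java heap." >&2
    return 1
  fi
  # -Xmx and -Xms are set to the same value, as in the HaplotypeCaller example.
  printf '%s\n' "-Xmx${heap_mb}m -Xms${heap_mb}m"
}
```

In a task, the options would be captured into a variable first, e.g. `java_opts=$(java_memory_options "${available_memory_mb}") || exit 1`, and then passed as `gatk --java-options "${java_opts} -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10" ...`. Capturing first matters because a failing command substitution inside another command's arguments would otherwise go unnoticed and the Java command would run without heap limits.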
