[Problem opening file: '-fold'] --> angsd-wrapper SFS ./Site_Frequency_Spectrum_Config in online tutorial #11

@squisquater

I followed the installation instructions and the tutorial instructions but am running into an error when I try to run the Site Frequency Spectrum.

angsd-wrapper SFS ./Site_Frequency_Spectrum_Config

This is my output. It appears to fail while looking for the file needed to fold (or not fold) the spectrum, which it cannot open.

WRAPPER: Zipping advanced arguments onto basic ones

        -> angsd version: 0.911-44-g1c0ebb6 (htslib: 1.3.1-30-gbb03b02) build(Oct 31 2021 11:04:52)
        -> Reading fasta: /mnt/steelhead/remote/Sophie/Programs/angsd-wrapper/Example_Data/Sequences/Tripsacum_TDD39103.fa
        -> Reading fasta: /mnt/steelhead/remote/Sophie/Programs/angsd-wrapper/Example_Data/Sequences/Zea_mays.AGPv3.30.dna_sm.chromosome.10.fa
        -> (Using Filipe G Vieira modification of: abcSaf.cpp)
        -> Parsing 11 number of samples
        -> Region lookup 1/1

        -> We have now allocated approximately 10 Megabytes of raw nodes to the nodepool
        -> Printing at chr: 10 pos:17551496 chunknumber 1100
        -> We have now allocated approximately 20 Megabytes of raw nodes to the nodepool
        -> Printing at chr: 10 pos:19386992 chunknumber 2000 [emFrequency_F] caught nan will not exit
logLike (3*nInd). nInd=11
keepList (nInd)
used logLike (3*length(keep))=11
        -> Printing at chr: 10 pos:22395913 chunknumber 3200 [emFrequency_F] caught nan will not exit
logLike (3*nInd). nInd=11
keepList (nInd)
used logLike (3*length(keep))=10
[emFrequency_F] caught nan will not exit
logLike (3*nInd). nInd=11
keepList (nInd)
used logLike (3*length(keep))=10
[emFrequency_F] caught nan will not exit
logLike (3*nInd). nInd=11
keepList (nInd)
used logLike (3*length(keep))=10
        -> Printing at chr: 10 pos:24004662 chunknumber 3600 [emFrequency_F] caught nan will not exit
logLike (3*nInd). nInd=11
keepList (nInd)
used logLike (3*length(keep))=11
        -> Printing at chr: 10 pos:24908040 chunknumber 4000
        -> Done reading data waiting for calculations to finish
        -> Done waiting for threads

        -> npools:26 unfreed tnodes before clean:0
        -> Output filenames:
                ->"/mnt/steelhead/remote/Sophie/scratch/Maize/SFS/Maize_SFSOut.arg"
                ->"/mnt/steelhead/remote/Sophie/scratch/Maize/SFS/Maize_SFSOut.mafs.gz"
                ->"/mnt/steelhead/remote/Sophie/scratch/Maize/SFS/Maize_SFSOut.geno.gz"
                ->"/mnt/steelhead/remote/Sophie/scratch/Maize/SFS/Maize_SFSOut.saf.gz"
                ->"/mnt/steelhead/remote/Sophie/scratch/Maize/SFS/Maize_SFSOut.saf.pos.gz"
                ->"/mnt/steelhead/remote/Sophie/scratch/Maize/SFS/Maize_SFSOut.saf.idx"
        -> Sun Oct 31 12:08:56 2021
        -> Arguments and parameters for all analysis are located in .arg file
        [ALL done] cpu-time used =  199.08 sec
        [ALL done] walltime used =  130.00 sec
        -> Version of fname:/mnt/steelhead/remote/Sophie/scratch/Maize/SFS/Maize_SFSOut.saf.idx is:2
        -> Assuming .saf.gz file: /mnt/steelhead/remote/Sophie/scratch/Maize/SFS/Maize_SFSOut.saf.gz
        -> Assuming .saf.pos.gz: /mnt/steelhead/remote/Sophie/scratch/Maize/SFS/Maize_SFSOut.saf.pos.gz
        -> Problem opening file: '-fold'

Looking at the wrapper shell script (Site_Frequency_Spectrum.sh), the failure appears to occur in the final section of the script, at the last command, whose output is redirected to the final file. That file does get created in my scratch directory, but it is empty.

#!/usr/bin/env bash

set -e
set -o pipefail

#   Load variables from supplied config file
source "$1"

#   Are we using Common_Config? If so, source it
if [[ -f "${COMMON}" ]]
then
    source "${COMMON}"
fi

#   Where is angsd-wrapper located?
SOURCE=$2

#   Where is ANGSD?
ANGSD_DIR=${SOURCE}/dependencies/angsd

#   Variables created from transforming other variables
#       The number of individuals in the taxon we are analyzing
N_IND=$(wc -l < "${SAMPLE_LIST}")
#       How many inbreeding coefficients are supplied?
N_F=$(wc -l < "${SAMPLE_INBREEDING}")
#       For ANGSD, the actual sample size is twice the number of individuals, since each individual has two chromosomes.
#       The individual inbreeding coefficients take care of the mismatch between these two numbers

#   Perform a check to see if number of individuals matches number of inbreeding coefficients
if [ "${N_IND}" -ne "${N_F}" ]
then
    echo "Mismatch between number of samples in ${SAMPLE_LIST} and ${SAMPLE_INBREEDING}"
    exit 1
fi

#   Check to see if ancestral state is supplied: If not, polarize samples using
#   the reference sequence and generate folded saf.
if [ ! -f "${ANC_SEQ}" ]
then
    echo "Ancestral state data not found, using reference sequence to polarize alignment data. BAQ will likewise not be calculated."
    if [ ! -f "${REF_SEQ}" ]
    then
        echo "No reference sequence supplied, unable to perform calculations."
        exit 2
    else
        ANC_SEQ=$REF_SEQ
        REF_SEQ=
        BAQ=0
        FOLD=1
    fi
else
    FOLD=0
fi

#   Create outdirectory
OUT="${SCRATCH}"/"${PROJECT}"/SFS
mkdir -p "${OUT}"

#   Now we actually run the command, this creates a binary file that contains the prior SFS
if [[ -f "${OUT}"/"${PROJECT}"_SFSOut.mafs.gz ]] && [ "$OVERRIDE" = "false" ]
then
    echo "WRAPPER:maf already exists and OVERRIDE=false, skipping angsd -bam..."
else
    #   Do we have a regions file?
    if [[ -f "${REGIONS}" ]]
    then
	WRAPPER_ARGS=$(echo -bam "${SAMPLE_LIST}" \
            -out "${OUT}"/"${PROJECT}"_SFSOut \
            -indF "${SAMPLE_INBREEDING}" \
            -doSaf "${DO_SAF}" \
            -uniqueOnly "${UNIQUE_ONLY}" \
            -anc "${ANC_SEQ}" \
            -minMapQ "${MIN_MAPQ}" \
            -minQ "${MIN_BASEQUAL}" \
            -nInd "${N_IND}" \
            -minInd "${MIN_IND}"\
            -baq "${BAQ}" \
            -ref "${REF_SEQ}" \
            -GL "${GT_LIKELIHOOD}" \
            -P "${N_CORES}" \
            -doMajorMinor "${DO_MAJORMINOR}" \
            -doMaf "${DO_MAF}" \
            -doGeno "${DO_GENO}" \
            -rf "${REGIONS}" \
            -doPost "${DO_POST}")
    #   Are we missing a definition for regions?
    elif [[ -z "${REGIONS}" ]]
    then
	WRAPPER_ARGS=$(echo -bam "${SAMPLE_LIST}" \
            -out "${OUT}"/"${PROJECT}"_SFSOut \
            -indF "${SAMPLE_INBREEDING}" \
            -doSaf "${DO_SAF}" \
            -uniqueOnly "${UNIQUE_ONLY}" \
            -anc "${ANC_SEQ}" \
            -minMapQ "${MIN_MAPQ}" \
            -minQ "${MIN_BASEQUAL}" \
            -nInd "${N_IND}" \
            -minInd "${MIN_IND}"\
            -baq "${BAQ}" \
            -ref "${REF_SEQ}" \
            -GL "${GT_LIKELIHOOD}" \
            -P "${N_CORES}" \
            -doMajorMinor "${DO_MAJORMINOR}" \
            -doMaf "${DO_MAF}" \
            -doGeno "${DO_GENO}" \
            -doPost "${DO_POST}")
    #   Assuming a single region was defined in config file
    else
	WRAPPER_ARGS=$(echo -bam "${SAMPLE_LIST}" \
            -out "${OUT}"/"${PROJECT}"_SFSOut \
            -indF "${SAMPLE_INBREEDING}" \
            -doSaf "${DO_SAF}" \
            -uniqueOnly "${UNIQUE_ONLY}" \
            -anc "${ANC_SEQ}" \
            -folded "${FOLD}" \
            -minMapQ "${MIN_MAPQ}" \
            -minQ "${MIN_BASEQUAL}" \
            -nInd "${N_IND}" \
            -minInd "${MIN_IND}" \
            -baq "${BAQ}" \
            -ref "${REF_SEQ}" \
            -GL "${GT_LIKELIHOOD}" \
            -P "${N_CORES}" \
            -doMajorMinor "${DO_MAJORMINOR}" \
            -doMaf "${DO_MAF}" \
            -doGeno "${DO_GENO}" \
            -doPost "${DO_POST}" \
            -r "${REGIONS}")
    fi
fi
# Check for advanced arguments, and overwrite any overlapping definitions
FINAL_ARGS=($(source "${SOURCE}/Wrappers/Arg_Zipper.sh" "${WRAPPER_ARGS}" "${ADVANCED_ARGS}"))
# DEBUGGING
# echo "Wrapper arguments: ${WRAPPER_ARGS}" 1<&2
# echo -e "Final arguments:" ${FINAL_ARGS} 1<&2

"${ANGSD_DIR}"/angsd "${FINAL_ARGS[@]}"

"${ANGSD_DIR}"/misc/realSFS \
    "${OUT}"/"${PROJECT}"_SFSOut.saf.idx \
    -P "${N_CORES}" \
    -fold "${FOLD}" \
    > "${OUT}"/"${PROJECT}"_DerivedSFS.graph.me

I can also include my configuration file (Site_Frequency_Spectrum_Config) if helpful; it points the script to another configuration file in the same directory (Common_Config). Has anyone else run into this error while working through the tutorial? I am trying to figure out whether this is a file path issue, or whether the SFS is not running correctly and there is some other error in the output that I am not identifying.
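
One way to separate a file path problem from a problem with the realSFS call itself is to check the SAF files before realSFS runs. The following is a minimal sketch, not part of angsd-wrapper: `check_files` is a hypothetical helper name, and it only tests that each path exists and is non-empty.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of angsd-wrapper): report whether each given
# file exists and is non-empty. Returns 1 if any file is missing or empty.
check_files() {
    local status=0
    local f
    for f in "$@"; do
        if [[ ! -e "$f" ]]; then
            echo "MISSING: $f"
            status=1
        elif [[ ! -s "$f" ]]; then
            echo "EMPTY:   $f"
            status=1
        else
            echo "OK:      $f"
        fi
    done
    return "$status"
}

# Example usage with the variables defined in Site_Frequency_Spectrum.sh.
# The "|| true" keeps "set -e" from aborting the script on a failed check.
# check_files "${OUT}/${PROJECT}_SFSOut.saf.idx" \
#             "${OUT}/${PROJECT}_SFSOut.saf.gz" \
#             "${OUT}/${PROJECT}_SFSOut.saf.pos.gz" || true
```

If all three files are reported as OK, the paths are probably fine, and the message may instead come from how this build of realSFS handles the `-fold` argument. That second part is an assumption I have not verified.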
