Skip to content

SAIGE gives extreme p-values while other software don't #195

@olivierlabayle

Description

@olivierlabayle

Hi,

I have been trying to use SAIGE recently and the results I obtain are somewhat surprising; the resulting p-values are extreme across the entire genome. The surprising fact is that other software like regenie and plink2 yield different and rather concordant results. I am thinking perhaps there is something wrong with how I use SAIGE, though I have been following the tutorials in the documentation. Here are the steps I follow as well as all the things I have tried; all yielding the same results:

  1. Prepare plink bed file: LD pruning (I have also tried without pruning)
  2. Create sparse GRM (I have also tried without)
  3. Fit the null model with the sparse GRM (or without, I have also tried with and without the further 1000 SNPs extract)
  4. Use a pgen file to perform association testing

I would appreciate any feedback as well as would like to know if anyone has face similar issues.

Comparison With plink2 and REGENIE

SAIGE

Image

REGENIE

Image

PLINK2

Image

Environment

  • UKB RAP
  • I have tried both the docker pull wzhou88/saige:1.5.0 environment and an installation from source using the more recent 3add05fc3f80309f8e4fab980e9a410f87cacc8f commit and both yield the same results.

Code

The code is below:

genotypes_prefix=EUR.SEVERE_COVID_19.ldpruned
sample_list=gwas.individuals.EUR.SEVERE_COVID_19.txt
covariates_file=EUR.SEVERE_COVID_19.merged_covariates_and_pcs.tsv
mac=10
maf=0.005
group_name=EUR.SEVERE_COVID_19
npcs="20"
covariates_list="AGE,SEX,AGE_x_AGE,AGE_x_SEX"

# Restrict BED file to sample individuals and reasonable SNPs
plink2 \
    --bfile ${genotypes_prefix} \
    --keep ${sample_list} \
    --mac ${mac} \
    --maf ${maf} \
    --min-alleles 2 \
    --max-alleles 2 \
    --make-bed \
    --out ${genotypes_prefix}.biallelic_frequent

# phenotype from group_name
phenotype=$(echo ${group_name} | cut -d'.' -f2)

# Make covariates list
pc_list=$(printf "CHR0_OUT_PC%s," {1..20} | sed 's/,$//')
full_covariates_list="$covariates_list,${pc_list}"

# Find the type of the phenotype (quantitative or binary)
phenotype_col_idx=$(head -1 ${covariates_file} | tr '\t' '\n' | grep -n ${phenotype} | cut -d: -f1)
uniq_vals_count=$(cut -f"${phenotype_col_idx}" ${covariates_file} | sort -u | grep -v "NA" | wc -l)
trait_type="binary"
if [ "${uniq_vals_count}" -gt 3 ]; then # two binary values + header
    trait_type="quantitative --invNormalize=TRUE"
fi

# Create Sparse GRM
createSparseGRM.R \
    --plinkFile=${genotypes_prefix}.biallelic_frequent \
    --nThreads=$(nproc) \
    --outputPrefix=${group_name} \
    --numRandomMarkerforSparseKin=2000 \
    --relatednessCutoff=0.125

## Make sparser plink file
shuf -n 1000 ${genotypes_prefix}.biallelic_frequent.bim | awk '{print $2}' > ${genotypes_prefix}.biallelic_frequent.1000Markers.list
plink2 \
    --bfile ${genotypes_prefix}.biallelic_frequent \
    --extract ${genotypes_prefix}.biallelic_frequent.1000Markers.list \
    --make-bed \
    --out ${genotypes_prefix}.biallelic_frequent.1000Markers.list

step1_fitNULLGLMM.R     \
    --plinkFile=${genotypes_prefix}.biallelic_frequent.1000Markers.list \
    --useSparseGRMtoFitNULL=TRUE \
    --sparseGRMFile=EUR.SEVERE_COVID_19_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseGRM.mtx \
    --sparseGRMSampleIDFile=EUR.SEVERE_COVID_19_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseGRM.mtx.sampleIDs.txt \
    --phenoFile=${covariates_file} \
    --phenoCol=${phenotype} \
    --covarColList=$covariates_list \
    --sampleIDColinphenoFile=IID \
    --traitType=${trait_type} \
    --outputPrefix=${group_name} \
    --nThreads=$(nproc)	\
    --IsOverwriteVarianceRatioFile=TRUE

input_prefix=genomicc_ukb.merged.imputed.chr_22
variance_ratio_file=EUR.SEVERE_COVID_19.varianceRatio.txt
model_file=EUR.SEVERE_COVID_19.rda
chr=22

plink2 \
    --pfile ${input_prefix} \
    --keep ${sample_list} \
    --max-alleles 2 \
    --min-alleles 2 \
    --rm-dup exclude-all list \
    --make-pgen \
    --out ${input_prefix}.biallelic

step2_SPAtests.R \
    --pgenPrefix=${input_prefix}.biallelic \
    --AlleleOrder=ref-first \
    --chrom=${chr} \
    --minMAF=0 \
    --minMAC=${mac} \
    --GMMATmodelFile=${model_file} \
    --varianceRatioFile=${variance_ratio_file} \
    --is_Firth_beta=TRUE \
    --pCutoffforFirth=0.05 \
    --is_output_moreDetails=TRUE \
    --LOCO=FALSE \
    --SAIGEOutputFile=${group_name}.chr${chr}_${phenotype}.txt

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions