-
Notifications
You must be signed in to change notification settings - Fork 38
Open
Description
Hi,
I have been trying to use SAIGE recently and the results I obtain are somewhat surprising; the resulting p-values are extreme across the entire genome. The surprising fact is that other software like regenie and plink2 yield different and rather concordant results. I am thinking perhaps there is something wrong with how I use SAIGE, though I have been following the tutorials in the documentation. Here are the steps I follow as well as all the things I have tried; all yielding the same results:
- Prepare plink bed file: LD pruning (I have also tried without pruning)
- Create sparse GRM (I have also tried without)
- Fit the null model with the sparse GRM (or without, I have also tried with and without the further 1000 SNPs extract)
- Use a pgen file to perform association testing
I would appreciate any feedback as well as would like to know if anyone has face similar issues.
Comparison With plink2 and REGENIE
SAIGE
REGENIE
PLINK2
Environment
- UKB RAP
- I have tried both the
docker pull wzhou88/saige:1.5.0environment and an installation from source using the more recent3add05fc3f80309f8e4fab980e9a410f87cacc8fcommit and both yield the same results.
Code
The code is below:
genotypes_prefix=EUR.SEVERE_COVID_19.ldpruned
sample_list=gwas.individuals.EUR.SEVERE_COVID_19.txt
covariates_file=EUR.SEVERE_COVID_19.merged_covariates_and_pcs.tsv
mac=10
maf=0.005
group_name=EUR.SEVERE_COVID_19
npcs="20"
covariates_list="AGE,SEX,AGE_x_AGE,AGE_x_SEX"
# Restrict BED file to sample individuals and reasonable SNPs
plink2 \
--bfile ${genotypes_prefix} \
--keep ${sample_list} \
--mac ${mac} \
--maf ${maf} \
--min-alleles 2 \
--max-alleles 2 \
--make-bed \
--out ${genotypes_prefix}.biallelic_frequent
# phenotype from group_name
phenotype=$(echo ${group_name} | cut -d'.' -f2)
# Make covariates list
pc_list=$(printf "CHR0_OUT_PC%s," {1..20} | sed 's/,$//')
full_covariates_list="$covariates_list,${pc_list}"
# Find the type of the phenotype (quantitative or binary)
phenotype_col_idx=$(head -1 ${covariates_file} | tr '\t' '\n' | grep -n ${phenotype} | cut -d: -f1)
uniq_vals_count=$(cut -f"${phenotype_col_idx}" ${covariates_file} | sort -u | grep -v "NA" | wc -l)
trait_type="binary"
if [ "${uniq_vals_count}" -gt 3 ]; then # two binary values + header
trait_type="quantitative --invNormalize=TRUE"
fi
# Create Sparse GRM
createSparseGRM.R \
--plinkFile=${genotypes_prefix}.biallelic_frequent \
--nThreads=$(nproc) \
--outputPrefix=${group_name} \
--numRandomMarkerforSparseKin=2000 \
--relatednessCutoff=0.125
## Make sparser plink file
shuf -n 1000 ${genotypes_prefix}.biallelic_frequent.bim | awk '{print $2}' > ${genotypes_prefix}.biallelic_frequent.1000Markers.list
plink2 \
--bfile ${genotypes_prefix}.biallelic_frequent \
--extract ${genotypes_prefix}.biallelic_frequent.1000Markers.list \
--make-bed \
--out ${genotypes_prefix}.biallelic_frequent.1000Markers.list
step1_fitNULLGLMM.R \
--plinkFile=${genotypes_prefix}.biallelic_frequent.1000Markers.list \
--useSparseGRMtoFitNULL=TRUE \
--sparseGRMFile=EUR.SEVERE_COVID_19_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseGRM.mtx \
--sparseGRMSampleIDFile=EUR.SEVERE_COVID_19_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseGRM.mtx.sampleIDs.txt \
--phenoFile=${covariates_file} \
--phenoCol=${phenotype} \
--covarColList=$covariates_list \
--sampleIDColinphenoFile=IID \
--traitType=${trait_type} \
--outputPrefix=${group_name} \
--nThreads=$(nproc) \
--IsOverwriteVarianceRatioFile=TRUE
input_prefix=genomicc_ukb.merged.imputed.chr_22
variance_ratio_file=EUR.SEVERE_COVID_19.varianceRatio.txt
model_file=EUR.SEVERE_COVID_19.rda
chr=22
plink2 \
--pfile ${input_prefix} \
--keep ${sample_list} \
--max-alleles 2 \
--min-alleles 2 \
--rm-dup exclude-all list \
--make-pgen \
--out ${input_prefix}.biallelic
step2_SPAtests.R \
--pgenPrefix=${input_prefix}.biallelic \
--AlleleOrder=ref-first \
--chrom=${chr} \
--minMAF=0 \
--minMAC=${mac} \
--GMMATmodelFile=${model_file} \
--varianceRatioFile=${variance_ratio_file} \
--is_Firth_beta=TRUE \
--pCutoffforFirth=0.05 \
--is_output_moreDetails=TRUE \
--LOCO=FALSE \
--SAIGEOutputFile=${group_name}.chr${chr}_${phenotype}.txtReactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels