- Create a new Docker container using our Installation Guide or start an existing container with:

      docker start -i CONTAINERNAME

- Navigate to the directory where the permGWAS2 repository is located:

      cd /REPO_DIRECTORY/permGWAS

- Run the script with the test data provided in the `./data` folder:

      python3 permGWAS.py -x ./data/x_matrix.h5 -y ./data/y_matrix.csv

To use permGWAS2 without Docker, simply omit the first step.

Details on the supported data types can be found in the Data Guide.
- The minimal requirement is to provide a genotype and a phenotype file (as relative or absolute paths) via the
  flags `-x` and `-y`, respectively.
- By default, permGWAS2 assumes that the phenotype in the phenotype file is called
  `phenotype_value`. You can specify a different name via the flag `-trait`:

      python3 permGWAS.py -x PATH_TO_GENOTYPE -y PATH_TO_PHENOTYPE -trait PHENO_NAME

- It is possible to run permGWAS2 for several phenotypes located in the same phenotype file one after another. You can
  either specify a list of phenotypes or run permGWAS2 for all available phenotypes in the file by using the keyword
  `all`:

      python3 permGWAS.py -x PATH_TO_GENOTYPE -y PATH_TO_PHENOTYPE -trait PHENO_1 PHENO_2 PHENO_3
      python3 permGWAS.py -x PATH_TO_GENOTYPE -y PATH_TO_PHENOTYPE -trait all

By default, permGWAS2 computes the realized relationship kernel as the kinship matrix. You can use a pre-computed genomic
relationship matrix via the flag `-k`:

    python3 permGWAS.py -x PATH_TO_GENOTYPE -y PATH_TO_PHENOTYPE -k PATH_TO_KINSHIP

It is possible to run permGWAS2 with additional covariates. To specify the covariate file, use the flag `-cov`.
By default, this uses all available covariates in the file. If you only want to use certain columns/covariates, you
have to use the flag `-cov_list` and specify the covariate names as a list:

    python3 permGWAS.py -x PATH_TO_GENOTYPE -y PATH_TO_PHENOTYPE -cov PATH_TO_COVARIATE_FILE
    python3 permGWAS.py -x PATH_TO_GENOTYPE -y PATH_TO_PHENOTYPE -cov PATH_TO_COVARIATE_FILE -cov_list COV_1 COV_2 COV_3

permGWAS2 accepts YAML config files in which you can specify all flags and options instead of passing them separately:

    python3 permGWAS.py -config ./data/config.yaml

The config file should have the following structure:

    ---
    genotype_file: "PATH_TO_GENOTYPE"
    phenotype_file: "PATH_TO_PHENOTYPE"
    trait: "PHENO_NAME"
    kinship_file: "PATH_TO_KINSHIP"
    covariate_file: "PATH_TO_COVARIATE_FILE"
    covariate_list:
      - "COV_1"
      - "COV_2"
      - "COV_3"

By default, permGWAS2 creates a CSV output file and saves it in a directory called `results`. You can also specify a
different directory for the output files via the flag `-out_dir`. The output file will be saved under the name
`p_values_NAME.csv`, where NAME is the phenotype name by default but can be changed via `-out_file`:

    python3 permGWAS.py -x PATH_TO_GENOTYPE -y PATH_TO_PHENOTYPE -out_dir RESULT_FILE_DIR -out_file RESULT_FILE_NAME

For each analyzed SNP, the result file contains:
- CHR: chromosome number
- POS: position within chromosome
- p_value: computed p-value
- test_stat: computed test statistic
- maf: minor allele frequency of SNP
- SE: standard error
- effect_size: coefficient beta
Additionally, a TXT file with summary statistics will be saved. This file contains the estimates of the variance components of the null model, the narrow-sense heritability, the Bonferroni threshold and, if activated, the permutation-based threshold.
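
As a rough illustration of how the result file could be post-processed, the sketch below reads a p-value CSV with the columns listed above and keeps the SNPs at or below a Bonferroni threshold. The function name `significant_snps`, the significance level of 0.05, and the assumption that the number of tests equals the number of rows in the file are ours and are not part of permGWAS2; the threshold reported in the summary TXT file remains the authoritative value.

```python
import csv
import io


def significant_snps(csv_text, alpha=0.05):
    """Return (threshold, hits) for a permGWAS2-style p-value table.

    Hypothetical helper: assumes the columns CHR, POS and p_value exist
    and that every row corresponds to one tested SNP.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return None, []
    # Bonferroni correction: divide the significance level by the number of tests.
    threshold = alpha / len(rows)
    hits = [row for row in rows if float(row["p_value"]) <= threshold]
    return threshold, hits


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    sample = "CHR,POS,p_value\n1,100,0.5\n1,200,0.001\n2,300,0.03\n"
    threshold, hits = significant_snps(sample)
    print("Bonferroni threshold:", threshold)
    for row in hits:
        print(row["CHR"], row["POS"], row["p_value"])
```

To apply this to a real run, the text of the result file (for example the file saved in the `results` directory) would be passed in place of the sample string.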
The table below shows all available flags. For detailed explanations of further flags and options go to permGWAS2 with permutations, Create plots and Optional settings.
| flag | description |
|---|---|
| -x (--genotype_file) | absolute or relative path to genotype file |
| -y (--phenotype_file) | absolute or relative path to phenotype file |
| -trait (--y_name) | name of phenotype (column) to be used in phenotype file, optional, default is "phenotype_value" |
| -k (--kinship_file) | absolute or relative path to kinship file, optional |
| -cov (--covariate_file) | absolute or relative path to covariates file, optional |
| -cov_list (--covariate_list) | names of covariates to use from covariate_file, optional |
| -maf (--maf_threshold) | minor allele frequency threshold as percentage value, optional, default is 0 |
| -load_genotype | choose whether to load full genotype from file or batch-wise during computations, optional, default is False |
| -config (--config_file) | full path to yaml config file |
| -model | specify model name, only relevant if you define your own models, currently only lmm is available |
| -out_dir | name of the directory result-files should be stored in, optional, if not provided, files will be stored in folder "results" in current directory |
| -out_file | NAME of result files, will be stored as NAME_p_values and NAME_min_p_values, optional, if not provided name of phenotype will be used |
| -disable_gpu | use if you want to perform computations on CPU only even though a GPU is available |
| -device | GPU device to be used, optional, default is 0 |
| -perm | number of permutations to be performed, optional, default is 0 |
| -perm_method | method to use for permutations: y - permute only y, x - permute y and kinship matrix, default is x |
| -adj_p_value | additionally compute permutation-based adjusted p-values and store them in the p-value file, optional, default is False |
| -batch (--batch_size) | number of SNPs to work on simultaneously, optional, default is 50000 |
| -batch_perm (--perm_batch_size) | number of SNPs to work on simultaneously while using permutations, optional, default is 1000 |
| -mplot (--plot, --manhattan) | creates Manhattan plot, optional |
| -qqplot | creates QQ-plot, optional |
| -not_add | use when genotype is not in additive encoding |
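
When many runs have to be launched, it can help to assemble the command line programmatically. The following is a minimal sketch that builds an argument list from a few of the flags in the table above; the helper name `build_permgwas_command` and its parameter names are hypothetical and not part of permGWAS2, and only flags documented above are used.

```python
import shlex


def build_permgwas_command(genotype, phenotype, traits=None, kinship=None,
                           covariates=None, covariate_list=None,
                           out_dir=None, perm=None):
    """Assemble an argument list for permGWAS.py (hypothetical helper)."""
    cmd = ["python3", "permGWAS.py", "-x", genotype, "-y", phenotype]
    if traits:
        # One or more phenotype names, or ["all"] for every phenotype in the file.
        cmd += ["-trait", *traits]
    if kinship:
        cmd += ["-k", kinship]
    if covariates:
        cmd += ["-cov", covariates]
        # A covariate list is only meaningful together with a covariate file.
        if covariate_list:
            cmd += ["-cov_list", *covariate_list]
    if out_dir:
        cmd += ["-out_dir", out_dir]
    if perm:
        cmd += ["-perm", str(perm)]
    return cmd


if __name__ == "__main__":
    cmd = build_permgwas_command(
        "./data/x_matrix.h5", "./data/y_matrix.csv",
        traits=["PHENO_1", "PHENO_2"], perm=100,
    )
    # Print the command as it would be typed in a shell.
    print(shlex.join(cmd))
    # To actually run it from the repository directory:
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids quoting problems when paths contain spaces.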