Arthur Gilly edited this page Apr 24, 2020 · 16 revisions

Welcome to the MUMMY wiki!

Introduction

This is the wiki for the burden_testing repository, nicknamed MUMMY, which contains a software suite for running rare-variant aggregation tests on next-generation sequencing data. It started out as a collection of Perl scripts wrapping the MONSTER software, and was used in this configuration to run the analyses for this article and this one. This early version has been released as v1.0, available for download in the releases section. In that configuration it is a "MONSTER wrapper", hence the nickname.

To maximise compatibility and ease of deployment, we have packaged the software in a Singularity container. We have also added support for a second tool, SMMAT, which is described in this article. Like MONSTER, SMMAT fits a mixed model, thereby taking into account fine population structure and other random effects. It also has an efficient implementation that makes it orders of magnitude faster to run than MONSTER.

Value added

MUMMY simplifies running SMMAT and MONSTER by taking care of all the format conversions for you. It also writes genome-wide gene set files and performs variant selection for every gene according to the selection and weighting criteria of your choosing.

Prerequisites and installation

Container version

To run the containerised version, the only thing you need is a recent version (>3.0) of the Singularity software. If you are not a superuser, ask your administrator to install it for you. Singularity is fully compatible with HPC environments, but it may need to be installed on the compute nodes. Installation instructions can be found on the official Singularity website.
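To confirm that your installed version meets the >3.0 requirement stated above, you can compare version strings with sort -V. A minimal sketch; the parsing assumes singularity --version prints something like "singularity version 3.5.2", and the hard-coded "3.5.2" is only an illustrative value:

```shell
# Minimum Singularity version this page requires.
required="3.0"

# Returns success if the version in $1 is >= $required. sort -V orders
# version strings, so if $required sorts first (or ties), $1 is new enough.
version_ok() {
    [ "$(printf '%s\n' "$required" "$1" | sort -V | head -n1)" = "$required" ]
}

# On a real system you would extract the live version, e.g.:
#   installed="$(singularity --version | grep -oE '[0-9]+(\.[0-9]+)+')"
installed="3.5.2"   # example value for illustration only

if version_ok "$installed"; then
    echo "Singularity $installed is recent enough (>= $required)"
else
    echo "Singularity $installed is too old; please install >= $required" >&2
fi
```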

Downloading the pre-built Singularity container

The most recent stable build of the container is available through this link. We will attempt to keep this link up to date; however, if you want to be sure you have the latest version, build your own container as described below.

Building the latest container

We keep a Docker container up to date on Docker Hub. To build it into a Singularity image on your machine, run the following (as root):

git clone https://github.com/hmgu-itg/burden_testing/ && cd burden_testing && singularity build burden.latest Singularity_via_docker

This needs to be done on a machine where you have local root access; root is required only to build the container, not to run it. If you don't have a Linux machine with root access, you can try the --fakeroot option of Singularity (please check their docs). DO NOT try building with --remote: our container is too big for that.

Making changes to the container

For this, you will currently also require an installation of Docker on at least one machine with Singularity (this can be your laptop). You will also need an account on Docker Hub. In the code below, I assume your username is username.

git clone https://github.com/hmgu-itg/burden_testing/ && cd burden_testing
nano Dockerfile ## make your changes here
docker build -t username/burden_testing .
docker push username/burden_testing ## this will take a long time
sed -i 's/agilly/username/' Singularity_via_docker
sudo singularity build burden.latest Singularity_via_docker

Testing the container

Just run (no root required)

singularity exec burden.latest help

More details on how the software is packaged can be found here.

Pipeline overview

Note: We recommend using SMMAT even in a single-cohort setting for its computational efficiency.

Single-cohort variant aggregation test using MONSTER

1. Prepare resource files

2. Run the tests

Script version

The list of prerequisites is described here. [add install steps]

Single-cohort variant aggregation test and meta-analysis using SMMAT

1. Create variant lists, for every cohort

This is a simple file describing the chromosome, position and alleles of all the variants present in your cohort. It is produced by the step1 script, which is documented here.

2. Merge variant lists (meta-analysis only)

If several cohorts are to be meta-analysed, merge the variant lists into a single unified file and apply the variant filters you think are appropriate.
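The merge itself can be done with standard Unix tools. A minimal sketch, assuming each per-cohort variant list is a tab-separated file with chromosome, position, reference and alternate allele columns (the file names and the toy data are hypothetical):

```shell
# Toy per-cohort variant lists (chromosome, position, ref, alt; tab-separated).
printf '1\t100\tA\tG\n1\t200\tAT\tA\n' > cohortA.variantlist
printf '1\t100\tA\tG\n2\t50\tC\tT\n'   > cohortB.variantlist

# Merge: drop exact duplicate lines (sort -u), then order by
# chromosome (version sort handles chr10 after chr2) and position.
sort -u cohortA.variantlist cohortB.variantlist \
  | sort -k1,1V -k2,2n > merged.variantlist

# Example of a simple variant filter: keep single-nucleotide variants only
# (ref and alt alleles in columns 3 and 4 are exactly one base long).
awk 'length($3) == 1 && length($4) == 1' merged.variantlist > merged.snv.variantlist
```

The shared 1:100 A>G variant is kept once, so merged.variantlist holds three variants, and the SNV filter drops the AT>A indel.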

3. Prepare resource files, for every cohort

These are resource files that the pipeline needs in order to perform variant selection. They are downloaded from various Internet resources and converted into a format the pipeline understands. Use the prepare_regions command for this, documented here.

4. Create set file, once across all cohorts

Taking your single-cohort or merged variant list as input, create a set file.
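For orientation, SMMAT (in the GMMAT package) expects its group file to hold one tab-separated line per variant: set (gene) name, chromosome, position, reference allele, alternate allele, weight. A minimal sketch that tags every variant in a toy variant list with a placeholder gene name and a uniform weight of 1; gene assignment and weighting are exactly what the real pipeline does for you, using the resource files from step 3, so "GENE1" and the weight here are illustrative only:

```shell
# Toy variant list: chromosome, position, ref, alt (tab-separated).
printf '1\t100\tA\tG\n1\t200\tC\tT\n' > merged.variantlist

# Emit one group-file line per variant: set name, chrom, pos, ref, alt, weight.
# "GENE1" and the uniform weight of 1 are placeholders; the pipeline assigns
# genes and weights according to your selection and weighting criteria.
awk -v OFS='\t' '{print "GENE1", $1, $2, $3, $4, 1}' merged.variantlist > toy.set.file
```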

Data input

[describe the input files needed]
