|
1 | | -# gnomad_python_api |
2 | | -🧬 That gnomAD Python API script can be used to retrieve data from gnomAD "genome aggregation database". |
| 1 | +# gnomAD Python API (Batch Script) |
| 2 | + |
| 3 | +## :hash: What is *gnomAD* and what is the purpose of this script? |
| 4 | +[gnomAD (The Genome Aggregation Database)](http://gnomad.broadinstitute.org/) is an aggregation of thousands of exomes and genomes from human sequencing studies. The gnomAD consortium also annotates the variants with their allele frequencies in genomes and exomes. |
| 5 | +This batch script searches for the genes or transcripts of interest and retrieves their variant data from the database via the [gnomAD backend API](https://gnomad.broadinstitute.org/api), which is based on the GraphQL query language. |
| 6 | + |
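To give a rough idea of what a GraphQL request to the gnomAD API could look like, here is a minimal sketch that only builds the HTTP request and does not send it. The query fields (`gene`, `gene_symbol`, `reference_genome`, `variants`, `variant_id`) and the helper name `build_gene_variants_request` are assumptions for illustration; they are not taken from this script and may not match the current API schema.

```python
import json
import urllib.request

API_URL = "https://gnomad.broadinstitute.org/api"  # endpoint linked above

# Assumed query shape: the field names are illustrative, not taken from the script.
QUERY = """
query GeneVariants($symbol: String!) {
  gene(gene_symbol: $symbol, reference_genome: GRCh37) {
    variants(dataset: gnomad_r2_1) {
      variant_id
    }
  }
}
"""


def build_gene_variants_request(gene_symbol):
    """Build (but do not send) a POST request carrying the GraphQL query."""
    payload = json.dumps({"query": QUERY, "variables": {"symbol": gene_symbol}})
    return urllib.request.Request(
        API_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_gene_variants_request("TP53")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` and decoding the JSON response would be the next step; the shape of that response depends on the actual schema.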
| 7 | +## :hash: Requirements and Installation |
| 8 | + - Create a directory and download the "**gnomad_python_api.py**" and "**requirements.txt**" files, or clone the repository via Git using the following command: |
| 9 | + `git clone https://github.com/furkanmtorun/gnomad_python_api.git` |
| 10 | + |
| 11 | + - Install the required packages if you have not already: |
| 12 | +` pip3 install -r requirements.txt ` |
| 13 | + |
| 14 | +- It's ready to use now! |
| 15 | + |
| 16 | +> If you have not installed **pip** yet, please follow the instructions [here](https://pip.pypa.io/en/stable/installing/). |
| 17 | +
|
| 18 | +## :hash: Usage & Options |
| 19 | +| Options in the script | Description | Parameters | |
| 20 | +|--|--|--| |
| 21 | +| -filter_by | *It defines the input type* |gene_name, gene_id, transcript_id | |
| 22 | +| -search_by | *It defines the input* | Type a gene/transcript identifier <br> *e.g.: TP53, ENSG00000169174, ENST00000544455* <br> or type the name of a file containing your inputs <br> *e.g.: myGenes.txt* |
| 23 | +| -dataset | *It defines the dataset* | exac, gnomad_r2_1, gnomad_r3, gnomad_r2_1_controls, gnomad_r2_1_non_neuro, gnomad_r2_1_non_cancer, gnomad_r2_1_non_topmed |
| 24 | +| -h | It displays the parameters | *To get help via script:* `python gnomad_python_api.py -h` |
| 25 | + |
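As a rough illustration of how the options in the table above could be declared, the following sketch uses `argparse` with the same option names and accepted values. The function name `build_parser`, the help texts, and the choice to mark every option as required are assumptions; the actual script may define its options differently.

```python
import argparse

# Accepted values as listed in the options table.
FILTER_CHOICES = ["gene_name", "gene_id", "transcript_id"]
DATASET_CHOICES = [
    "exac", "gnomad_r2_1", "gnomad_r3", "gnomad_r2_1_controls",
    "gnomad_r2_1_non_neuro", "gnomad_r2_1_non_cancer", "gnomad_r2_1_non_topmed",
]


def build_parser():
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(description="Retrieve variant data from gnomAD.")
    # Marking options as required is an assumption made for this sketch.
    parser.add_argument("-filter_by", choices=FILTER_CHOICES, required=True,
                        help="type of the input identifier")
    parser.add_argument("-search_by", required=True,
                        help="an identifier, or a file with one identifier per line")
    parser.add_argument("-dataset", choices=DATASET_CHOICES, required=True,
                        help="gnomAD dataset to query")
    return parser


# The shell removes the quotes, so the script receives -filter_by=gene_name etc.
args = build_parser().parse_args(
    ["-filter_by=gene_name", "-search_by=TP53", "-dataset=gnomad_r2_1"]
)
print(args.filter_by, args.search_by, args.dataset)
```

With `choices` set, an unsupported value such as an unknown dataset name is rejected by `argparse` with a usage message before any request is made.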
| 26 | +### Example Usages |
| 27 | +- **How to list the variants by gene name or gene id?** |
| 28 | +`python gnomad_python_api.py -filter_by="gene_name" -search_by="TP53" -dataset="gnomad_r2_1"` |
| 29 | + |
| 30 | +> Here, "**gene_id**" can be used instead of "**gene_name**" if you provide an **Ensembl gene ID** rather than a gene name. |
| 31 | +
|
| 32 | +- **How to list the variants by transcripts?** |
| 33 | +`python gnomad_python_api.py -filter_by="transcript_id" -search_by="ENST00000544455" -dataset="gnomad_r3"` |
| 34 | + |
| 35 | +- **How to list the variants by using a file containing genes/transcripts?** |
| 36 | + |
| 37 | + - Prepare a file that contains gene names, Ensembl gene IDs, or Ensembl transcript IDs, one per line. |
| 38 | + > ENSG00000169174 <br> ENSG00000171862 <br> ENSG00000170445 |
| 39 | + |
| 40 | + - Then, run the following command: |
| 41 | + |
| 42 | + `python gnomad_python_api.py -filter_by="gene_id" -search_by="myFavoriteGenes.txt" -dataset="exac"` |
| 43 | + |
| 44 | +> Please use only one type of identifier per file. |
| 45 | +
|
| 46 | +- The variants will then be listed in the "**outputs**" folder, in files according to their identifier (gene name, gene ID, or transcript ID). |
| 47 | +- That's all! |
| 48 | + |
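The file-based input described above could be handled along these lines. This is only a sketch: the helper name `read_identifiers` and the rule "treat the value as a file if a file with that name exists, otherwise as a single identifier" are assumptions, not the script's confirmed behavior.

```python
import os
import tempfile


def read_identifiers(search_by):
    """Return identifiers from a file (one per line) or a single identifier."""
    if os.path.isfile(search_by):
        with open(search_by, encoding="utf-8") as handle:
            # Keep non-empty lines, stripped of surrounding whitespace.
            return [line.strip() for line in handle if line.strip()]
    # Not a file on disk: treat the value itself as the identifier.
    return [search_by]


# Demonstration with a temporary file holding the example Ensembl gene IDs.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "myFavoriteGenes.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("ENSG00000169174\nENSG00000171862\nENSG00000170445\n")
    print(read_identifiers(path))
```

Each identifier returned this way would then be queried in turn with the same `-filter_by` type, which is why a file should hold only one type of identifier.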
| 49 | +## :hash: Contributing & Feedback |
| 50 | +I would be very happy to receive any feedback on and contributions to the script. |
| 51 | + |
| 52 | +**Furkan Torun | [[email protected]](mailto:[email protected]) | Web site: [furkanmtorun.github.io ](https://furkanmtorun.github.io/)** |
| 53 | + |
| 54 | + |
| 55 | + |