|
1 | | -# 🧬 gnomAD Python API (Batch Script) |
| 1 | +# 🧬 gnomAD Python API |
2 | 2 |
|
3 | 3 |  |
| 4 | + |
| 5 | + |
| 6 | + |
| 7 | +- [🧬 gnomAD Python API](#-gnomad-python-api) |
| 8 | + - [:hash: What is *gnomAD* and the purpose of this script?](#hash-what-is-gnomad-and-the-purpose-of-this-script) |
| 9 | + - [:hash: Requirements and Installation](#hash-requirements-and-installation) |
| 10 | + - [:hash: GUI | Usage](#hash-gui--usage) |
| 11 | + - [:hash: CLI | Usage & Options](#hash-cli--usage--options) |
| 12 | + - [:hash: CLI | Example Usages](#hash-cli--example-usages) |
| 13 | + - [:hash: Disclaimer](#hash-disclaimer) |
| 14 | + - [:hash: Contributing & Feedback](#hash-contributing--feedback) |
| 15 | + - [:hash: Citation](#hash-citation) |
| 16 | + - [:hash: Developer](#hash-developer) |
| 17 | + - [:hash: References](#hash-references) |
4 | 18 |
|
5 | 19 | ## :hash: What is *gnomAD* and the purpose of this script? |
6 | | -[gnomAD (The Genome Aggregation Database)](http://gnomad.broadinstitute.org/) is aggregation of thousands of exomes and genomes human sequencing studies. Also, gnomAD consortium annotates the variants with allelic frequency in genomes and exomes. |
7 | | -**Here**, this batch script is able to search the genes or transcripts of your interest and retrieve variant data from the database via [gnomAD backend API](https://gnomad.broadinstitute.org/api) that based on GraphQL query language. |
| 20 | +[gnomAD (The Genome Aggregation Database)](http://gnomad.broadinstitute.org/) [[1]](#hash-references) is an aggregation of thousands of exomes and genomes from human sequencing studies. The gnomAD consortium also annotates the variants with allele frequencies in genomes and exomes.
| 21 | + |
| 22 | +**Here**, this API, available in both CLI and GUI versions, searches for the genes or transcripts of interest and retrieves variant data from the database via the [gnomAD backend API](https://gnomad.broadinstitute.org/api), which is based on the GraphQL query language.
8 | 23 |
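As a rough illustration of what a GraphQL request to the backend might look like, the sketch below only builds the JSON body of a POST request and does not send it. The query text, its field names (`gene`, `gene_symbol`, `variants`, `variant_id`), the type names, and the `build_payload` helper are assumptions made for illustration; they are not taken from this repository and may not match the live gnomAD schema.

```python
import json

# Endpoint taken from the link in the paragraph above.
GNOMAD_API_URL = "https://gnomad.broadinstitute.org/api"

# Hypothetical query: field and type names are assumptions and
# should be checked against the actual gnomAD GraphQL schema.
GENE_VARIANTS_QUERY = """
query GeneVariants($symbol: String!, $dataset: DatasetId!, $build: ReferenceGenomeId!) {
  gene(gene_symbol: $symbol, reference_genome: $build) {
    variants(dataset: $dataset) {
      variant_id
    }
  }
}
"""


def build_payload(gene_symbol, dataset="gnomad_r2_1", reference_genome="GRCh37"):
    """Return the JSON-serializable body of a GraphQL POST request."""
    return {
        "query": GENE_VARIANTS_QUERY,
        "variables": {
            "symbol": gene_symbol,
            "dataset": dataset,
            "build": reference_genome,
        },
    }


if __name__ == "__main__":
    # Print the body that would be posted to GNOMAD_API_URL.
    print(json.dumps(build_payload("BRCA1"), indent=2))
```

Keeping the query text separate from the variables means the same query can be reused for every gene in a batch, with only the `variables` object changing.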
|
9 | 24 | ## :hash: Requirements and Installation |
10 | | - - Create a directory and download the "**gnomad_python_api.py**" and "**requirements.txt**" files or clone the repository via Git using following command: |
| 25 | + - Create a directory and download the "**gnomad_api_cli.py**" and "**requirements.txt**" files, or clone the repository via Git using the following command:
11 | 26 |
|
12 | 27 | `git clone https://github.com/furkanmtorun/gnomad_python_api.git` |
13 | 28 |
|
14 | 29 | - Install the required packages if you have not already:
15 | 30 |
|
16 | | - ` pip3 install -r requirements.txt ` |
| 31 | + ` pip3 install -r requirements.txt` |
| 32 | + |
| 33 | + > The `requirements.txt` file lists the libraries required by both the GUI (graphical user interface) and CLI (command-line interface) versions.
17 | 34 |
|
18 | 35 | - It's ready to use now! |
19 | 36 |
|
20 | 37 | > If you have not installed **pip** yet, please follow the instructions [here](https://pip.pypa.io/en/stable/installing/).
21 | 38 |
|
22 | | -## :hash: Usage & Options |
23 | | -| Options in the script | Description | Parameters | |
| 39 | +## :hash: GUI | Usage |
| 40 | + |
| 41 | +The GUI version of gnomAD Python API is built with [Streamlit](https://www.streamlit.io/).
| 42 | + |
| 43 | +> **Note:** In the GUI version, it is possible to generate plots from the retrieved data.
| 44 | +> This option is not available in the CLI version since it is still under development.
| 45 | +>
| 46 | +> **So, it is recommended to use the GUI version.**
| 47 | +
|
| 48 | +- To use the GUI version of gnomAD Python API:
| 49 | + |
| 50 | + `streamlit run gnomad_api_gui.py` |
| 51 | + |
| 52 | + |
| 53 | +- Here are the screenshots for the GUI version: |
| 54 | + |
| 55 | +  |
| 56 | + |
| 57 | + _gnomAD Python API GUI - Main Screen_ |
| 58 | + |
| 59 | +  |
| 60 | + |
| 61 | + _gnomAD Python API GUI - Outputs_ |
| 62 | + |
| 63 | +  |
| 64 | + |
| 65 | + _gnomAD Python API GUI - Outputs and Plots_ |
| 66 | + |
| 67 | +> The outputs are also saved in the `outputs/` folder in the GUI version.
| 68 | +
|
| 69 | +## :hash: CLI | Usage & Options |
| 70 | +| Options | Description | Parameters | |
24 | 71 | |--|--|--| |
25 | | -| -filter_by | *It defines the input type* |gene_name, gene_id, transcript_id | |
26 | | -| -search_by | *It defines the input* | Type a gene/transcript identifier <br> *e.g.: TP53, ENSG00000169174, ENST00000544455* <br> Type the name of file containig your inputs <br> *e.g: myGenes.txt* |
27 | | -| -dataset | *It defines the dataset* | exac, gnomad_r2_1, gnomad_r3, gnomad_r2_1_controls, gnomad_r2_1_non_neuro, gnomad_r2_1_non_cancer, gnomad_r2_1_non_topmed |
28 | | -| -h | It displays the parameters | *To get help via script:* `python gnomad_python_api.py -h` |
| 72 | +| -filter_by | *It defines the input type.* |`gene_name`, `gene_id`, `transcript_id`, or `rs_id` | |
| 73 | +| -search_by | *It defines the input.* | Type a gene/transcript identifier <br> *e.g.: TP53, ENSG00000169174, ENST00000544455* <br> Type the name of the file containing your inputs <br> *e.g.: myGenes.txt* |
| 74 | +| -dataset | *It defines the dataset.* | `exac`, `gnomad_r2_1`, `gnomad_r3`, `gnomad_r2_1_controls`, `gnomad_r2_1_non_neuro`, `gnomad_r2_1_non_cancer`, or `gnomad_r2_1_non_topmed` |
| 75 | +| -sv_dataset | *It defines structural variants dataset.* | `gnomad_sv_r2_1`, `gnomad_sv_r2_1_controls`, or `gnomad_sv_r2_1_non_neuro` |
| 76 | +| -reference_genome | *It defines reference genome build.* | `GRCh37` or `GRCh38` |
| 77 | +| -h | *It displays the parameters.* | *To get help via script:* `python gnomad_api_cli.py -h` |
| 78 | + |
29 | 79 |
|
30 | | -## :hash: Example Usages |
| 80 | +> ❗ When retrieving variants, `gnomad_r2_1` and `gnomad_sv_r2_1` are the default values for the `-dataset` and `-sv_dataset` options, respectively.
| 81 | +> |
| 82 | +> |
| 83 | +> ❗ Also, you need to choose `GRCh38` to retrieve variants from the `gnomad_r3` dataset. However, structural variants are not available in the `GRCh38` build.
| 84 | +
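The options table and the two notes above can be mirrored in a small `argparse` sketch. This is not the parser used in `gnomad_api_cli.py`; it only illustrates the documented flags, the documented defaults for `-dataset` and `-sv_dataset`, and the `gnomad_r3`/`GRCh38` constraint. The `GRCh37` default for `-reference_genome`, the choice to set `sv_dataset` to `None` under `GRCh38`, and the helper names `build_parser` and `check_args` are assumptions.

```python
import argparse

DATASETS = [
    "exac", "gnomad_r2_1", "gnomad_r3", "gnomad_r2_1_controls",
    "gnomad_r2_1_non_neuro", "gnomad_r2_1_non_cancer", "gnomad_r2_1_non_topmed",
]
SV_DATASETS = ["gnomad_sv_r2_1", "gnomad_sv_r2_1_controls", "gnomad_sv_r2_1_non_neuro"]


def build_parser():
    """Build a parser that accepts the options listed in the table."""
    parser = argparse.ArgumentParser(description="Sketch of the documented CLI options")
    parser.add_argument("-filter_by", required=True,
                        choices=["gene_name", "gene_id", "transcript_id", "rs_id"])
    parser.add_argument("-search_by", required=True)
    # Defaults for the two dataset options are the ones stated in the note above.
    parser.add_argument("-dataset", choices=DATASETS, default="gnomad_r2_1")
    parser.add_argument("-sv_dataset", choices=SV_DATASETS, default="gnomad_sv_r2_1")
    # Assumption: GRCh37 as the default build (not stated in this README).
    parser.add_argument("-reference_genome", choices=["GRCh37", "GRCh38"], default="GRCh37")
    return parser


def check_args(args):
    """Apply the dataset/build rules described in the notes above."""
    if args.dataset == "gnomad_r3" and args.reference_genome != "GRCh38":
        raise ValueError("gnomad_r3 requires -reference_genome=GRCh38")
    if args.reference_genome == "GRCh38":
        # Structural variants are not available in GRCh38, so skip them
        # (one possible way to handle this case).
        args.sv_dataset = None
    return args


if __name__ == "__main__":
    print(check_args(build_parser().parse_args()))
```

Checking the combination before any request is made gives a clear error message instead of an empty result from the API.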
|
| 85 | +## :hash: CLI | Example Usages |
31 | 86 | - **How to list the variants by gene name or gene id?** |
32 | 87 |
|
33 | | -`python gnomad_python_api.py -filter_by="gene_name" -search_by="TP53" -dataset="gnomad_r2_1"` |
| 88 | + *For gene name:* |
| 89 | + |
| 90 | + `python gnomad_api_cli.py -filter_by=gene_name -search_by="BRCA1" -dataset="gnomad_r2_1" -sv_dataset="gnomad_sv_r2_1"` |
34 | 91 |
|
35 | | -> Here, "**gene_id**" can also be used instead of "**gene_name**" after stating an **Ensembl Gene ID** instead of a gene name. |
| 92 | + To get data from `gnomad_r3`:
| 93 | + |
| 94 | + `python gnomad_api_cli.py -filter_by=gene_name -search_by="BRCA1" -dataset="gnomad_r3" -reference_genome="GRCh38"` |
| 95 | + |
| 96 | + *For Ensembl gene ID:*
| 97 | + |
| 98 | + `python gnomad_api_cli.py -filter_by=gene_id -search_by="ENSG00000169174" -dataset="gnomad_r2_1" -sv_dataset="gnomad_sv_r2_1"` |
36 | 99 |
|
37 | 100 | - **How to list the variants by transcript ID?** |
38 | 101 |
|
39 | | -`python gnomad_python_api.py -filter_by="transcript_id" -search_by="ENST00000544455" -dataset="gnomad_r3"` |
| 102 | + `python gnomad_api_cli.py -filter_by=transcript_id -search_by="ENST00000407236" -dataset="gnomad_r2_1"` |
| 103 | + |
| 104 | +- **How to get variant info by RS ID (rsId)?** |
| 105 | + |
| 106 | + `python gnomad_api_cli.py -filter_by=rs_id -search_by="rs201857604" -dataset="gnomad_r2_1"` |
40 | 107 |
|
41 | 108 | - **How to list the variants using a file containing genes/transcripts?** |
42 | 109 |
|
43 | | - - Prepare your file that contains gene name, Ensembl gene IDs or Ensembl transcript IDs line-by-line. |
| 110 | + - Prepare a file that contains gene names, Ensembl gene IDs, Ensembl transcript IDs, or RS IDs, one per line.
44 | 111 | > ENSG00000169174 <br> ENSG00000171862 <br> ENSG00000170445 |
45 | 112 |
|
46 | 113 | - Then, run the following command: |
47 | 114 |
|
48 | | - `python gnomad_python_api.py -filter_by="gene_id" -search_by="myFavoriteGenes.txt" -dataset="exac"` |
| 115 | + `python gnomad_api_cli.py -filter_by="gene_id" -search_by="myFavoriteGenes.txt" -dataset="gnomad_r2_1" -sv_dataset="gnomad_sv_r2_1"` |
49 | 116 |
|
50 | | -> Please, use only one type of identifier in the file. |
| 117 | + > Please, use only one type of identifier in the file. |
51 | 118 |
|
52 | | -- Then, the variants will be listed in "**outputs**" folder in the files according to their identifier (gene name, gene id or transcript id). |
| 119 | +- Then, the variants will be saved in the "**outputs**" folder, in folders according to their identifier (gene name, gene ID, transcript ID, or rsID).
| 120 | + |
53 | 121 | - That's all! |
54 | 122 |
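Because the input file should contain only one type of identifier, it can help to check the file before running the CLI. The sketch below is not part of this repository. It assumes unversioned Ensembl IDs (`ENSG...`, `ENST...`) and `rs`-prefixed RS IDs, treats anything else as a gene name, and uses the hypothetical helper names `classify` and `read_identifiers`.

```python
import re
from pathlib import Path


def classify(identifier):
    """Guess which -filter_by value an identifier corresponds to."""
    if re.fullmatch(r"ENSG\d+", identifier):
        return "gene_id"
    if re.fullmatch(r"ENST\d+", identifier):
        return "transcript_id"
    if re.fullmatch(r"rs\d+", identifier):
        return "rs_id"
    # Assumption: everything else is treated as a gene name.
    return "gene_name"


def read_identifiers(path):
    """Read one identifier per line and ensure they all share one type."""
    lines = Path(path).read_text().splitlines()
    identifiers = [line.strip() for line in lines if line.strip()]
    kinds = {classify(identifier) for identifier in identifiers}
    if len(kinds) > 1:
        raise ValueError(f"Mixed identifier types in {path}: {sorted(kinds)}")
    return identifiers, (kinds.pop() if kinds else None)
```

For a file like the one shown above, `read_identifiers("myFavoriteGenes.txt")` would return the three Ensembl gene IDs together with `"gene_id"`, which is the value to pass to `-filter_by`.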
|
| 123 | +## :hash: Disclaimer |
| 124 | +All the outputs provided by this tool are for informational purposes only. |
| 125 | + |
| 126 | +The information is not intended to replace any consultation, diagnosis, and/or medical treatment offered by physicians or healthcare providers. |
| 127 | + |
| 128 | +The author of the app will not be liable for any direct, indirect, consequential, special, exemplary, or other damages arising therefrom. |
| 129 | + |
55 | 130 | ## :hash: Contributing & Feedback |
56 | | -I would be very happy to see any feedbacks and contributions on the script. |
| 131 | +I would be very happy to see any feedback or contributions to the project. |
| 132 | + |
| 133 | +For problems and enhancement requests, please `open an issue` above. |
| 134 | + |
| 135 | +## :hash: Citation |
| 136 | +Upcoming!
57 | 137 |
|
58 | | -**Furkan Torun | [[email protected]](mailto:[email protected]) | Website: [furkanmtorun.github.io ](https://furkanmtorun.github.io/)** |
| 138 | +## :hash: Developer |
| 139 | +**Furkan M. Torun ( [@furkanmtorun](http://github.com/furkanmtorun)) | [[email protected]](mailto:[email protected]) | |
| 140 | +Academia: [Google Scholar Profile](https://scholar.google.com/citations?user=d5ZyOZ4AAAAJ)** |
59 | 141 |
|
| 142 | +## :hash: References |
| 143 | +1. Karczewski, K.J., Francioli, L.C., Tiao, G. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020). https://doi.org/10.1038/s41586-020-2308-7 |
60 | 144 |
|
61 | 145 |
|