
Commit be18ec2

New version with the changes (PR #4)

2 parents 9204030 + 9742bcb

11 files changed, with 1853 additions and 142 deletions.

.github/workflows/actions.yml

Lines changed: 2 additions & 2 deletions
@@ -5,7 +5,7 @@ jobs:
 runs-on: ubuntu-latest
 strategy:
 matrix:
-python-version: [3.5, 3.6, 3.7, 3.8]
+python-version: [3.6, 3.7, 3.8]
 steps:
 - uses: actions/checkout@v2
 - name: Set up Python ${{ matrix.python-version }}
@@ -19,4 +19,4 @@ jobs:
 - name: Test a single transcript
 run: |
 # Test the script by retrieving transcript data
-python gnomad_python_api.py -filter_by="gene_name" -search_by="TP53" -dataset="gnomad_r2_1"
+python gnomad_api_cli.py -filter_by=gene_name -search_by="BRCA1" -dataset="gnomad_r2_1" -sv_dataset="gnomad_sv_r2_1"
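The workflow's test step now calls `gnomad_api_cli.py` with single-dash, multi-letter options such as `-filter_by=gene_name`. As a rough sketch of how such options can be parsed, here is a hypothetical `argparse` parser assembled from the options table in this commit's README; it is not the script's actual source. The choices and the `gnomad_r2_1`/`gnomad_sv_r2_1` defaults come from the README, while the `GRCh37` default for `-reference_genome` is an assumption.

```python
import argparse

# Hypothetical reconstruction from the README's options table; the real
# gnomad_api_cli.py may define these options differently.
DATASETS = [
    "exac", "gnomad_r2_1", "gnomad_r3", "gnomad_r2_1_controls",
    "gnomad_r2_1_non_neuro", "gnomad_r2_1_non_cancer", "gnomad_r2_1_non_topmed",
]
SV_DATASETS = ["gnomad_sv_r2_1", "gnomad_sv_r2_1_controls", "gnomad_sv_r2_1_non_neuro"]


def build_parser() -> argparse.ArgumentParser:
    """Return a parser for the single-dash options used in the workflow and README."""
    parser = argparse.ArgumentParser(description="Retrieve variant data from gnomAD.")
    parser.add_argument("-filter_by", required=True,
                        choices=["gene_name", "gene_id", "transcript_id", "rs_id"],
                        help="type of the input identifier")
    parser.add_argument("-search_by", required=True,
                        help="an identifier, or a file with one identifier per line")
    # Defaults for these two options are stated in the README.
    parser.add_argument("-dataset", default="gnomad_r2_1", choices=DATASETS)
    parser.add_argument("-sv_dataset", default="gnomad_sv_r2_1", choices=SV_DATASETS)
    # Assumption: the README gives no default build, so GRCh37 is used here.
    parser.add_argument("-reference_genome", default="GRCh37", choices=["GRCh37", "GRCh38"])
    return parser


if __name__ == "__main__":
    # Same arguments as the workflow's test step, after the shell removes the quotes.
    args = build_parser().parse_args([
        "-filter_by=gene_name", "-search_by=BRCA1",
        "-dataset=gnomad_r2_1", "-sv_dataset=gnomad_sv_r2_1",
    ])
    print(args.filter_by, args.search_by, args.dataset, args.sv_dataset, args.reference_genome)
```

`argparse` accepts single-dash long names like these because it matches the text before `=` against the registered option strings; the parsed values end up as `args.filter_by`, `args.search_by`, and so on.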

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+.ipynb_checkpoints
+outputs/
+outputs/*
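The new `.gitignore` entries keep generated results out of version control. The README in this commit says the variants are saved under `outputs/`, in separate folders per identifier. A minimal sketch of that layout follows, assuming CSV output and a hypothetical `<identifier>_variants.csv` file name; neither the format nor the file name is confirmed by the commit.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List


def save_variants(identifier: str, rows: List[Dict[str, str]],
                  root: Path = Path("outputs")) -> Path:
    """Write rows to <root>/<identifier>/<identifier>_variants.csv and return the path.

    Hypothetical helper: the per-identifier folder follows the README,
    but the CSV format and the file name are assumptions.
    """
    folder = root / identifier
    folder.mkdir(parents=True, exist_ok=True)  # create outputs/<identifier>/ if missing
    path = folder / f"{identifier}_variants.csv"
    with path.open("w", newline="") as handle:
        # Column names come from the first row; an empty result writes an empty header line.
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]) if rows else [])
        writer.writeheader()
        writer.writerows(rows)
    return path


if __name__ == "__main__":
    # Write into a throwaway directory instead of the real outputs/ folder.
    with tempfile.TemporaryDirectory() as tmp:
        demo = save_variants("BRCA1", [{"variant_id": "example", "rsid": ""}], root=Path(tmp))
        print(demo.relative_to(tmp))
```

Creating the folder with `exist_ok=True` lets repeated runs for the same identifier overwrite the previous file instead of failing.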

README.md

Lines changed: 105 additions & 21 deletions
@@ -1,61 +1,145 @@
-# 🧬 gnomAD Python API (Batch Script)
+# 🧬 gnomAD Python API
 
 ![Actions for gnomad_python_api](https://github.com/furkanmtorun/gnomad_python_api/workflows/Actions%20for%20gnomad_python_api/badge.svg)
+![Python Badges](https://img.shields.io/badge/Tested_with_Python-3.6%20%7C%203.7%20%7C%203.8-blue)
+![gnomAD Python API License](https://img.shields.io/badge/License-%20GPL--3.0-green)
+
+- [🧬 gnomAD Python API](#-gnomad-python-api)
+- [:hash: What is *gnomAD* and the purpose of this script?](#hash-what-is-gnomad-and-the-purpose-of-this-script)
+- [:hash: Requirements and Installation](#hash-requirements-and-installation)
+- [:hash: GUI | Usage](#hash-gui--usage)
+- [:hash: CLI | Usage & Options](#hash-cli--usage--options)
+- [:hash: CLI | Example Usages](#hash-cli--example-usages)
+- [:hash: Disclaimer](#hash-disclaimer)
+- [:hash: Contributing & Feedback](#hash-contributing--feedback)
+- [:hash: Citation](#hash-citation)
+- [:hash: Developer](#hash-developer)
+- [:hash: References](#hash-references)
 
 ## :hash: What is *gnomAD* and the purpose of this script?
-[gnomAD (The Genome Aggregation Database)](http://gnomad.broadinstitute.org/) is aggregation of thousands of exomes and genomes human sequencing studies. Also, gnomAD consortium annotates the variants with allelic frequency in genomes and exomes.
-**Here**, this batch script is able to search the genes or transcripts of your interest and retrieve variant data from the database via [gnomAD backend API](https://gnomad.broadinstitute.org/api) that based on GraphQL query language.
+[gnomAD (The Genome Aggregation Database)](http://gnomad.broadinstitute.org/) [[1]](#hash-references) is an aggregation of thousands of exomes and genomes from human sequencing studies. The gnomAD consortium also annotates the variants with their allele frequencies in genomes and exomes.
+
+**Here**, this API, available in both CLI and GUI versions, searches the genes or transcripts of your interest and retrieves variant data from the database via the [gnomAD backend API](https://gnomad.broadinstitute.org/api), which is based on the GraphQL query language.
 
 ## :hash: Requirements and Installation
-- Create a directory and download the "**gnomad_python_api.py**" and "**requirements.txt**" files or clone the repository via Git using following command:
+- Create a directory and download the "**gnomad_api_cli.py**" and "**requirements.txt**" files, or clone the repository via Git using the following command:
 
 `git clone https://github.com/furkanmtorun/gnomad_python_api.git`
 
 - Install the required packages if you have not already:
 
-` pip3 install -r requirements.txt `
+`pip3 install -r requirements.txt`
+
+> The `requirements.txt` file lists the libraries required by both the GUI (graphical user interface) and CLI (command-line interface) versions.
 
 - It's ready to use now!
 
 > If you have not installed **pip** yet, please follow the instructions [here](https://pip.pypa.io/en/stable/installing/).
 
-## :hash: Usage & Options
-| Options in the script | Description | Parameters |
+## :hash: GUI | Usage
+
+The GUI version of gnomAD Python API is built with [Streamlit](https://www.streamlit.io/).
+
+> **Note:** In the GUI version, you can generate plots from the retrieved data.
+> This option is not yet available in the CLI version, as it is still under development.
+>
+> **So, using the GUI version is recommended.**
+
+- To use the GUI version of gnomAD Python API:
+
+`streamlit run gnomad_api_gui.py`
+
+
+- Here are screenshots of the GUI version:
+
+![gnomAD Python API GUI](img/main_screen.png)
+
+_gnomAD Python API GUI - Main Screen_
+
+![gnomAD Python API GUI](img/results.png)
+
+_gnomAD Python API GUI - Outputs_
+
+![gnomAD Python API GUI](img/results_2.png)
+
+_gnomAD Python API GUI - Outputs and Plots_
+
+> In the GUI version, the outputs are also saved to the `outputs/` folder.
+
+## :hash: CLI | Usage & Options
+| Options | Description | Parameters |
 |--|--|--|
-| -filter_by | *It defines the input type* |gene_name, gene_id, transcript_id |
-| -search_by | *It defines the input* | Type a gene/transcript identifier <br> *e.g.: TP53, ENSG00000169174, ENST00000544455* <br> Type the name of file containig your inputs <br> *e.g: myGenes.txt*
-| -dataset | *It defines the dataset* | exac, gnomad_r2_1, gnomad_r3, gnomad_r2_1_controls, gnomad_r2_1_non_neuro, gnomad_r2_1_non_cancer, gnomad_r2_1_non_topmed
-| -h | It displays the parameters | *To get help via script:* `python gnomad_python_api.py -h`
+| -filter_by | *It defines the input type.* | `gene_name`, `gene_id`, `transcript_id`, or `rs_id` |
+| -search_by | *It defines the input.* | Type a gene/transcript identifier or an RS ID <br> *e.g.: TP53, ENSG00000169174, ENST00000544455, rs201857604* <br> Or type the name of a file containing your inputs <br> *e.g.: myGenes.txt*
+| -dataset | *It defines the dataset.* | `exac`, `gnomad_r2_1`, `gnomad_r3`, `gnomad_r2_1_controls`, `gnomad_r2_1_non_neuro`, `gnomad_r2_1_non_cancer`, or `gnomad_r2_1_non_topmed`
+| -sv_dataset | *It defines the structural variants dataset.* | `gnomad_sv_r2_1`, `gnomad_sv_r2_1_controls`, or `gnomad_sv_r2_1_non_neuro`
+| -reference_genome | *It defines the reference genome build.* | `GRCh37` or `GRCh38`
+| -h | *It displays the parameters.* | *To get help via script:* `python gnomad_api_cli.py -h`
+
 
-## :hash: Example Usages
+> ❗ `gnomad_r2_1` and `gnomad_sv_r2_1` are the default values of the `-dataset` and `-sv_dataset` options, respectively.
+>
+>
+> ❗ Also, you need to choose `GRCh38` to retrieve variants from the `gnomad_r3` dataset. However, structural variants are not available for the `GRCh38` build.
+
+## :hash: CLI | Example Usages
 - **How to list the variants by gene name or gene ID?**
 
-`python gnomad_python_api.py -filter_by="gene_name" -search_by="TP53" -dataset="gnomad_r2_1"`
+*For gene name:*
+
+`python gnomad_api_cli.py -filter_by=gene_name -search_by="BRCA1" -dataset="gnomad_r2_1" -sv_dataset="gnomad_sv_r2_1"`
 
-> Here, "**gene_id**" can also be used instead of "**gene_name**" after stating an **Ensembl Gene ID** instead of a gene name.
+If you retrieve data from `gnomad_r3`, also set the reference genome to `GRCh38`:
+
+`python gnomad_api_cli.py -filter_by=gene_name -search_by="BRCA1" -dataset="gnomad_r3" -reference_genome="GRCh38"`
+
+*For Ensembl gene ID:*
+
+`python gnomad_api_cli.py -filter_by=gene_id -search_by="ENSG00000169174" -dataset="gnomad_r2_1" -sv_dataset="gnomad_sv_r2_1"`
 
 - **How to list the variants by transcript ID?**
 
-`python gnomad_python_api.py -filter_by="transcript_id" -search_by="ENST00000544455" -dataset="gnomad_r3"`
+`python gnomad_api_cli.py -filter_by=transcript_id -search_by="ENST00000407236" -dataset="gnomad_r2_1"`
+
+- **How to get variant info by RS ID (rsId)?**
+
+`python gnomad_api_cli.py -filter_by=rs_id -search_by="rs201857604" -dataset="gnomad_r2_1"`
 
 - **How to list the variants using a file containing genes/transcripts?**
 
-- Prepare your file that contains gene name, Ensembl gene IDs or Ensembl transcript IDs line-by-line.
+- Prepare a file that contains gene names, Ensembl gene IDs, Ensembl transcript IDs, or RS IDs, one per line.
 > ENSG00000169174 <br> ENSG00000171862 <br> ENSG00000170445
 
 - Then, run the following command:
 
-`python gnomad_python_api.py -filter_by="gene_id" -search_by="myFavoriteGenes.txt" -dataset="exac"`
+`python gnomad_api_cli.py -filter_by="gene_id" -search_by="myFavoriteGenes.txt" -dataset="gnomad_r2_1" -sv_dataset="gnomad_sv_r2_1"`
 
-> Please, use only one type of identifier in the file.
+> Please use only one type of identifier in the file.
 
-- Then, the variants will be listed in "**outputs**" folder in the files according to their identifier (gene name, gene id or transcript id).
+- Then, the variants will be saved in the "**outputs**" folder, in separate folders according to their identifiers (gene name, gene ID, transcript ID, or rsId).
+
 - That's all!
 
+## :hash: Disclaimer
+All the outputs provided by this tool are for informational purposes only.
+
+The information is not intended to replace any consultation, diagnosis, and/or medical treatment offered by physicians or healthcare providers.
+
+The author of the app will not be liable for any direct, indirect, consequential, special, exemplary, or other damages arising therefrom.
+
 ## :hash: Contributing & Feedback
-I would be very happy to see any feedbacks and contributions on the script.
+I would be very happy to see any feedback or contributions to the project.
+
+For problems and enhancement requests, please `open an issue` above.
+
+## :hash: Citation
+Upcoming!
 
-**Furkan Torun | [[email protected]](mailto:[email protected]) | Website: [furkanmtorun.github.io](https://furkanmtorun.github.io/)**
+## :hash: Developer
+**Furkan M. Torun ([@furkanmtorun](http://github.com/furkanmtorun)) | [[email protected]](mailto:[email protected]) |
+Academia: [Google Scholar Profile](https://scholar.google.com/citations?user=d5ZyOZ4AAAAJ)**
 
+## :hash: References
+1. Karczewski, K.J., Francioli, L.C., Tiao, G. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020). https://doi.org/10.1038/s41586-020-2308-7
 

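The README asks for an input file with one identifier per line and only one identifier type per file. A minimal sketch of that check follows, assuming the identifier type can be guessed from its shape: Ensembl `ENSG`/`ENST` prefixes followed by 11 digits, `rs` followed by digits, and anything else treated as a gene name. The helper names are hypothetical, and the CLI's own validation may differ.

```python
import re
import tempfile
from pathlib import Path
from typing import List, Tuple

# Hypothetical patterns for the identifier types accepted by -filter_by.
PATTERNS = {
    "gene_id": re.compile(r"ENSG\d{11}(\.\d+)?"),  # optional version suffix such as .5
    "transcript_id": re.compile(r"ENST\d{11}(\.\d+)?"),
    "rs_id": re.compile(r"rs\d+"),
}


def classify(identifier: str) -> str:
    """Guess the -filter_by value for one identifier; unmatched strings count as gene names."""
    for kind, pattern in PATTERNS.items():
        if pattern.fullmatch(identifier):
            return kind
    return "gene_name"


def read_identifiers(path: Path) -> Tuple[str, List[str]]:
    """Read one identifier per line, skip blank lines, and require a single identifier type."""
    ids = [line.strip() for line in path.read_text().splitlines() if line.strip()]
    kinds = {classify(i) for i in ids}
    if len(kinds) != 1:
        raise ValueError(f"expected exactly one identifier type, found {sorted(kinds)}")
    return kinds.pop(), ids


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        genes = Path(tmp) / "myFavoriteGenes.txt"
        genes.write_text("ENSG00000169174\nENSG00000171862\nENSG00000170445\n")
        print(read_identifiers(genes))
```

The detected type is the value one would pass as `-filter_by`; rejecting mixed files up front mirrors the README's advice to use only one identifier type per file.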

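The README states that variant data are retrieved through the gnomAD backend API, which is based on GraphQL. The sketch below only builds a request body and does not contact the server. The root fields (`gene`, `transcript`), the argument names (`gene_symbol`, `gene_id`, `transcript_id`, `reference_genome`), and the `variants(dataset: ...)` sub-field are assumptions about the gnomAD schema and should be checked against https://gnomad.broadinstitute.org/api before use. It also enforces the README's rule that `gnomad_r3` requires `GRCh38`.

```python
import json
from typing import Dict

API_URL = "https://gnomad.broadinstitute.org/api"  # endpoint linked in the README

# Assumed mapping from -filter_by to (root field, argument name) in the gnomAD schema.
ROOT_FIELD = {
    "gene_name": ("gene", "gene_symbol"),
    "gene_id": ("gene", "gene_id"),
    "transcript_id": ("transcript", "transcript_id"),
}


def check_build(dataset: str, reference_genome: str) -> None:
    """Enforce the README rule that the gnomad_r3 dataset needs the GRCh38 build."""
    if dataset == "gnomad_r3" and reference_genome != "GRCh38":
        raise ValueError("gnomad_r3 requires -reference_genome=GRCh38")


def build_variants_query(filter_by: str, identifier: str,
                         dataset: str = "gnomad_r2_1",
                         reference_genome: str = "GRCh37") -> Dict[str, str]:
    """Return a JSON-serialisable GraphQL request body for a gene or transcript lookup."""
    check_build(dataset, reference_genome)
    field, arg = ROOT_FIELD[filter_by]
    # json.dumps quotes and escapes the identifier as a valid GraphQL string literal;
    # dataset and reference_genome are written bare because they are assumed to be enums.
    query = (
        "query {\n"
        f'  {field}({arg}: {json.dumps(identifier)}, reference_genome: {reference_genome}) {{\n'
        f"    variants(dataset: {dataset}) {{\n"
        "      variant_id\n"
        "      rsid\n"
        "    }\n"
        "  }\n"
        "}"
    )
    return {"query": query}


if __name__ == "__main__":
    body = build_variants_query("gene_name", "BRCA1")
    print(json.dumps(body, indent=2))
```

To send the request, the body could be POSTed as JSON, for example with `requests.post(API_URL, json=body)`; the shape of the response is not shown here and depends on the live schema.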