Commit 44ec825

Add disgenet data file download instructions.
1 parent a477eb6 commit 44ec825

1 file changed: +29 -0 lines changed

BUILD.org

Lines changed: 29 additions & 0 deletions
@@ -52,6 +52,7 @@ other flavor of SQL).
 | Hetionet | =hetionet= | Many - see =populate-ontology.py= | [[https://github.com/hetio/hetionet/tree/master/hetnet/tsv][GitHub]] | [[Hetionet]] |
 | NCBI Gene | =ncbigene= | Genes | [[https://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz][Homo_sapiens.gene_info.gz]] | [[NCBI Gene]] |
 | Drugbank | =drugbank= | Drugs / drug candidates | [[https://go.drugbank.com/releases/latest#open-data][DrugBank website]] | [[Drugbank]] |
+| DisGeNET | =disgenet= | Diseases and disease-gene edges | [[https://www.disgenet.org/][DisGeNET]] | [[DisGeNET]] |
 | | | | | |
 
 *** Hetionet
@@ -84,6 +85,34 @@ Drug Links", click the "Download" button on the row labeled
 file, and make sure it is named =drug_links.csv= (some versions use a
 space instead of an underscore in the filename).
 
+*** DisGeNET
+Although DisGeNET is available under a Creative Commons license, the
+database requires users to create a free account to download the
+tab-delimited data files. Therefore, you should create a user account
+and log in. Then, navigate to the Downloads page on the DisGeNET
+website. Now, download the two necessary files by clicking on the
+corresponding links:
+- "UMLS CUI to several disease vocabularies" (under the "UMLS CUI to
+several disease vocabularies" section heading - the resulting file
+name will be =disease_mappings.tsv.gz=)
+- "UMLS CUI to top disease classes" (the resulting file will be named
+=disease_mappings_to_attributes.tar.gz=)
+Both files are gzipped, so extract them into the =disgenet/= directory
+using your favorite method (e.g., gunzip from the command line, 7zip
+from within Windows, etc.).
+
+Now that you have the two data files, you should run the AlzKB script
+we wrote to filter for rows in those files corresponding to
+Alzheimer's Disease. This script is in the =scripts/= directory of the
+AlzKB repository, so either find it on your local filesystem if you
+already have a copy of the repository, or find it on the AlzKB page of
+GitHub.
+
+You can then run the Python script from within the =disgenet/=
+directory, which should deposit two filtered data files in the
+=disgenet/CUSTOM/= subdirectory. These will be automatically detected
+and used when you run the ontology population script.
+
 ** SQL data sources
 If you don't already have MySQL installed, install it. We recommend
 using either a package manager (if one is available on your OS), or