
Commit 0464e38

Add instructions for gene-disease association data file
1 parent ae7d6c8 commit 0464e38

1 file changed (+17 -11 lines)

BUILD.org

Lines changed: 17 additions & 11 deletions
@@ -97,21 +97,27 @@ corresponding links:
 name will be =disease_mappings.tsv.gz=)
 - "UMLS CUI to top disease classes" (the resulting file will be named
 =disease_mappings_to_attributes.tar.gz=)
-Both files are gzipped, so extract them into the =disgenet/= directory
-using your favorite method (e.g., gunzip from the command line, 7zip
-from within Windows, etc.).
-
-Now that you have the two data files, you should run the AlzKB script
-we wrote to filter for rows in those files corresponding to
-Alzheimer's Disease, named =alzkb_parse_disgenet.py=. This script is
-in the =scripts/= directory of the AlzKB repository, so either find it
-on your local filesystem if you already have a copy of the repository,
-or find it on the AlzKB GitHub repository in your web browser.
+Next, download =curated_disease_gene_associations.tsv.gz= directly by
+copying the following URL into your web browser:
+https://www.disgenet.org/static/disgenet_ap1/files/downloads/curated_disease_gene_associations.tsv.gz
+
+All three files are gzipped, so extract them into the =disgenet/=
+directory using your favorite method (e.g., gunzip from the command
+line, 7zip from within Windows, etc.).
+
+Now that you have the three necessary data files, you should run the
+AlzKB script we wrote to filter for rows in those files corresponding
+to Alzheimer's Disease, named =alzkb_parse_disgenet.py=. This script
+is in the =scripts/= directory of the AlzKB repository, so either find
+it on your local filesystem if you already have a copy of the
+repository, or find it on the AlzKB GitHub repository in your web
+browser.
 
 You can then run the Python script from within the =disgenet/=
 directory, which should deposit two filtered data files in the
 =disgenet/CUSTOM/= subdirectory. These will be automatically detected
-and used when you run the ontology population script.
+and used when you run the ontology population script, along with the
+unmodified =curated_disease_gene_associations.tsv= file.
 
 ** SQL data sources
 If you don't already have MySQL installed, install it. We recommend

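The download and extraction steps in the new BUILD.org text can also be scripted. The Python sketch below is illustrative and is not part of AlzKB. The helper names `download` and `extract` are hypothetical, and the sketch assumes the DisGeNET URL above can be fetched without a browser session. It also decides how to unpack each file by inspecting its contents with `tarfile.is_tarfile` rather than trusting the extension. Running gunzip on a real tar archive such as a `.tar.gz` only produces a `.tar` file, which still needs unpacking.

```python
"""Sketch: fetch and unpack the DisGeNET files into disgenet/.

Illustrative only; `download` and `extract` are hypothetical helpers,
not part of the AlzKB repository.
"""
from __future__ import annotations

import gzip
import shutil
import tarfile
import urllib.request
from pathlib import Path

# URL copied from BUILD.org; assumes it is reachable without a browser session.
CURATED_URL = (
    "https://www.disgenet.org/static/disgenet_ap1/files/downloads/"
    "curated_disease_gene_associations.tsv.gz"
)


def download(url: str, dest_dir: Path) -> Path:
    """Save `url` into `dest_dir`, keeping the file name from the URL."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, target)
    return target


def extract(archive: Path, dest_dir: Path) -> list[Path]:
    """Unpack `archive` into `dest_dir` and return the files written."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    if tarfile.is_tarfile(archive):
        # A real tar archive: gunzip alone would leave a .tar behind.
        with tarfile.open(archive, "r:*") as tar:
            members = [m for m in tar.getmembers() if m.isfile()]
            # Use the safer "data" extraction filter where this Python has it.
            extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            tar.extractall(dest_dir, members=members, **extra)
        return [dest_dir / m.name for m in members]
    if archive.name.endswith(".gz"):
        # Plain gzip stream: decompress to the same name minus ".gz".
        target = dest_dir / archive.name[: -len(".gz")]
        with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        return [target]
    raise ValueError(f"{archive} is neither a tar archive nor gzip-compressed")
```

For example, `extract(Path("curated_disease_gene_associations.tsv.gz"), Path("disgenet"))` would write `curated_disease_gene_associations.tsv` into `disgenet/`.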

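This commit does not show the filtering logic inside `alzkb_parse_disgenet.py`. As a rough illustration only, the sketch below copies the header line plus every row whose disease-name column contains "alzheimer" (case-insensitively). Several details are assumptions about the input files, not facts about the AlzKB script:

- the default column name `diseaseName`;
- the delimiter, which is a parameter;
- the substring heuristic;
- the output path.

The real script may match on UMLS CUIs instead.

```python
"""Hypothetical filter for Alzheimer's rows in a delimited DisGeNET file.

Illustration only: this is not alzkb_parse_disgenet.py. The default
column name ("diseaseName"), the delimiter, and the substring match are
assumptions about the input, not facts about the AlzKB script.
"""
from pathlib import Path


def is_alzheimer(disease_name: str) -> bool:
    """Case-insensitive substring test (an assumed heuristic)."""
    return "alzheimer" in disease_name.lower()


def filter_rows(src: Path, dest: Path, column: str = "diseaseName",
                delimiter: str = "\t") -> int:
    """Write the header and every matching row of `src` to `dest`.

    Matching lines are copied unchanged. Returns the number of data rows kept.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    kept = 0
    with open(src, encoding="utf-8") as fin, \
            open(dest, "w", encoding="utf-8") as fout:
        header = fin.readline()
        fields = header.rstrip("\r\n").split(delimiter)
        if column not in fields:
            raise KeyError(f"{src} has no column {column!r}")
        idx = fields.index(column)
        fout.write(header)
        for line in fin:
            cols = line.rstrip("\r\n").split(delimiter)
            # Skip short or malformed lines instead of raising IndexError.
            if idx < len(cols) and is_alzheimer(cols[idx]):
                fout.write(line)
                kept += 1
    return kept
```

For example, `filter_rows(Path("curated_disease_gene_associations.tsv"), Path("CUSTOM/curated_alzheimer.tsv"))` would write the matching rows under a `CUSTOM/` subdirectory; that output file name is hypothetical. Because the column and delimiter are parameters, you can point the same function at the other DisGeNET files once you have checked their headers.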