
Commit ae7d6c8
Add script to parse disgenet data
Parent: 44ec825


2 files changed: 29 additions (+), 4 deletions (-)


BUILD.org

Lines changed: 4 additions & 4 deletions
@@ -103,10 +103,10 @@ from within Windows, etc.).
 
 Now that you have the two data files, you should run the AlzKB script
 we wrote to filter for rows in those files corresponding to
-Alzheimer's Disease. This script is in the =scripts/= directory of the
-AlzKB repository, so either find it on your local filesystem if you
-already have a copy of the repository, or find it on the AlzKB page of
-GitHub.
+Alzheimer's Disease, named =alzkb_parse_disgenet.py=. This script is
+in the =scripts/= directory of the AlzKB repository, so either find it
+on your local filesystem if you already have a copy of the repository,
+or find it on the AlzKB GitHub repository in your web browser.
 
 You can then run the Python script from within the =disgenet/=
 directory, which should deposit two filtered data files in the
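The instructions above depend on the working directory: the script opens `./disease_mappings_to_attributes.tsv` and `./disease_mappings.tsv` through relative paths, so launching it from anywhere other than `disgenet/` makes `pd.read_csv` fail because the files cannot be found. As a minimal sketch, a hypothetical helper `missing_disgenet_inputs` (not part of AlzKB) can check a directory for both inputs before the script is run:

```python
from pathlib import Path

# Input files the AlzKB script reads via relative paths (file names taken from the script).
REQUIRED_INPUTS = (
    "disease_mappings_to_attributes.tsv",
    "disease_mappings.tsv",
)


def missing_disgenet_inputs(directory="."):
    """Hypothetical helper: list the required DisGeNET TSV files absent from `directory`.

    An empty list means the directory looks like a valid `disgenet/` working directory.
    """
    base = Path(directory)
    return [name for name in REQUIRED_INPUTS if not (base / name).is_file()]
```

Calling `missing_disgenet_inputs()` from inside `disgenet/` should return an empty list once both downloads are in place; any names it returns are files still missing. Note also that the script's first line, `# !/usr/bin/env python`, has a space after the `#`, so it is an ordinary comment rather than a shebang; invoke the script through the interpreter instead, e.g. `python /path/to/alzkb_parse_disgenet.py` (the path here is a placeholder).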

scripts/alzkb_parse_disgenet.py

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+# !/usr/bin/env python
+## created by Yun Hao and Joe Romano @MooreLab 2022
+## This script parses DisGeNET gene-disease relationship data to extract relationships specific to Alzheimer's disease
+
+# NOTE: This file must be run from the `disgenet/` directory containing the original TSV files referenced below!
+# Both output files will be deposited into the `disgenet/CUSTOM/` directory.
+
+import pandas as pd
+
+from pathlib import Path
+
+disgenet_df = pd.read_csv("./disease_mappings_to_attributes.tsv", sep="\t", header=0)
+disgenet_do_df = pd.read_csv("./disease_mappings.tsv", sep="\t", header=0)
+
+disgenet_ad_df = disgenet_df.loc[disgenet_df["name"].str.contains("Alzheimer"),:]
+cuis = list(disgenet_ad_df.diseaseId.unique())
+
+# For adding disease ontology identifiers
+disgenet_ad_do_df = disgenet_do_df.loc[disgenet_do_df.diseaseId.isin(cuis),:]
+
+# if we don't have the CUSTOM subdirectory, create it
+Path("CUSTOM").mkdir(exist_ok=True)
+
+disgenet_ad_df.to_csv("./CUSTOM/disease_mappings_to_attributes_alzheimer.tsv", sep="\t", header=True, index=False)
+disgenet_ad_do_df.to_csv("./CUSTOM/disease_mappings_alzheimer.tsv", sep="\t", header=True, index=False)
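The core of this script is a two-step filter: keep attribute rows whose `name` contains "Alzheimer", collect their `diseaseId` values (the `cuis` list), then keep only the rows of the mappings table whose `diseaseId` is in that list. The sketch below applies the same logic to small in-memory DataFrames so it can be checked without the DisGeNET downloads. The function name `filter_alzheimer`, the toy rows, and every identifier and code in them are illustrative assumptions, not real DisGeNET records or part of AlzKB. Unlike the committed script, the sketch passes `na=False` to `str.contains`, so a missing disease name counts as a non-match instead of putting `NaN` into the boolean mask.

```python
import pandas as pd


def filter_alzheimer(attrs_df, mappings_df, term="Alzheimer"):
    """Sketch of the script's filtering logic on already-loaded DataFrames.

    attrs_df    -- shaped like disease_mappings_to_attributes.tsv (needs `name`, `diseaseId`)
    mappings_df -- shaped like disease_mappings.tsv (needs `diseaseId`)
    """
    # Step 1: keep diseases whose name mentions the term; na=False treats missing names as no match.
    ad_attrs = attrs_df.loc[attrs_df["name"].str.contains(term, na=False), :]
    # Step 2: collect the disease identifiers of the matching diseases.
    cuis = list(ad_attrs["diseaseId"].unique())
    # Step 3: keep only the mapping rows that refer to those identifiers.
    ad_mappings = mappings_df.loc[mappings_df["diseaseId"].isin(cuis), :]
    return ad_attrs, ad_mappings


# Toy rows with made-up identifiers, used only to exercise the logic.
attrs = pd.DataFrame({
    "diseaseId": ["CUI_A", "CUI_B", "CUI_C", "CUI_D"],
    "name": ["Alzheimer Disease", "Familial Alzheimer Disease", "Parkinson Disease", None],
})
mappings = pd.DataFrame({
    "diseaseId": ["CUI_A", "CUI_A", "CUI_B", "CUI_C"],
    "vocabulary": ["DO", "MSH", "DO", "DO"],
    "code": ["toy-1", "toy-2", "toy-3", "toy-4"],
})

ad_attrs, ad_mappings = filter_alzheimer(attrs, mappings)
print(ad_attrs["diseaseId"].tolist())  # ['CUI_A', 'CUI_B']
print(ad_mappings["code"].tolist())    # ['toy-1', 'toy-2', 'toy-3']
```

Because the match is a plain substring test, it is case-sensitive and will also pick up any disease whose name merely mentions "Alzheimer"; both properties carry over from the committed script.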
