#### Artemis Module (`alethiotx.artemis`)

The Artemis module enables accessible and scalable drug target prioritization by integrating drug molecule and target data from ChEMBL (including clinical trial phases and approvals), MeSH disease hierarchies, HGNC gene families, pathway information from GeneShot, and machine learning pipelines. It leverages public knowledge graphs to prioritize therapeutic targets across multiple disease areas.

### Artemis Module Features

- **ChEMBL Integration**: Query and process the ChEMBL bioactive molecule database, including clinical trial information, with automatic normalization to parent molecules
- **MeSH Hierarchy**: Retrieve MeSH disease trees and descendant headings for comprehensive disease coverage
- **HGNC Gene Families**: Download and analyze gene family data to identify and filter over-represented families
- **Clinical Scoring**: Calculate clinical validation scores for drug targets based on trial phases, approvals, and family representation
- **Pathway Genes**: Retrieve and analyze disease-associated genes using Ma'ayan Lab's GeneShot API
- **Machine Learning Pipeline**: Built-in cross-validation with configurable classifiers for target prediction
- **UpSet Plots**: Visualize gene set intersections across multiple diseases
- **Multi-Disease Support**: Pre-configured for breast, lung, prostate, melanoma, bowel cancer, diabetes, and cardiovascular disease

### Future Modules

> **Note:** The examples below demonstrate the **Artemis** module functionality. As new modules are added to the package, they will have their own usage examples.

### 1. Query ChEMBL and Compute Clinical Scores

```python
from alethiotx.artemis.chembl import molecules
from alethiotx.artemis.clinical import compute

# Query ChEMBL for parent molecules with clinical trial data
chembl_data = molecules(version='36', top_n_activities=1)

# Compute clinical validation scores for specific diseases
# (compute() takes a ChEMBL version string, not the result above)
results = compute(
    mesh_headings=['Breast Neoplasms', 'Lung Neoplasms'],
    chembl_version='36',
    trials_only_last_n_years=6,
    filter_families=True
)

# Access results for each disease, keyed by MeSH heading
breast_targets = results['Breast Neoplasms']
print(breast_targets.head())
```
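
`compute()` hides its scoring logic, so the sketch below is only a rough illustration of how a recency window like `trials_only_last_n_years` and a per-target phase aggregation could fit together. The record layout, the "highest phase reached" rule, and the helper name `score_targets` are hypothetical assumptions, not the package's actual method; the data are made up.

```python
from collections import defaultdict

# Toy (target gene, trial phase, trial start year) records.
# Hypothetical data for illustration only, not taken from ChEMBL.
records = [
    ("ESR1", 4, 2021),
    ("ESR1", 3, 2016),
    ("ERBB2", 3, 2022),
    ("CDK4", 2, 2020),
    ("CDK4", 4, 2012),
]


def score_targets(records, current_year, last_n_years):
    """Hypothetical scoring rule, not the rule used by compute().

    Keeps trials that started within the last `last_n_years` calendar years,
    then scores each target by the highest phase it reached (4 = approved).
    """
    cutoff = current_year - last_n_years  # e.g. 2025 - 6 keeps 2020..2025
    scores = defaultdict(int)
    for gene, phase, year in records:
        if year > cutoff:
            scores[gene] = max(scores[gene], phase)
    return dict(scores)


print(score_targets(records, current_year=2025, last_n_years=6))
# {'ESR1': 4, 'ERBB2': 3, 'CDK4': 2}
```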

### 2. Load Pre-computed Clinical Scores

```python
from alethiotx.artemis.clinical import load

# Load pre-computed clinical scores for multiple diseases from S3
breast, lung, prostate, melanoma, bowel, diabetes, cardio = load(date='2025-12-08')
```

### 3. Pathway Gene Analysis

```python
from alethiotx.artemis.pathway import get, load

# Query GeneShot API for disease-associated genes
aml_genes = get("acute myeloid leukemia", rif='generif')
print(aml_genes.loc["FLT3", ["gene_count", "rank"]])

# Load pre-computed pathway genes for multiple diseases
breast_pg, lung_pg, prostate_pg, melanoma_pg, bowel_pg, diabetes_pg, cardio_pg = load(date='2025-11-11', n=100)
```
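
The `gene_count` and `rank` columns printed above suggest that genes are ranked by how often they co-occur with the search term in the literature, and that `load(..., n=100)` keeps the top-ranked genes. The stdlib sketch below illustrates that ranking idea on made-up counts; the alphabetical tie-break, ranks starting at 1, and the helper name `top_n_genes` are assumptions, not documented package behavior.

```python
# Made-up GeneRIF co-occurrence counts; not real GeneShot output.
gene_count = {"FLT3": 950, "NPM1": 720, "DNMT3A": 410, "IDH2": 410, "TP53": 300}


def top_n_genes(gene_count, n):
    """Hypothetical helper: rank genes by descending count and keep the top n.

    Ties are broken alphabetically. Returns (gene, count, rank) tuples,
    with rank starting at 1.
    """
    ordered = sorted(gene_count.items(), key=lambda item: (-item[1], item[0]))
    return [(gene, count, rank) for rank, (gene, count) in enumerate(ordered[:n], start=1)]


for gene, count, rank in top_n_genes(gene_count, n=3):
    print(rank, gene, count)
```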

### 4. Machine Learning Pipeline

```python
from alethiotx.artemis.cv import prepare, run
import pandas as pd

# Assumes X (knowledge graph features), y (clinical scores), pathway_genes and
# known_targets have already been built, e.g. as pandas objects
result = prepare(
    X,
    y,
    pathway_genes=pathway_genes,
    known_targets=known_targets,
    bins=3,
    rand_seed=12345
)

# Run cross-validation pipeline
scores = run(
    result['X'],
    result['y_binary'],
    n_splits=5,
    n_iterations=10,
    classifier='rf',
    scoring='roc_auc'
)
print(f"Mean AUC: {sum(scores)/len(scores):.3f}")
```
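
With `scoring='roc_auc'`, each value in `scores` is presumably an area under the ROC curve, and the last line averages them. For readers new to the metric, the stdlib sketch below computes ROC AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count half), which equals the area under the ROC curve. The labels and scores are made up, and this is not code from `alethiotx.artemis.cv`.

```python
from itertools import product
from statistics import mean


def roc_auc(y_true, y_score):
    """Illustrative ROC AUC: P(score_pos > score_neg), with ties counting 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p, q in product(pos, neg))
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = clinical target) and classifier scores for two folds
folds = [
    ([1, 1, 0, 0], [0.9, 0.6, 0.7, 0.2]),
    ([1, 0, 1, 0], [0.8, 0.3, 0.5, 0.5]),
]
fold_aucs = [roc_auc(y, s) for y, s in folds]
print(f"Mean AUC: {mean(fold_aucs):.3f}")
```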

### 5. Visualize Gene Overlaps with UpSet Plots

```python
from alethiotx.artemis.upset import prepare, create
from alethiotx.artemis.clinical import load
from alethiotx.artemis.pathway import load as load_pathway

# Load clinical scores for multiple diseases
breast, lung, prostate, melanoma, bowel, diabetes, cardio = load(date='2025-12-08')

# Prepare data for UpSet plot (mode='ct' for clinical targets)
upset_data = prepare(breast, lung, prostate, melanoma, bowel, diabetes, cardio, mode='ct')

# Create and display the UpSet plot
plot = create(upset_data, min_subset_size=5)
plot.plot()

# For pathway genes, use mode='pg'
breast_pg, lung_pg, prostate_pg, melanoma_pg, bowel_pg, diabetes_pg, cardio_pg = load_pathway(date='2025-11-11', n=100)
upset_data_pg = prepare(breast_pg, lung_pg, prostate_pg, melanoma_pg, bowel_pg, diabetes_pg, cardio_pg, mode='pg')
plot_pg = create(upset_data_pg, min_subset_size=10)
plot_pg.plot()
```
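
By default, an UpSet plot shows, for every combination of sets, how many elements belong to exactly that combination and to no other set; in the `upsetplot` library, `min_subset_size` hides combinations smaller than the threshold. The stdlib sketch below computes those exclusive intersection sizes for three made-up gene sets, which can help when sanity-checking a plot. It illustrates the concept only and is not the implementation behind `prepare()` or `create()`.

```python
from collections import Counter

# Made-up gene sets for three indications; not real Artemis data.
gene_sets = {
    "breast": {"ESR1", "ERBB2", "CDK4", "PIK3CA"},
    "lung": {"EGFR", "ALK", "PIK3CA", "ERBB2"},
    "melanoma": {"BRAF", "MAP2K1", "CDK4", "PIK3CA"},
}


def exclusive_intersections(gene_sets, min_subset_size=1):
    """Illustrative helper: count genes by the exact combination of sets they belong to.

    Keys are sorted tuples of set names; combinations with fewer than
    `min_subset_size` genes are dropped, as UpSet plots do when filtering bars.
    """
    membership = Counter()
    all_genes = set().union(*gene_sets.values())
    for gene in all_genes:
        combo = tuple(sorted(name for name, genes in gene_sets.items() if gene in genes))
        membership[combo] += 1
    return {combo: size for combo, size in membership.items() if size >= min_subset_size}


for combo, size in sorted(exclusive_intersections(gene_sets).items()):
    print(" & ".join(combo), size)
```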

## Supported Disease Indications (Artemis Module)

The Artemis module includes built-in pre-computed data for the following indications (the corresponding MeSH heading is shown in parentheses where applicable):

- **Breast Cancer** (Breast Neoplasms)
- **Lung Cancer** (Lung Neoplasms)
- **Prostate Cancer** (Prostatic Neoplasms)
- **Melanoma** (Skin Neoplasms)
- **Bowel Cancer** (Intestinal Neoplasms)
- **Diabetes Mellitus Type 2**
- **Cardiovascular Disease**

Any other disease can be queried by passing its MeSH heading(s) to the `compute()` function.
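
MeSH organizes headings in a hierarchy in which each heading carries one or more tree numbers, and a heading's descendants are the headings whose tree numbers extend one of its own with further dot-separated segments. That is presumably why the API offers a `descendants()` helper: a broad heading can also pull in its more specific children. The sketch below shows the prefix rule on a tiny tree; the tree numbers are illustrative rather than copied from a MeSH release, and `descendants_by_prefix` is a hypothetical name, not the package's function.

```python
# Tiny illustrative MeSH-style tree: heading -> tree numbers.
# The codes are illustrative only, not taken from a real MeSH release.
TREE = {
    "Neoplasms": ["C04"],
    "Neoplasms by Site": ["C04.588"],
    "Breast Neoplasms": ["C04.588.180"],
    "Breast Neoplasms, Male": ["C04.588.180.260"],
    "Lung Neoplasms": ["C04.588.894"],
}


def descendants_by_prefix(heading, tree):
    """Hypothetical helper: headings whose tree numbers sit strictly below `heading`.

    Appending "." to each prefix stops "C04.588.18" from matching "C04.588.180".
    """
    prefixes = [number + "." for number in tree[heading]]
    return sorted(
        other
        for other, numbers in tree.items()
        if other != heading
        and any(num.startswith(prefix) for num in numbers for prefix in prefixes)
    )


print(descendants_by_prefix("Breast Neoplasms", TREE))  # ['Breast Neoplasms, Male']
```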

## Artemis Module API Reference

### ChEMBL Module (`alethiotx.artemis.chembl`)

- `molecules(version, top_n_activities)` - Query ChEMBL for parent molecules with clinical trial data
- `infer_nct_year(nct_id)` - Infer the registration year from a ClinicalTrials.gov NCT identifier

### Clinical Scores Module (`alethiotx.artemis.clinical`)

- `compute(mesh_headings, chembl_version, trials_only_last_n_years, filter_families)` - Compute clinical validation scores for drug targets
- `load(date)` - Load pre-computed clinical scores from S3
- `lookup_drug_family_representation(chembl)` - Create a drug-disease-family representation lookup table
- `filter_overrepresented_families(targets_df, drug_chembl_id, mesh_heading, lookup_table)` - Filter over-represented gene families
- `unique(scores, overlap, common_genes)` - Remove overlapping genes from clinical scores
- `approved(scores)` - Filter to include only approved targets
- `all_targets(scores)` - Extract all unique target genes from score lists
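
The purpose of `filter_overrepresented_families()` is presumably to stop a single drug that hits many members of one gene family from letting that whole family dominate a disease's target list. The sketch below illustrates one possible rule on toy data: for a given drug, targets are dropped when more than `max_per_family` of them share a family. The rule, the threshold, the family labels, and the name `filter_families_for_drug` are all hypothetical; the package's actual criterion is not described in this README.

```python
from collections import Counter

# Toy gene -> family mapping; illustrative labels, not official HGNC family names.
GENE_FAMILY = {
    "FLT3": "kinase_family",
    "KIT": "kinase_family",
    "PDGFRA": "kinase_family",
    "KDR": "kinase_family",
    "NPM1": "other_family",
}


def filter_families_for_drug(targets, gene_family, max_per_family=2):
    """Hypothetical rule: drop a drug's targets whose family holds more than
    `max_per_family` of them. Genes without a known family are always kept."""
    family_counts = Counter(gene_family[g] for g in targets if g in gene_family)
    return [
        g for g in targets
        if g not in gene_family or family_counts[gene_family[g]] <= max_per_family
    ]


# A drug hitting four kinase-family genes plus one unrelated gene
print(filter_families_for_drug(["FLT3", "KIT", "PDGFRA", "KDR", "NPM1"], GENE_FAMILY))
# ['NPM1']
```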

### Pathway Genes Module (`alethiotx.artemis.pathway`)

- `get(search, rif)` - Query Ma'ayan Lab's GeneShot API for disease-associated genes
- `load(date, n)` - Load pre-computed pathway genes from S3
- `unique(genes, overlap, common_genes)` - Remove overlapping genes from pathway lists

### MeSH Module (`alethiotx.artemis.mesh`)

- `tree(s3_base, url_base, file_base)` - Retrieve the MeSH tree structure
- `descendants(heading, s3_base, file_base, url_base)` - Get all descendant MeSH headings

### HGNC Module (`alethiotx.artemis.hgnc`)

- `download(gene_has_family_url, family_url, hgnc_complete_url)` - Download HGNC gene family data
- `process(gene_has_family, family, hgnc_data)` - Process HGNC data and create gene-family mappings

### Machine Learning Module (`alethiotx.artemis.cv`)

- `prepare(X, y, pathway_genes, known_targets, term_num, bins, rand_seed)` - Prepare datasets for ML model training
- `run(X, y, n_splits, n_iterations, classifier, scoring)` - Cross-validation pipeline with configurable classifiers

### Visualization Module (`alethiotx.artemis.upset`)

- `prepare(breast, lung, prostate, melanoma, bowel, diabetes, cardiovascular, mode)` - Prepare data for an UpSet plot
- `create(indications, min_subset_size)` - Create UpSet plots for visualizing gene set intersections

### Utilities (`alethiotx.artemis.utils`)

- `find_overlapping_genes(genes, overlap, common_genes)` - Find genes that overlap across multiple gene lists
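
`find_overlapping_genes()` and the two `unique()` helpers deal with genes shared between indications. Assuming `overlap` is the minimum number of lists a gene must appear in to count as overlapping (the signature suggests this but does not confirm it), the logic could look like the stdlib sketch below. `common_genes` is not modeled, and the `_sketch` suffix marks both functions as hypothetical stand-ins rather than the package's code.

```python
from collections import Counter


def find_overlapping_genes_sketch(gene_lists, overlap=2):
    """Hypothetical semantics: genes that appear in at least `overlap` of the lists."""
    counts = Counter(gene for genes in gene_lists for gene in set(genes))
    return {gene for gene, count in counts.items() if count >= overlap}


def uniquify_sketch(gene_lists, overlap=2):
    """Hypothetical: remove overlapping genes from every list, keeping original order."""
    shared = find_overlapping_genes_sketch(gene_lists, overlap)
    return [[gene for gene in genes if gene not in shared] for genes in gene_lists]


# Made-up gene lists for three indications
breast = ["ESR1", "ERBB2", "PIK3CA"]
lung = ["EGFR", "ERBB2", "PIK3CA"]
melanoma = ["BRAF", "PIK3CA"]
print(sorted(find_overlapping_genes_sketch([breast, lung, melanoma])))
print(uniquify_sketch([breast, lung, melanoma]))
```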

## Data Storage (Artemis Module)

The Artemis module uses AWS S3 for storing pre-computed data:

```
s3://alethiotx-artemis/data/
├── clinical_scores/{date}/{disease}.csv
├── pathway_genes/{date}/{disease}.csv
├── chembl/{version}/molecules.csv
└── mesh/d{year}.pkl
```
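
Given the layout above, individual pre-computed files can be addressed by filling in the path templates. The sketch below only builds the URIs; the helper name `artemis_uri` and the disease file stem `breast` are hypothetical, and whether the bucket permits anonymous reads is not stated here. pandas can read `s3://` URIs directly when `s3fs` is installed.

```python
BASE = "s3://alethiotx-artemis/data"


def artemis_uri(kind, *, date=None, disease=None, version=None, year=None):
    """Hypothetical helper: build an S3 URI following the documented storage layout.

    The `disease` file stems (e.g. "breast") are assumptions, not documented names.
    """
    if kind in ("clinical_scores", "pathway_genes"):
        return f"{BASE}/{kind}/{date}/{disease}.csv"
    if kind == "chembl":
        return f"{BASE}/chembl/{version}/molecules.csv"
    if kind == "mesh":
        return f"{BASE}/mesh/d{year}.pkl"
    raise ValueError(f"unknown data kind: {kind!r}")


print(artemis_uri("clinical_scores", date="2025-12-08", disease="breast"))
print(artemis_uri("chembl", version="36"))
# With s3fs installed (and read access to the bucket), pandas could load one, e.g.:
# pd.read_csv(artemis_uri("pathway_genes", date="2025-11-11", disease="breast"))
```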

## Requirements

- scikit-learn
- pandas
- numpy
- setuptools
- fsspec
- s3fs
- upsetplot
- chembl-downloader

## Citation
