# alethiotx

[Python 3.9+](https://www.python.org/downloads/)
[License: MIT](https://opensource.org/licenses/MIT)

**Alethio Therapeutics Python Toolkit** - A growing collection of open-source computational tools used by Alethio Therapeutics.

## Overview

`alethiotx` is a modular Python package providing specialized tools for therapeutic research and drug discovery. Currently, the package features the **Artemis** module for drug target prioritization using public knowledge graphs. Additional modules and capabilities will be added in future releases.

### Current Modules

#### Artemis Module (`alethiotx.artemis`)

The Artemis module enables accessible and scalable drug target prioritization by integrating clinical trial data, drug databases such as the Therapeutic Target Database (TTD), pathway information, and machine learning models. It leverages public knowledge graphs to prioritize therapeutic targets across multiple disease areas.

### Artemis Module Features

- **Clinical Trials**: Query and analyze clinical trials data from ClinicalTrials.gov
- **TTD**: Match clinical interventions with TTD drug information and targets
- **Pathway Genes**: Retrieve and analyze pathway genes using the GeneShot API
- **Target Scoring**: Calculate clinical target scores for drug targets based on trial phases and approvals
- **Machine Learning Pipeline**: Built-in cross-validation and ROC analysis for target prediction
- **Multi-Disease Support**: Pre-configured for breast, lung, prostate, and bowel cancer, melanoma, diabetes, and cardiovascular disease

### Future Modules

Additional modules for various aspects of drug discovery and therapeutic research are planned for future releases. Stay tuned!

## Installation

```bash
pip install alethiotx
```

## Quick Start

> **Note:** The examples below demonstrate the **Artemis** module functionality. As new modules are added to the package, they will have their own usage examples.

### 1. Retrieve Clinical Trials Data

```python
from alethiotx.artemis import get_clinical_trials, ttd, get_clinical_scores

# Query clinical trials for a specific indication
breast_trials = get_clinical_trials(search='Breast Cancer', last_6_years=True)

# Match trials with TTD to get target information
ttd_data = ttd(breast_trials)

# Calculate clinical development scores
scores = get_clinical_scores(ttd_data, include_approved=True)
print(scores.head())
```

### 2. Load Pre-computed Clinical Scores

```python
from alethiotx.artemis import load_clinical_scores

# Load clinical scores for multiple diseases
breast, lung, prostate, melanoma, bowel, diabetes, cardio = load_clinical_scores(date='2025-11-11')
```

### 3. Pathway Gene Analysis

```python
from alethiotx.artemis import get_pathway_genes, load_pathway_genes

# Query GeneShot for disease-associated genes
aml_genes = get_pathway_genes("acute myeloid leukemia")
print(aml_genes.loc["FLT3", ["gene_count", "rank"]])

# Get top pathway genes for diseases
breast_pg, lung_pg, prostate_pg, melanoma_pg, bowel_pg, diabetes_pg, cardio_pg = load_pathway_genes(n=100)
```

### 4. Machine Learning Pipeline

```python
from alethiotx.artemis import pre_model, cv_pipeline, roc_curve
import pandas as pd

# X, y, and pathway_genes are placeholders: supply your own knowledge graph
# features (X), clinical scores (y), and pathway gene list before running this
result = pre_model(X, y, pathway_genes=pathway_genes, bins=3)

# Run cross-validation pipeline
scores = cv_pipeline(X, y, n_iterations=10, scoring='roc_auc')
print(f"Mean AUC: {sum(scores)/len(scores):.3f}")

# Generate ROC curves
mean_auc = roc_curve(result['X'], result['y_binary'], n_splits=5, classifier='rf')
```

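For readers unfamiliar with the AUC values reported above, the following sketch shows what the area under the ROC curve measures: the fraction of (positive, negative) pairs in which the positive example receives the higher score. The `roc_auc` helper and the toy data are hypothetical and written in plain Python; this is not how Artemis computes the metric internally.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels (1 = known clinical target) and model scores, invented for illustration.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]

print(f"AUC: {roc_auc(y_true, y_score):.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of targets from non-targets.
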
### 5. Visualize Gene Overlaps with UpSet Plots

```python
from alethiotx.artemis import load_clinical_scores, load_pathway_genes, prepare_upset, create_upset_plot

# Load clinical scores or pathway genes for multiple diseases
breast, lung, prostate, melanoma, bowel, diabetes, cardio = load_clinical_scores()

# Prepare data for UpSet plot (mode='ct' for clinical targets)
upset_data = prepare_upset(breast, lung, prostate, melanoma, bowel, diabetes, cardio, mode='ct')

# Create and display the UpSet plot
plot = create_upset_plot(upset_data, min_subset_size=5)
plot.plot()

# For pathway genes, use mode='pg'
breast_pg, lung_pg, prostate_pg, melanoma_pg, bowel_pg, diabetes_pg, cardio_pg = load_pathway_genes(n=100)
upset_data_pg = prepare_upset(breast_pg, lung_pg, prostate_pg, melanoma_pg, bowel_pg, diabetes_pg, cardio_pg, mode='pg')
plot_pg = create_upset_plot(upset_data_pg, min_subset_size=10)
plot_pg.plot()
```

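As a rough illustration of what an UpSet plot summarizes, the sketch below counts how many genes belong to each exact combination of disease sets. UpSet plots typically draw these counts as bars. The `exclusive_intersections` helper and the toy gene sets are hypothetical and independent of `prepare_upset()`, whose internal data format is not described here.

```python
from collections import Counter


def exclusive_intersections(gene_sets):
    """Count genes by the exact combination of sets they belong to."""
    membership = {}
    for name, genes in gene_sets.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(name)
    return Counter(frozenset(names) for names in membership.values())


# Toy gene sets, invented for illustration.
toy_sets = {
    "breast": {"ERBB2", "ESR1", "TP53", "EGFR"},
    "lung": {"EGFR", "KRAS", "TP53"},
    "melanoma": {"BRAF", "TP53"},
}

counts = exclusive_intersections(toy_sets)

# Largest combinations first, ties broken alphabetically.
for combo, size in sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(" & ".join(sorted(combo)), size)
```

A threshold such as `min_subset_size` would then presumably hide combinations whose count falls below the given value.
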
## Supported Disease Indications (Artemis Module)

The Artemis module includes built-in support for:

- **Myeloproliferative Neoplasm (MPN)**
- **Breast Cancer**
- **Lung Cancer**
- **Prostate Cancer**
- **Bowel Cancer (Colorectal)**
- **Melanoma**
- **Diabetes Mellitus Type 2**
- **Cardiovascular Disease**

## Artemis Module API Reference

### Data Loading & Processing

- `get_clinical_trials()` - Retrieve clinical trials from ClinicalTrials.gov
- `ttd()` - Match trials with TTD drug/target data
- `get_clinical_scores()` - Calculate per-target clinical development scores
- `load_clinical_scores()` - Load pre-computed clinical scores from S3
- `get_pathway_genes()` - Query Ma'ayan Lab's GeneShot API for gene associations
- `load_pathway_genes()` - Load the top `n` pathway genes for each supported disease

### Data Preparation

- `get_all_targets()` - Extract unique target genes from score lists
- `cut_clinical_scores()` - Filter scores by threshold
- `find_overlapping_genes()` - Identify genes present in multiple datasets
- `uniquify_clinical_scores()` - Remove overlapping genes from clinical scores
- `uniquify_pathway_genes()` - Remove overlapping genes from pathway lists

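The overlap-related helpers above can be pictured with a small plain-Python sketch. The functions `overlapping_genes` and `drop_overlaps` are hypothetical stand-ins, not the package's own code, and the sketch assumes that a shared gene is removed from every list. The README does not state whether Artemis instead keeps it in one of them.

```python
from collections import Counter


def overlapping_genes(gene_lists):
    """Return the genes that occur in more than one list."""
    counts = Counter(gene for genes in gene_lists for gene in set(genes))
    return {gene for gene, n in counts.items() if n > 1}


def drop_overlaps(gene_lists):
    """Remove every shared gene from each list, preserving the original order."""
    shared = overlapping_genes(gene_lists)
    return [[gene for gene in genes if gene not in shared] for genes in gene_lists]


# Toy gene lists, invented for illustration.
breast = ["ERBB2", "ESR1", "TP53"]
lung = ["EGFR", "TP53", "KRAS"]
prostate = ["AR", "EGFR"]

shared = overlapping_genes([breast, lung, prostate])
unique_lists = drop_overlaps([breast, lung, prostate])

print(sorted(shared))
print(unique_lists)
```

Removing shared genes leaves each disease with targets specific to it, which is one plausible reason to uniquify before training per-disease models.
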
### Machine Learning

- `pre_model()` - Prepare datasets for ML model training
- `cv_pipeline()` - Cross-validation pipeline with customizable classifiers
- `roc_curve()` - Generate ROC curves and return the mean AUC

### Visualization

- `prepare_upset()` - Prepare disease-related data for UpSet plot visualization
- `create_upset_plot()` - Create UpSet plots for visualizing gene set intersections across diseases

## Data Storage (Artemis Module)

The Artemis module uses AWS S3 for storing pre-computed data:

```
s3://alethiotx-artemis/data/
├── clinical_targets/{date}/{disease}.csv
├── pathway_genes/{date}/{disease}.csv
└── ttd/{date}
```

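The layout above can be turned into concrete object paths with a small helper. The sketch below is illustrative only: `artemis_s3_path` is a hypothetical function, the `YYYY-MM-DD` date format is inferred from the `date='2025-11-11'` example earlier, and the disease file name `breast` is an assumption about how files are named.

```python
from datetime import date

BUCKET_PREFIX = "s3://alethiotx-artemis/data"
CSV_KINDS = ("clinical_targets", "pathway_genes")


def artemis_s3_path(kind, snapshot_date, disease):
    """Build the path of a per-disease CSV following the documented layout."""
    if kind not in CSV_KINDS:
        raise ValueError(f"kind must be one of {CSV_KINDS}")
    # Raises ValueError if the date is not in YYYY-MM-DD form.
    date.fromisoformat(snapshot_date)
    return f"{BUCKET_PREFIX}/{kind}/{snapshot_date}/{disease}.csv"


# "breast" is an assumed file name; check the bucket for the actual names.
path = artemis_s3_path("clinical_targets", "2025-11-11", "breast")
print(path)
```

With `s3fs` installed, `pandas.read_csv` accepts `s3://` URLs directly, so a path like this could be passed to it. Whether the bucket allows anonymous reads is not stated here.
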
## Requirements

- Python >= 3.9
- requests
- scikit-learn
- pandas
- numpy
- matplotlib
- setuptools
- fsspec
- s3fs
- upsetplot

## Citation

If you use the Artemis module in your research, please cite:

```
Artemis: public knowledge graphs enable accessible and scalable drug target discovery
Vladimir Kiselev, Alethio Therapeutics
```

For other modules, citation information will be provided as they are released.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Author

**Vladimir Kiselev**
Email: vlad.kiselev@alethiomics.com

## Links

- **Homepage**: https://github.com/alethiotx/pypi
- **Issues**: https://github.com/alethiotx/pypi/issues

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

---

**Current Focus:** Artemis - Enabling accessible and scalable drug target discovery through public knowledge graphs.
**Coming Soon:** Additional modules for expanded drug discovery capabilities.