Commit 86f8e60

Author: wikiselev
Enhance docstrings for lookup_drug_family_representation and compute functions with improved formatting and clarity
1 parent 9d6f909 commit 86f8e60

1 file changed, 46 additions and 37 deletions

src/alethiotx/artemis/clinical/scores.py

@@ -39,7 +39,9 @@ def lookup_drug_family_representation(chembl: DataFrame) -> DataFrame:
 :param chembl: ChEMBL molecule data containing drug-target-indication relationships.
 Must include columns: ``chembl_id``, ``target_gene_name``, ``mesh_heading``
 :type chembl: DataFrame
-:return: Lookup table with one row per drug-disease-family combination. **Columns:**
+:return: Lookup table with one row per drug-disease-family combination.
+
+**Columns:**

 - ``drug_chembl_id`` (str): ChEMBL identifier for the drug (e.g., 'CHEMBL123')
 - ``mesh_heading`` (str): MeSH disease term (e.g., 'Lung Neoplasms')
@@ -483,37 +485,41 @@ def compute(mesh_headings: list, chembl_version: str = '36', trials_only_last_n_
 biasing results toward that family. With filtering (default), only 1 RTK representative is kept,
 producing unbiased scores that better reflect biological diversity.

-:param mesh_headings: List of MeSH disease headings to analyze. Standard MeSH terms like::
-- 'Breast Neoplasms' (breast cancer)
-- 'Lung Neoplasms' (lung cancer)
-- 'Diabetes Mellitus, Type 2' (type 2 diabetes)
-- 'Cardiovascular Diseases' (heart disease)
-
-Each heading is automatically expanded to include all MeSH descendant terms.
-Use exact MeSH terminology for best results.
+:param mesh_headings: List of MeSH disease headings to analyze. Standard MeSH terms like
+- 'Breast Neoplasms' (breast cancer)
+- 'Lung Neoplasms' (lung cancer)
+- 'Diabetes Mellitus, Type 2' (type 2 diabetes)
+- 'Cardiovascular Diseases' (heart disease)
+
+Each heading is automatically expanded to include all MeSH descendant terms.
+Use exact MeSH terminology for best results.
 :type mesh_headings: list[str]
-:param chembl_version: ChEMBL database version to use. Versions correspond to data release dates::
-- '36' (default): Data through 2024
-- '35': Data through 2023
-
-See https://www.ebi.ac.uk/chembl/ for version details.
+
+:param chembl_version: ChEMBL database version to use. Versions correspond to data release dates
+- '36' (default): Data through 2024
+- '35': Data through 2023
+
+See https://www.ebi.ac.uk/chembl/ for version details.
+
 :type chembl_version: str, optional
-:param trials_only_last_n_years: Temporal filter for recent trials only. If provided::
-- Filters to trials registered in last N years
-- Year inferred from ClinicalTrials.gov NCT IDs
-- Useful for identifying emerging targets
-
-Examples:
-- ``6``: Last 6 years (captures recent development)
-- ``10``: Last decade
-- ``None`` (default): All historical trials
+
+:param trials_only_last_n_years: Temporal filter for recent trials only. If provided
+- Filters to trials registered in last N years
+- Year inferred from ClinicalTrials.gov NCT IDs
+- Useful for identifying emerging targets
+
+Examples:
+- ``6``: Last 6 years (captures recent development)
+- ``10``: Last decade
+- ``None`` (default): All historical trials
 :type trials_only_last_n_years: int or None, optional
-:param filter_families: Whether to apply gene family filtering to reduce bias::
+
+:param filter_families: Whether to apply gene family filtering to reduce bias

-- ``True`` (default): Filter over-represented families
-(recommended for unbiased target prioritization)
-- ``False``: Include all targets without filtering
-(use when family representation is biologically relevant)
+- ``True`` (default): Filter over-represented families
+(recommended for unbiased target prioritization)
+- ``False``: Include all targets without filtering
+(use when family representation is biologically relevant)
 :type filter_families: bool, optional
 :return: Dictionary mapping each MeSH heading to its results DataFrame.

@@ -893,15 +899,18 @@ def load(date = '2025-12-08'):
 :type date: str, optional
 :return: Tuple of 7 DataFrames containing clinical scores for each disease. Each DataFrame has:

-**Columns:**
-
-- ``Target Gene`` (str): HGNC gene symbol
-- ``phase_0`` through ``phase_4`` (int): Count of trials in each phase
-- ``Phase Score`` (int): Sum of trial phases (0-3)
-- ``approved`` (bool): Whether target has any approved (phase 4) drug
-- ``Clinical Score`` (int): Total validation score (Phase Score + 20 if approved)
-
-**Order:** (breast, lung, prostate, melanoma, bowel, diabetes, cardiovascular)
+**Columns:**
+
+- ``Target Gene`` (str): HGNC gene symbol
+- ``phase_0`` through ``phase_4`` (int): Count of trials in each phase
+- ``Phase Score`` (int): Sum of trial phases (0-3)
+- ``approved`` (bool): Whether target has any approved (phase 4) drug
+- ``Clinical Score`` (int): Total validation score (Phase Score + 20 if approved)
+
+**Order:**
+
+(breast, lung, prostate, melanoma, bowel, diabetes, cardiovascular)
+
 :rtype: tuple[DataFrame, DataFrame, DataFrame, DataFrame, DataFrame, DataFrame, DataFrame]
 :raises FileNotFoundError: If CSV files don't exist at the specified S3 paths for the given date
 :raises ValueError: If CSV files are malformed or missing required columns
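
The score columns documented for ``load`` can be illustrated with a small sketch. This is not the code in ``scores.py``: the helper ``clinical_score`` is hypothetical, and it assumes one reading of the docstring, namely that "Sum of trial phases (0-3)" means each trial in phases 0 to 3 contributes its phase number, and that ``approved`` is true when the ``phase_4`` count is non-zero. Only the "+ 20 if approved" bonus is taken directly from the docstring.

```python
APPROVAL_BONUS = 20  # from the docstring: "Phase Score + 20 if approved"


def clinical_score(phase_counts: dict) -> dict:
    """Hypothetical helper: derive score columns from per-phase trial counts.

    ``phase_counts`` maps 'phase_0' .. 'phase_4' to the number of trials.
    """
    # Assumed reading: every trial in phases 0-3 adds its phase number.
    phase_score = sum(p * phase_counts.get(f"phase_{p}", 0) for p in range(4))
    # Assumed reading: any phase 4 trial marks the target as approved.
    approved = phase_counts.get("phase_4", 0) > 0
    total = phase_score + (APPROVAL_BONUS if approved else 0)
    return {
        "Phase Score": phase_score,
        "approved": approved,
        "Clinical Score": total,
    }


# Two phase 1 trials, one phase 2, one phase 3, and one phase 4 trial.
example = clinical_score(
    {"phase_0": 0, "phase_1": 2, "phase_2": 1, "phase_3": 1, "phase_4": 1}
)
print(example)
```

Under these assumptions the example target has a Phase Score of 2·1 + 1·2 + 1·3 = 7 and, being approved, a Clinical Score of 27. If the package instead counts trials rather than summing phase numbers, only the ``phase_score`` line would change.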

0 commit comments
