@@ -39,7 +39,9 @@ def lookup_drug_family_representation(chembl: DataFrame) -> DataFrame:
  :param chembl: ChEMBL molecule data containing drug-target-indication relationships.
  Must include columns: ``chembl_id``, ``target_gene_name``, ``mesh_heading``
  :type chembl: DataFrame
- :return: Lookup table with one row per drug-disease-family combination. **Columns:**
+ :return: Lookup table with one row per drug-disease-family combination.
+
+ **Columns:**

  - ``drug_chembl_id`` (str): ChEMBL identifier for the drug (e.g., 'CHEMBL123')
  - ``mesh_heading`` (str): MeSH disease term (e.g., 'Lung Neoplasms')
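The sketch below illustrates the "one row per drug-disease-family combination" shape from the ``lookup_drug_family_representation`` docstring. It is hypothetical and is not the function's implementation. It uses plain lists of dicts instead of a DataFrame, a made-up ``GENE_FAMILY`` mapping, and a ``gene_family`` output field. These stand-ins are needed because the real family source and the remaining output columns are not shown in this hunk. Only the input column names (``chembl_id``, ``target_gene_name``, ``mesh_heading``) come from the docstring.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical gene-to-family mapping; the real family source is not shown in this diff.
GENE_FAMILY: Dict[str, str] = {"EGFR": "RTK", "ERBB2": "RTK", "ESR1": "Nuclear receptor"}


def lookup_drug_family_sketch(
    rows: Iterable[Dict[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return unique (drug_chembl_id, mesh_heading, gene_family) combinations."""
    seen = set()
    lookup = []
    for row in rows:
        gene = row["target_gene_name"]
        # Assumption: genes without a known family form their own one-gene family.
        family = GENE_FAMILY.get(gene, gene)
        key = (row["chembl_id"], row["mesh_heading"], family)
        # Several targets of one drug in the same family collapse into one row.
        if key not in seen:
            seen.add(key)
            lookup.append(key)
    return lookup


# Illustrative input using the column names the docstring requires.
chembl = [
    {"chembl_id": "CHEMBL1", "target_gene_name": "EGFR", "mesh_heading": "Lung Neoplasms"},
    {"chembl_id": "CHEMBL1", "target_gene_name": "ERBB2", "mesh_heading": "Lung Neoplasms"},
    {"chembl_id": "CHEMBL2", "target_gene_name": "ESR1", "mesh_heading": "Breast Neoplasms"},
]
lookup = lookup_drug_family_sketch(chembl)
```

Here the two RTK targets of the first drug collapse into a single row, so each drug-disease-family combination appears once.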
@@ -483,37 +485,45 @@ def compute(mesh_headings: list, chembl_version: str = '36', trials_only_last_n_
  biasing results toward that family. With filtering (default), only 1 RTK representative is kept,
  producing unbiased scores that better reflect biological diversity.

- :param mesh_headings: List of MeSH disease headings to analyze. Standard MeSH terms like::
- - 'Breast Neoplasms' (breast cancer)
- - 'Lung Neoplasms' (lung cancer)
- - 'Diabetes Mellitus, Type 2' (type 2 diabetes)
- - 'Cardiovascular Diseases' (heart disease)
-
- Each heading is automatically expanded to include all MeSH descendant terms.
- Use exact MeSH terminology for best results.
+ :param mesh_headings: List of MeSH disease headings to analyze, using standard MeSH terms such as:
+
+ - 'Breast Neoplasms' (breast cancer)
+ - 'Lung Neoplasms' (lung cancer)
+ - 'Diabetes Mellitus, Type 2' (type 2 diabetes)
+ - 'Cardiovascular Diseases' (heart disease)
+
+ Each heading is automatically expanded to include all of its MeSH descendant terms.
+ Use exact MeSH terminology for best results.
  :type mesh_headings: list[str]
- :param chembl_version: ChEMBL database version to use. Versions correspond to data release dates::
- - '36' (default): Data through 2024
- - '35': Data through 2023
-
- See https://www.ebi.ac.uk/chembl/ for version details.
+
+ :param chembl_version: ChEMBL database version to use. Each version corresponds to a data release:
+
+ - '36' (default): Data through 2024
+ - '35': Data through 2023
+
+ See https://www.ebi.ac.uk/chembl/ for version details.
+
  :type chembl_version: str, optional
- :param trials_only_last_n_years: Temporal filter for recent trials only. If provided::
- - Filters to trials registered in last N years
- - Year inferred from ClinicalTrials.gov NCT IDs
- - Useful for identifying emerging targets
-
- Examples:
- - ``6``: Last 6 years (captures recent development)
- - ``10``: Last decade
- - ``None`` (default): All historical trials
+
+ :param trials_only_last_n_years: Temporal filter for recent trials only. If provided:
+
+ - Keeps only trials registered in the last N years
+ - Infers each trial's registration year from its ClinicalTrials.gov NCT ID
+ - Helps identify emerging targets
+
+ Examples:
+
+ - ``6``: Last 6 years (captures recent development)
+ - ``10``: Last decade
+ - ``None`` (default): All historical trials
  :type trials_only_last_n_years: int or None, optional
- :param filter_families: Whether to apply gene family filtering to reduce bias::
+
+ :param filter_families: Whether to apply gene family filtering to reduce bias:

- - ``True`` (default): Filter over-represented families
- (recommended for unbiased target prioritization)
- - ``False``: Include all targets without filtering
- (use when family representation is biologically relevant)
+ - ``True`` (default): Filter over-represented families
+ (recommended for unbiased target prioritization)
+ - ``False``: Include all targets without filtering
+ (use when family representation is biologically relevant)
  :type filter_families: bool, optional
  :return: Dictionary mapping each MeSH heading to its results DataFrame.

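The ``compute`` docstring says that with ``filter_families`` enabled only one RTK representative is kept. The sketch below shows one way that reduction could work, but several parts are assumptions for illustration and are not stated in the diff:

- The representative is the highest-scoring member of its family, and the first gene seen wins ties.
- Unmapped genes count as their own family.
- The name ``filter_families_sketch`` and the example family assignments are made up.

```python
from typing import Dict


def filter_families_sketch(
    scores: Dict[str, int], gene_family: Dict[str, str]
) -> Dict[str, int]:
    """Keep one representative target per gene family."""
    best: Dict[str, str] = {}  # family -> currently chosen gene
    for gene, score in scores.items():
        # Assumption: unmapped genes are treated as a family of their own.
        family = gene_family.get(gene, gene)
        current = best.get(family)
        # Assumption: the highest-scoring member represents the family;
        # on ties the first gene seen is kept.
        if current is None or score > scores[current]:
            best[family] = gene
    return {gene: scores[gene] for gene in best.values()}


# Illustrative scores: without filtering, three RTKs would each rank separately.
scores = {"EGFR": 40, "ERBB2": 35, "MET": 12, "ESR1": 30}
families = {"EGFR": "RTK", "ERBB2": "RTK", "MET": "RTK", "ESR1": "Nuclear receptor"}
filtered = filter_families_sketch(scores, families)
```

In this example only ``EGFR`` survives from the RTK family, alongside ``ESR1``. Choosing a different representative rule, such as the first-approved target, would only change the comparison inside the loop.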
@@ -893,15 +899,18 @@ def load(date = '2025-12-08'):
  :type date: str, optional
  :return: Tuple of 7 DataFrames containing clinical scores for each disease. Each DataFrame has:

- **Columns:**
-
- - ``Target Gene`` (str): HGNC gene symbol
- - ``phase_0`` through ``phase_4`` (int): Count of trials in each phase
- - ``Phase Score`` (int): Sum of trial phases (0-3)
- - ``approved`` (bool): Whether target has any approved (phase 4) drug
- - ``Clinical Score`` (int): Total validation score (Phase Score + 20 if approved)
-
- **Order:** (breast, lung, prostate, melanoma, bowel, diabetes, cardiovascular)
+ **Columns:**
+
+ - ``Target Gene`` (str): HGNC gene symbol
+ - ``phase_0`` through ``phase_4`` (int): Count of trials in each phase
+ - ``Phase Score`` (int): Sum of trial phases (0-3)
+ - ``approved`` (bool): Whether target has any approved (phase 4) drug
+ - ``Clinical Score`` (int): Total validation score (Phase Score + 20 if approved)
+
+ **Order:**
+
+ (breast, lung, prostate, melanoma, bowel, diabetes, cardiovascular)
+
  :rtype: tuple[DataFrame, DataFrame, DataFrame, DataFrame, DataFrame, DataFrame, DataFrame]
  :raises FileNotFoundError: If CSV files don't exist at the specified S3 paths for the given date
  :raises ValueError: If CSV files are malformed or missing required columns
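The sketch below shows how the columns documented for ``load`` relate to each other. It rests on two assumptions:

- "Sum of trial phases (0-3)" means each phase 0-3 trial adds its phase number to ``Phase Score``.
- ``approved`` is true whenever ``phase_4`` is non-zero.

The function name and the dict-based input are hypothetical. ``load`` itself reads precomputed scores from CSV files rather than computing them.

```python
from typing import Dict, Union


def clinical_score_sketch(phase_counts: Dict[str, int]) -> Dict[str, Union[int, bool]]:
    """Derive Phase Score, approved and Clinical Score from per-phase trial counts."""
    # Assumption: each phase 0-3 trial adds its phase number to the Phase Score.
    phase_score = sum(
        phase * phase_counts.get(f"phase_{phase}", 0) for phase in range(4)
    )
    # Assumption: any phase 4 trial marks the target as having an approved drug.
    approved = phase_counts.get("phase_4", 0) > 0
    return {
        "Phase Score": phase_score,
        "approved": approved,
        # Documented rule: Phase Score + 20 if approved.
        "Clinical Score": phase_score + (20 if approved else 0),
    }


row = clinical_score_sketch(
    {"phase_0": 1, "phase_1": 2, "phase_2": 3, "phase_3": 1, "phase_4": 1}
)
```

Under this reading, a target whose only trials are six phase 3 studies scores 18. That is below the minimum of 20 for any approved target, so the approval bonus dominates moderate trial activity.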