Commit d44e718

Merge pull request #411 from monarch-initiative/rename-mtc-filter-to-if-hpo-filter
Rename `HpoMtcFilter` to `IfHpoFilter`
2 parents 7cec169 + a9346dd commit d44e718

8 files changed

+151
-74
lines changed

docs/tutorial.rst

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -314,7 +314,7 @@ For general use, we recommend using a combination
314314
of a *phenotype MT filter* (:class:`~gpsea.analysis.mtc_filter.PhenotypeMtcFilter`) with a *multiple testing correction*.
315315
Phenotype MT filter chooses the HPO terms to test according to several heuristics, which
316316
reduce the multiple testing burden and focus the analysis
317-
on the most interesting terms (see :ref:`HPO MT filter <hpo-mt-filter>` for more info).
317+
on the most interesting terms (see :ref:`Independent filtering for HPO <hpo-if-filter>` for more info).
318318
Then the multiple testing correction, such as Bonferroni or Benjamini-Hochberg,
319319
is used to control the family-wise error rate or the false discovery rate.
320320
See :ref:`mtc` for more information.
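To make the difference between the two corrections named above concrete, here is a minimal pure-Python sketch. The helper names `bonferroni` and `benjamini_hochberg` are hypothetical and written for illustration only; this is not the code GPSEA runs. Bonferroni controls the family-wise error rate by multiplying each p value by the number of tests, while Benjamini-Hochberg controls the false discovery rate with a step-up adjustment over the ranked p values.

```python
import typing


def bonferroni(pvals: typing.Sequence[float]) -> typing.List[float]:
    """Multiply each p value by the number of tests and cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals: typing.Sequence[float]) -> typing.List[float]:
    """Step-up adjustment: p * m / rank, made monotone from the largest p value down."""
    m = len(pvals)
    # Indices of the p values in ascending order.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value (rank m) to the smallest (rank 1).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


nominal = [0.01, 0.04, 0.03, 0.005]
bonf = bonferroni(nominal)
bh = benjamini_hochberg(nominal)
```

The Benjamini-Hochberg adjusted values are never larger than the Bonferroni ones. Either way, the fewer terms the phenotype MT filter lets through, the smaller the multiplier applied to each p value.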
@@ -323,7 +323,7 @@ See :ref:`mtc` for more information.
323323
>>> analysis = configure_hpo_term_analysis(hpo)
324324

325325
:func:`~gpsea.analysis.pcats.configure_hpo_term_analysis` configures the analysis
326-
that uses HPO MTC filter (:class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`) for selecting HPO terms of interest,
326+
that uses Independent filtering for HPO (:class:`~gpsea.analysis.mtc_filter.IfHpoFilter`) for selecting HPO terms of interest,
327327
Fisher Exact test for computing nominal p values, and Benjamini-Hochberg for multiple testing correction.
328328

329329

docs/user-guide/analyses/mtc.rst

Lines changed: 21 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -171,31 +171,37 @@ we pass an iterable (e.g. a tuple) with these two terms as an argument:
171171
2
172172

173173

174-
.. _hpo-mt-filter:
174+
.. _hpo-if-filter:
175175

176-
HPO MT filter
177-
-------------
176+
Independent filtering for HPO
177+
-----------------------------
178+
179+
Independent filtering for HPO involves making several domain judgments
180+
and taking advantage of the HPO structure
181+
in order to reduce the number of HPO terms for testing.
182+
The filter's logic is made up of 8 individual heuristics
183+
to skip testing the terms that are unlikely to yield significant or interesting results (see below).
178184

179-
The HPO MT filter involves making several domain judgments and takes advantage of the HPO structure.
180-
The strategy needs access to HPO:
185+
Some of the heuristics need access to the HPO hierarchy,
186+
so let's load HPO
181187

182188
>>> import hpotk
183189
>>> store = hpotk.configure_ontology_store()
184190
>>> hpo = store.load_minimal_hpo(release='v2024-07-01')
185191

186-
and it is implemented in the :class:`~gpsea.analysis.mtc_filter.HpoMtcFilter` class:
192+
and let's create an :class:`~gpsea.analysis.mtc_filter.IfHpoFilter` instance
193+
using the static constructor
194+
:func:`~gpsea.analysis.mtc_filter.IfHpoFilter.default_filter`:
195+
196+
>>> from gpsea.analysis.mtc_filter import IfHpoFilter
197+
>>> hpo_mtc = IfHpoFilter.default_filter(hpo=hpo)
187198

188-
>>> from gpsea.analysis.mtc_filter import HpoMtcFilter
189-
>>> hpo_mtc = HpoMtcFilter.default_filter(hpo=hpo)
190199

200+
The constructor takes HPO and two optional thresholds.
201+
See the API documentation and the explanations below for more details.
191202

192-
We use static constructor :func:`~gpsea.analysis.mtc_filter.HpoMtcFilter.default_filter`
193-
for creating :class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`.
194-
The constructor takes a ``term_frequency_threshold`` option (40% by default)
195-
and the method's logic is made up of 8 individual heuristics
196-
designed to skip testing the HPO terms that are unlikely to yield significant or interesting results.
197203

198-
.. contents:: HPO MT filters
204+
.. contents:: Independent filtering for HPO
199205
:depth: 1
200206
:local:
201207

@@ -296,6 +302,6 @@ and we have explicit observed observations for 20 and excluded for 10 individual
296302
then the annotation frequency is `0.3`.
297303

298304
The threshold is set as ``annotation_frequency_threshold`` option
299-
of the :func:`~gpsea.analysis.mtc_filter.HpoMtcFilter.default_filter` constructor,
305+
of the :func:`~gpsea.analysis.mtc_filter.IfHpoFilter.default_filter` constructor,
300306
with the default value of `0.4` (40%).
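The arithmetic behind the annotation frequency can be sketched as follows. The helper `annotation_frequency` is hypothetical and only mirrors the worked example in the text (100 individuals, 20 observed, 10 excluded). Whether GPSEA compares against the threshold with `<` or `<=` is not stated here, so the comparison below is an assumption.

```python
def annotation_frequency(n_observed: int, n_excluded: int, cohort_size: int) -> float:
    """Fraction of the cohort with an explicit (observed or excluded) annotation."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    return (n_observed + n_excluded) / cohort_size


# The example from the text: 100 individuals, 20 observed, 10 excluded.
freq = annotation_frequency(n_observed=20, n_excluded=10, cohort_size=100)

# Assumption: a term is kept when its frequency reaches the threshold.
keeps_term_at_default = freq >= 0.4
keeps_term_at_lower_threshold = freq >= 0.25
```

With the default threshold of `0.4`, the term from the example would be skipped. Lowering the threshold to `0.25` would keep it.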
301307

docs/user-guide/analyses/phenotype-classes.rst

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -207,7 +207,7 @@ a phenotype multiple testing (MT) filter and multiple testing correction (MTC).
207207

208208
Phenotype MT filter selects a (sub)set of HPO terms for testing,
209209
for instance only the user-selected terms (see :class:`~gpsea.analysis.mtc_filter.SpecifiedTermsMtcFilter`)
210-
or the terms selected by :class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`.
210+
or the terms selected by :class:`~gpsea.analysis.mtc_filter.IfHpoFilter`.
211211

212212
MTC then adjusts the nominal p values for the increased risk
213213
of false positive G/P associations.
@@ -221,8 +221,8 @@ We must choose a phenotype MT filter as well as a MTC procedure to perform genot
221221
Default analysis
222222
^^^^^^^^^^^^^^^^
223223

224-
We recommend using HPO MT filter (:class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`) as a phenotype MT filter
225-
and Benjamini-Hochberg for MTC.
224+
We recommend using Independent filtering for HPO (:class:`~gpsea.analysis.mtc_filter.IfHpoFilter`)
225+
and Benjamini-Hochberg MT correction.
226226
The default analysis can be configured with :func:`~gpsea.analysis.pcats.configure_hpo_term_analysis` convenience method.
227227

228228
>>> from gpsea.analysis.pcats import configure_hpo_term_analysis
@@ -240,10 +240,10 @@ Custom analysis
240240
If the default selection of phenotype MT filter and multiple testing correction is not an option,
241241
we can configure the analysis manually.
242242

243-
First, we choose a phenotype MT filter (e.g. :class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`):
243+
First, we choose a phenotype MT filter (e.g. :class:`~gpsea.analysis.mtc_filter.IfHpoFilter`):
244244

245-
>>> from gpsea.analysis.mtc_filter import HpoMtcFilter
246-
>>> mtc_filter = HpoMtcFilter.default_filter(hpo, term_frequency_threshold=.2)
245+
>>> from gpsea.analysis.mtc_filter import IfHpoFilter
246+
>>> mtc_filter = IfHpoFilter.default_filter(hpo, term_frequency_threshold=.2)
247247

248248
.. note::
249249

src/gpsea/analysis/mtc_filter/__init__.py

Lines changed: 9 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -6,9 +6,15 @@
66
"""
77

88
from ._impl import PhenotypeMtcFilter, PhenotypeMtcResult, PhenotypeMtcIssue
9-
from ._impl import UseAllTermsMtcFilter, SpecifiedTermsMtcFilter, HpoMtcFilter
9+
from ._impl import UseAllTermsMtcFilter, SpecifiedTermsMtcFilter, IfHpoFilter
10+
from ._impl import HpoMtcFilter
1011

1112
__all__ = [
12-
'PhenotypeMtcFilter', 'PhenotypeMtcResult', 'PhenotypeMtcIssue',
13-
'UseAllTermsMtcFilter', 'SpecifiedTermsMtcFilter', 'HpoMtcFilter',
13+
"PhenotypeMtcFilter",
14+
"PhenotypeMtcResult",
15+
"PhenotypeMtcIssue",
16+
"UseAllTermsMtcFilter",
17+
"SpecifiedTermsMtcFilter",
18+
"IfHpoFilter",
19+
"HpoMtcFilter",
1420
]

src/gpsea/analysis/mtc_filter/_impl.py

Lines changed: 90 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,7 @@
33
import typing
44

55
from collections import deque
6+
import warnings
67

78
import hpotk
89
import pandas as pd
@@ -252,14 +253,14 @@ def verify_term_id(val: typing.Union[str, hpotk.TermId]) -> hpotk.TermId:
252253
raise ValueError(f"{val} is neither `str` nor `hpotk.TermId`")
253254

254255

255-
class HpoMtcFilter(PhenotypeMtcFilter[hpotk.TermId]):
256+
class IfHpoFilter(PhenotypeMtcFilter[hpotk.TermId]):
256257
"""
257-
`HpoMtcFilter` decides which phenotypes should be tested and which phenotypes are not worth testing.
258+
`IfHpoFilter` decides which phenotypes should be tested and which phenotypes are not worth testing.
258259
259260
The class leverages a number of heuristics and domain decisions.
260-
See :ref:`hpo-mt-filter` section for more info.
261+
See :ref:`hpo-if-filter` section for more info.
261262
262-
We recommend creating an instance using the :func:`default_filter` static factory method.
263+
We recommend creating an instance using the :func:`~gpsea.analysis.mtc_filter.IfHpoFilter.default_filter` static factory method.
263264
"""
264265

265266
NO_GENOTYPE_HAS_MORE_THAN_ONE_HPO = PhenotypeMtcResult.fail(
@@ -340,7 +341,7 @@ def default_filter(
340341
general_hpo_term_set.update(second_level_terms)
341342
general_hpo_term_set.update(third_level_terms)
342343

343-
return HpoMtcFilter(
344+
return IfHpoFilter(
344345
hpo=hpo,
345346
term_frequency_threshold=term_frequency_threshold,
346347
annotation_frequency_threshold=annotation_frequency_threshold,
@@ -355,13 +356,15 @@ def __init__(
355356
general_hpo_terms: typing.Iterable[hpotk.TermId],
356357
):
357358
self._hpo = hpo
358-
assert isinstance(term_frequency_threshold, (int, float)) \
359-
and 0. < term_frequency_threshold <= 1., \
360-
"The term_frequency_threshold must be in the range (0, 1]"
359+
assert (
360+
isinstance(term_frequency_threshold, (int, float))
361+
and 0.0 < term_frequency_threshold <= 1.0
362+
), "The term_frequency_threshold must be in the range (0, 1]"
361363
self._hpo_term_frequency_filter = term_frequency_threshold
362-
assert isinstance(annotation_frequency_threshold, (int, float)) \
363-
and 0. < annotation_frequency_threshold <= 1., \
364-
"The annotation_frequency_threshold must be in the range (0, 1]"
364+
assert (
365+
isinstance(annotation_frequency_threshold, (int, float))
366+
and 0.0 < annotation_frequency_threshold <= 1.0
367+
), "The annotation_frequency_threshold must be in the range (0, 1]"
365368
self._hpo_annotation_frequency_threshold = annotation_frequency_threshold
366369

367370
self._general_hpo_terms = set(general_hpo_terms)
@@ -429,17 +432,17 @@ def filter(
429432
continue
430433

431434
if term_id in self._general_hpo_terms:
432-
results[idx] = HpoMtcFilter.SKIPPING_GENERAL_TERM
435+
results[idx] = IfHpoFilter.SKIPPING_GENERAL_TERM
433436
continue
434437

435438
if not self._hpo.graph.is_ancestor_of(PHENOTYPIC_ABNORMALITY, term_id):
436-
results[idx] = HpoMtcFilter.SKIPPING_NON_PHENOTYPE_TERM
439+
results[idx] = IfHpoFilter.SKIPPING_NON_PHENOTYPE_TERM
437440
continue
438441

439442
ph_clf = pheno_clfs[idx]
440443
contingency_matrix = counts[idx]
441444

442-
max_freq = HpoMtcFilter.get_maximum_group_observed_HPO_frequency(
445+
max_freq = IfHpoFilter.get_maximum_group_observed_HPO_frequency(
443446
contingency_matrix,
444447
ph_clf=ph_clf,
445448
)
@@ -465,19 +468,19 @@ def filter(
465468
results[idx] = self._not_powered_for_2_by_3
466469
continue
467470

468-
if not HpoMtcFilter.some_cell_has_greater_than_one_count(
471+
if not IfHpoFilter.some_cell_has_greater_than_one_count(
469472
counts=contingency_matrix,
470473
ph_clf=ph_clf,
471474
):
472-
results[idx] = HpoMtcFilter.NO_GENOTYPE_HAS_MORE_THAN_ONE_HPO
475+
results[idx] = IfHpoFilter.NO_GENOTYPE_HAS_MORE_THAN_ONE_HPO
473476
continue
474477

475-
elif HpoMtcFilter.one_genotype_has_zero_hpo_observations(
478+
elif IfHpoFilter.one_genotype_has_zero_hpo_observations(
476479
counts=contingency_matrix,
477480
gt_clf=gt_clf,
478481
):
479482
results[idx] = (
480-
HpoMtcFilter.SKIPPING_SINCE_ONE_GENOTYPE_HAD_ZERO_OBSERVATIONS
483+
IfHpoFilter.SKIPPING_SINCE_ONE_GENOTYPE_HAD_ZERO_OBSERVATIONS
481484
)
482485
continue
483486

@@ -501,7 +504,7 @@ def filter(
501504
axis=None
502505
) < 1:
503506
# Do not test if the counts are exactly the same as the counts in the only child term.
504-
results[idx] = HpoMtcFilter.SAME_COUNT_AS_THE_ONLY_CHILD
507+
results[idx] = IfHpoFilter.SAME_COUNT_AS_THE_ONLY_CHILD
505508
continue
506509

507510
# ##
@@ -526,18 +529,18 @@ def possible_results(self) -> typing.Collection[PhenotypeMtcResult]:
526529
return (
527530
PhenotypeMtcFilter.OK,
528531
self._below_frequency_threshold, # HMF01
529-
HpoMtcFilter.NO_GENOTYPE_HAS_MORE_THAN_ONE_HPO, # HMF02
530-
HpoMtcFilter.SAME_COUNT_AS_THE_ONLY_CHILD, # HMF03
531-
HpoMtcFilter.SKIPPING_SINCE_ONE_GENOTYPE_HAD_ZERO_OBSERVATIONS, # HMF05
532+
IfHpoFilter.NO_GENOTYPE_HAS_MORE_THAN_ONE_HPO, # HMF02
533+
IfHpoFilter.SAME_COUNT_AS_THE_ONLY_CHILD, # HMF03
534+
IfHpoFilter.SKIPPING_SINCE_ONE_GENOTYPE_HAD_ZERO_OBSERVATIONS, # HMF05
532535
self._not_powered_for_2_by_2, # HMF06
533536
self._not_powered_for_2_by_3, # HMF06
534-
HpoMtcFilter.SKIPPING_NON_PHENOTYPE_TERM, # HMF07
535-
HpoMtcFilter.SKIPPING_GENERAL_TERM, # HMF08
537+
IfHpoFilter.SKIPPING_NON_PHENOTYPE_TERM, # HMF07
538+
IfHpoFilter.SKIPPING_GENERAL_TERM, # HMF08
536539
self._below_annotation_frequency_threshold, # HMF09
537540
)
538541

539542
def filter_method_name(self) -> str:
540-
return "HPO MTC filter"
543+
return "Independent filtering HPO filter"
541544

542545
@staticmethod
543546
def get_number_of_observed_hpo_observations(
@@ -629,3 +632,65 @@ def _get_ordered_terms(
629632

630633
# now, ordered_term_list is ordered from leaves to root
631634
return ordered_term_list
635+
636+
637+
class HpoMtcFilter(IfHpoFilter):
638+
"""
639+
`HpoMtcFilter` is deprecated and will be removed in `1.0.0`.
640+
641+
Use :class:`gpsea.analysis.mtc_filter.IfHpoFilter` instead.
642+
"""
643+
644+
@staticmethod
645+
def default_filter(
646+
hpo: hpotk.MinimalOntology,
647+
term_frequency_threshold: float = 0.4,
648+
annotation_frequency_threshold: float = 0.4,
649+
phenotypic_abnormality: hpotk.TermId = PHENOTYPIC_ABNORMALITY,
650+
):
651+
"""
652+
Args:
653+
hpo: HPO
654+
term_frequency_threshold: a `float` in range :math:`(0, 1]` with the minimum frequency
655+
for an HPO term to have in at least one of the genotype groups
656+
(e.g., 22% in missense and 3% in nonsense genotypes would be OK,
657+
but not 13% missense and 10% nonsense genotypes if the threshold is 0.2).
658+
The default threshold is `0.4` (40%).
659+
annotation_frequency_threshold: a `float` in range :math:`(0, 1]` with the minimum frequency of
660+
annotation in the cohort. For instance, if the cohort consists of 100 individuals, and
661+
the term was explicitly observed in 20 and excluded in 10 individuals, then the
662+
annotation frequency is `0.3`. The purpose of this threshold is to omit terms for which
663+
we simply do not have much data overall. By default, we set a threshold to `0.4` (40%).
664+
phenotypic_abnormality: a :class:`~hpotk.TermId` corresponding to the root of HPO phenotype hierarchy.
665+
It should very rarely, if ever, be necessary to specify this option.
666+
"""
667+
warnings.warn(
668+
"HpoMtcFilter has been deprecated and will be removed in 1.0.0. Use `IfHpoFilter` instead.",
669+
DeprecationWarning,
670+
stacklevel=2,
671+
)
672+
return IfHpoFilter.default_filter(
673+
hpo=hpo,
674+
term_frequency_threshold=term_frequency_threshold,
675+
annotation_frequency_threshold=annotation_frequency_threshold,
676+
phenotypic_abnormality=phenotypic_abnormality,
677+
)
678+
679+
def __init__(
680+
self,
681+
hpo: hpotk.MinimalOntology,
682+
term_frequency_threshold: float,
683+
annotation_frequency_threshold: float,
684+
general_hpo_terms: typing.Iterable[hpotk.TermId],
685+
):
686+
super().__init__(
687+
hpo,
688+
term_frequency_threshold,
689+
annotation_frequency_threshold,
690+
general_hpo_terms,
691+
)
692+
warnings.warn(
693+
"HpoMtcFilter has been deprecated and will be removed in 1.0.0. Use `IfHpoFilter` instead.",
694+
DeprecationWarning,
695+
stacklevel=2,
696+
)
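The deprecation pattern used by `HpoMtcFilter` can be sketched in isolation. `NewFilter` and `OldFilter` below are hypothetical stand-ins, not GPSEA classes. The old name subclasses the new one, emits a `DeprecationWarning` with `stacklevel=2` so the warning points at the caller, and its factory returns the instance built by the new class.

```python
import warnings


class NewFilter:
    """Stand-in for the renamed class."""

    def __init__(self, threshold: float = 0.4):
        self.threshold = threshold

    @staticmethod
    def default_filter(threshold: float = 0.4) -> "NewFilter":
        return NewFilter(threshold)


class OldFilter(NewFilter):
    """Deprecated alias kept so that existing imports keep working."""

    @staticmethod
    def default_filter(threshold: float = 0.4) -> NewFilter:
        warnings.warn(
            "OldFilter has been deprecated. Use `NewFilter` instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        # The factory must return the instance, otherwise callers get `None`.
        return NewFilter.default_filter(threshold)


# Record warnings so that the deprecation can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flt = OldFilter.default_filter(0.2)
```

A test for such a shim would typically check both that the warning is emitted and that the returned object is a usable filter.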

src/gpsea/analysis/pcats/_config.py

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22

33
import hpotk
44

5-
from ..mtc_filter import HpoMtcFilter
5+
from ..mtc_filter import IfHpoFilter
66
from ._impl import HpoTermAnalysis
77
from .stats import CountStatistic, FisherExactTest
88

@@ -16,13 +16,13 @@ def configure_hpo_term_analysis(
1616
"""
1717
Configure HPO term analysis with default parameters.
1818
19-
The default analysis will pre-filter HPO terms with :class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`,
19+
The default analysis will pre-filter HPO terms with :class:`~gpsea.analysis.mtc_filter.IfHpoFilter`,
2020
then compute nominal p values using `count_statistic` (default Fisher exact test),
2121
and apply multiple testing correction (default Benjamini/Hochberg (`fdr_bh`))
2222
with target `mtc_alpha` (default `0.05`).
2323
"""
2424
return HpoTermAnalysis(
25-
mtc_filter=HpoMtcFilter.default_filter(hpo),
25+
mtc_filter=IfHpoFilter.default_filter(hpo),
2626
count_statistic=count_statistic,
2727
mtc_correction=mtc_correction,
2828
mtc_alpha=mtc_alpha,

tests/analysis/pcats/test_hpo_term_analysis.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@
66

77
from gpsea.model import Cohort
88

9-
from gpsea.analysis.mtc_filter import PhenotypeMtcFilter, HpoMtcFilter
9+
from gpsea.analysis.mtc_filter import PhenotypeMtcFilter, IfHpoFilter
1010
from gpsea.analysis.pcats import HpoTermAnalysis
1111
from gpsea.analysis.pcats.stats import CountStatistic, FisherExactTest
1212
from gpsea.analysis.clf import GenotypeClassifier, PhenotypeClassifier
@@ -22,7 +22,7 @@ def phenotype_mtc_filter(
2222
self,
2323
hpo: hpotk.MinimalOntology,
2424
) -> PhenotypeMtcFilter:
25-
return HpoMtcFilter.default_filter(
25+
return IfHpoFilter.default_filter(
2626
hpo=hpo,
2727
term_frequency_threshold=0.2,
2828
annotation_frequency_threshold=0.25,
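The `term_frequency_threshold=0.2` used in this fixture can be read with the example from the `default_filter` docstring: a term passes if it reaches the threshold in at least one genotype group. A minimal sketch follows. `passes_term_frequency` is a hypothetical helper, and the inclusive comparison is an assumption.

```python
import typing


def passes_term_frequency(
    group_frequencies: typing.Mapping[str, float],
    threshold: float = 0.4,
) -> bool:
    """Keep the term if its frequency reaches the threshold in at least one group."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("The threshold must be in the range (0, 1]")
    return max(group_frequencies.values()) >= threshold


# 22% in missense and 3% in nonsense: one group exceeds 0.2.
ok = passes_term_frequency({"missense": 0.22, "nonsense": 0.03}, threshold=0.2)

# 13% in missense and 10% in nonsense: neither group reaches 0.2.
not_ok = passes_term_frequency({"missense": 0.13, "nonsense": 0.10}, threshold=0.2)
```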
