Commit 1b1bb5e

Merge pull request #413 from monarch-initiative/develop
Release `v0.9.4`
2 parents 1c0c4e2 + 869e114 commit 1b1bb5e

10 files changed, +182 -71 lines changed

docs/conf.py

Lines changed: 1 addition & 1 deletion
@@ -64,7 +64,7 @@
 # The short X.Y version.
 version = u'0.9'
 # The full version, including alpha/beta/rc tags.
-release = u'0.9.3'
+release = u'0.9.4'

 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.

docs/tutorial.rst

Lines changed: 2 additions & 2 deletions
@@ -314,7 +314,7 @@ For general use, we recommend using a combination
 of a *phenotype MT filter* (:class:`~gpsea.analysis.mtc_filter.PhenotypeMtcFilter`) with a *multiple testing correction*.
 Phenotype MT filter chooses the HPO terms to test according to several heuristics, which
 reduce the multiple testing burden and focus the analysis
-on the most interesting terms (see :ref:`HPO MT filter <hpo-mt-filter>` for more info).
+on the most interesting terms (see :ref:`Independent filtering for HPO <hpo-if-filter>` for more info).
 Then the multiple testing correction, such as Bonferroni or Benjamini-Hochberg,
 is used to control the family-wise error rate or the false discovery rate.
 See :ref:`mtc` for more information.
@@ -323,7 +323,7 @@ See :ref:`mtc` for more information.
 >>> analysis = configure_hpo_term_analysis(hpo)

 :func:`~gpsea.analysis.pcats.configure_hpo_term_analysis` configures the analysis
-that uses HPO MTC filter (:class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`) for selecting HPO terms of interest,
+that uses Independent filtering for HPO (:class:`~gpsea.analysis.mtc_filter.IfHpoFilter`) for selecting HPO terms of interest,
 Fisher Exact test for computing nominal p values, and Benjamini-Hochberg for multiple testing correction.

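The tutorial text in this file names Bonferroni and Benjamini-Hochberg as the correction step that follows term filtering. As a rough illustration of what those two adjustments do to a list of nominal p values, here is a minimal pure-Python sketch. The function names and the sample p values are made up for this note and are not part of GPSEA, which delegates the correction to a statistics library.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    # Indices of the p values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical nominal p values for four HPO terms that passed the filter.
nominal = [0.01, 0.04, 0.03, 0.20]
bonf = bonferroni(nominal)
bh = benjamini_hochberg(nominal)
```

Both adjustments scale with the number of tests `m`, which is why removing uninformative terms before testing makes the correction less severe.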

docs/user-guide/analyses/mtc.rst

Lines changed: 23 additions & 17 deletions
@@ -171,31 +171,37 @@ we pass an iterable (e.g. a tuple) with these two terms as an argument:
 2


-.. _hpo-mt-filter:
+.. _hpo-if-filter:

-HPO MT filter
--------------
+Independent filtering for HPO
+-----------------------------
+
+Independent filtering for HPO involves making several domain judgments
+and taking advantage of the HPO structure
+in order to reduce the number of HPO terms for testing.
+The filter's logic is made up of 8 individual heuristics
+to skip testing the terms that are unlikely to yield significant or interesting results (see below).

-The HPO MT filter involves making several domain judgments and takes advantage of the HPO structure.
-The strategy needs access to HPO:
+Some of the heuristics need to access HPO hierarchy,
+so let's load HPO

 >>> import hpotk
 >>> store = hpotk.configure_ontology_store()
 >>> hpo = store.load_minimal_hpo(release='v2024-07-01')

-and it is implemented in the :class:`~gpsea.analysis.mtc_filter.HpoMtcFilter` class:
+and let's create the :class:`~gpsea.analysis.mtc_filter.IfHpoFilter` class
+using the static constructor
+:func:`~gpsea.analysis.mtc_filter.IfHpoFilter.default_filter`:

->>> from gpsea.analysis.mtc_filter import HpoMtcFilter
->>> hpo_mtc = HpoMtcFilter.default_filter(hpo=hpo)
+>>> from gpsea.analysis.mtc_filter import IfHpoFilter
+>>> hpo_mtc = IfHpoFilter.default_filter(hpo=hpo)


-We use static constructor :func:`~gpsea.analysis.mtc_filter.HpoMtcFilter.default_filter`
-for creating :class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`.
-The constructor takes a ``term_frequency_threshold`` option (40% by default)
-and the method's logic is made up of 8 individual heuristics
-designed to skip testing the HPO terms that are unlikely to yield significant or interesting results.
+The constructor takes HPO and two thresholds (optional).
+See the API documentation and the explanations below for more details.

-.. contents:: HPO MT filters
+
+.. contents:: Independent filtering for HPO
    :depth: 1
    :local:

@@ -281,6 +287,8 @@ that if there is a signal from the nervous system,
 it will lead to at least one of the descendents of
 *Abnormality of the nervous system* being significant.

+See :ref:`general-hpo-terms` section for details.
+

 `HMF09` - Skipping terms that are rare on the cohort level
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -294,8 +302,6 @@ and we have explicit observed observations for 20 and excluded for 10 individual
 then the annotation frequency is `0.3`.

 The threshold is set as ``annotation_frequency_threshold`` option
-of the :func:`~gpsea.analysis.mtc_filter.HpoMtcFilter.default_filter` constructor,
+of the :func:`~gpsea.analysis.mtc_filter.IfHpoFilter.default_filter` constructor,
 with the default value of `0.4` (40%).

-
-See :ref:`general-hpo-terms` section for details.
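The `HMF09` passage above gives the annotation frequency as a worked number: 20 observed plus 10 excluded annotations in a cohort of 100 yields `0.3`. The arithmetic can be sketched as follows. The helper name is hypothetical, and the strict `<` comparison against the threshold is an assumption of this note, since the documentation only states that the threshold is a minimum frequency.

```python
def annotation_frequency(n_observed: int, n_excluded: int, cohort_size: int) -> float:
    """Fraction of the cohort with an explicit (observed or excluded) annotation."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    return (n_observed + n_excluded) / cohort_size


# Numbers taken from the documentation example.
freq = annotation_frequency(n_observed=20, n_excluded=10, cohort_size=100)

# Default threshold from the documentation: 0.4 (40%).
threshold = 0.4
# Assumption: a term is skipped when its frequency falls below the minimum.
skip_term = freq < threshold
```

With the default threshold the example term would be skipped, because 30% of the cohort carries an explicit annotation and the minimum is 40%.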

docs/user-guide/analyses/phenotype-classes.rst

Lines changed: 6 additions & 6 deletions
@@ -207,7 +207,7 @@ a phenotype multiple testing (MT) filter and multiple testing correction (MTC).

 Phenotype MT filter selects a (sub)set of HPO terms for testing,
 for instance only the user-selected terms (see :class:`~gpsea.analysis.mtc_filter.SpecifiedTermsMtcFilter`)
-or the terms selected by :class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`.
+or the terms selected by :class:`~gpsea.analysis.mtc_filter.IfHpoFilter`.

 MTC then adjusts the nominal p values for the increased risk
 of false positive G/P associations.
@@ -221,8 +221,8 @@ We must choose a phenotype MT filter as well as a MTC procedure to perform genot
 Default analysis
 ^^^^^^^^^^^^^^^^

-We recommend using HPO MT filter (:class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`) as a phenotype MT filter
-and Benjamini-Hochberg for MTC.
+We recommend using Independent filtering for HPO (:class:`~gpsea.analysis.mtc_filter.IfHpoFilter`)
+and Benjamini-Hochberg MT correction.
 The default analysis can be configured with :func:`~gpsea.analysis.pcats.configure_hpo_term_analysis` convenience method.

 >>> from gpsea.analysis.pcats import configure_hpo_term_analysis
@@ -240,10 +240,10 @@ Custom analysis
 If the default selection of phenotype MT filter and multiple testing correction is not an option,
 we can configure the analysis manually.

-First, we choose a phenotype MT filter (e.g. :class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`):
+First, we choose a phenotype MT filter (e.g. :class:`~gpsea.analysis.mtc_filter.IfHpoFilter`):

->>> from gpsea.analysis.mtc_filter import HpoMtcFilter
->>> mtc_filter = HpoMtcFilter.default_filter(hpo, term_frequency_threshold=.2)
+>>> from gpsea.analysis.mtc_filter import IfHpoFilter
+>>> mtc_filter = IfHpoFilter.default_filter(hpo, term_frequency_threshold=.2)

 .. note::

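The custom analysis above passes `term_frequency_threshold=.2`. According to the `default_filter` docstring later in this commit, the threshold is the minimum frequency an HPO term must reach in at least one genotype group. A small sketch of that rule, using the docstring's own numbers, might look like this. The function name, the dictionary layout, and the inclusive `>=` comparison are assumptions made for illustration.

```python
def passes_term_frequency(group_frequencies: dict, threshold: float = 0.4) -> bool:
    """Return True if the term is frequent enough in at least one genotype group."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in the range (0, 1]")
    # Only the most frequent group matters for this heuristic.
    return max(group_frequencies.values()) >= threshold


# Docstring example: 22% in missense and 3% in nonsense is OK at a threshold of 0.2 ...
ok = passes_term_frequency({"missense": 0.22, "nonsense": 0.03}, threshold=0.2)
# ... but 13% in missense and 10% in nonsense is not.
not_ok = passes_term_frequency({"missense": 0.13, "nonsense": 0.10}, threshold=0.2)
```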

src/gpsea/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 GPSEA is a library for finding genotype-phenotype associations.
 """

-__version__ = "0.9.3"
+__version__ = "0.9.4"

 _overwrite = False
 """

src/gpsea/analysis/mtc_filter/__init__.py

Lines changed: 9 additions & 3 deletions
@@ -6,9 +6,15 @@
 """

 from ._impl import PhenotypeMtcFilter, PhenotypeMtcResult, PhenotypeMtcIssue
-from ._impl import UseAllTermsMtcFilter, SpecifiedTermsMtcFilter, HpoMtcFilter
+from ._impl import UseAllTermsMtcFilter, SpecifiedTermsMtcFilter, IfHpoFilter
+from ._impl import HpoMtcFilter

 __all__ = [
-    'PhenotypeMtcFilter', 'PhenotypeMtcResult', 'PhenotypeMtcIssue',
-    'UseAllTermsMtcFilter', 'SpecifiedTermsMtcFilter', 'HpoMtcFilter',
+    "PhenotypeMtcFilter",
+    "PhenotypeMtcResult",
+    "PhenotypeMtcIssue",
+    "UseAllTermsMtcFilter",
+    "SpecifiedTermsMtcFilter",
+    "IfHpoFilter",
+    "HpoMtcFilter",
 ]

src/gpsea/analysis/mtc_filter/_impl.py

Lines changed: 91 additions & 20 deletions
@@ -3,6 +3,7 @@
 import typing

 from collections import deque
+import warnings

 import hpotk
 import pandas as pd
@@ -252,14 +253,14 @@ def verify_term_id(val: typing.Union[str, hpotk.TermId]) -> hpotk.TermId:
             raise ValueError(f"{val} is neither `str` nor `hpotk.TermId`")


-class HpoMtcFilter(PhenotypeMtcFilter[hpotk.TermId]):
+class IfHpoFilter(PhenotypeMtcFilter[hpotk.TermId]):
     """
-    `HpoMtcFilter` decides which phenotypes should be tested and which phenotypes are not worth testing.
+    `IfHpoFilter` decides which phenotypes should be tested and which phenotypes are not worth testing.

     The class leverages a number of heuristics and domain decisions.
-    See :ref:`hpo-mt-filter` section for more info.
+    See :ref:`hpo-if-filter` section for more info.

-    We recommend creating an instance using the :func:`default_filter` static factory method.
+    We recommend creating an instance using the :func:`~gpsea.analysis.mtc_filter.IfHpoFilter.default_filter` static factory method.
     """

     NO_GENOTYPE_HAS_MORE_THAN_ONE_HPO = PhenotypeMtcResult.fail(
@@ -293,7 +294,7 @@ def default_filter(
                 (e.g., 22% in missense and 3% in nonsense genotypes would be OK,
                 but not 13% missense and 10% nonsense genotypes if the threshold is 0.2).
                 The default threshold is `0.4` (40%).
-            annotation_frequency_threshold: a `float` in range :math:`(0, 1) with the minimum frequency of
+            annotation_frequency_threshold: a `float` in range :math:`(0, 1]` with the minimum frequency of
                 annotation in the cohort. For instance, if the cohort consists of 100 individuals, and
                 we have explicit observed observations for 20 and excluded for 10 individuals, then the
                 annotation frequency is `0.3`. The purpose of this threshold is to omit terms for which
@@ -340,7 +341,7 @@ def default_filter(
         general_hpo_term_set.update(second_level_terms)
         general_hpo_term_set.update(third_level_terms)

-        return HpoMtcFilter(
+        return IfHpoFilter(
             hpo=hpo,
             term_frequency_threshold=term_frequency_threshold,
             annotation_frequency_threshold=annotation_frequency_threshold,
@@ -355,7 +356,15 @@ def __init__(
         general_hpo_terms: typing.Iterable[hpotk.TermId],
     ):
         self._hpo = hpo
+        assert (
+            isinstance(term_frequency_threshold, (int, float))
+            and 0.0 < term_frequency_threshold <= 1.0
+        ), "The term_frequency_threshold must be in the range (0, 1]"
         self._hpo_term_frequency_filter = term_frequency_threshold
+        assert (
+            isinstance(annotation_frequency_threshold, (int, float))
+            and 0.0 < annotation_frequency_threshold <= 1.0
+        ), "The annotation_frequency_threshold must be in the range (0, 1]"
         self._hpo_annotation_frequency_threshold = annotation_frequency_threshold

         self._general_hpo_terms = set(general_hpo_terms)
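The hunk above validates both thresholds with `assert`. Two properties of that approach are worth keeping in mind: assertions are skipped when Python runs with the `-O` flag, and `bool` is a subclass of `int`, so `True` would pass the `isinstance` check and the range check. A possible alternative, shown only as a sketch with a hypothetical helper name and not as what the commit does, is to raise `ValueError` explicitly:

```python
def check_threshold(value, name: str) -> float:
    """Validate that `value` is a number in the range (0, 1] and return it as a float."""
    # `bool` is a subclass of `int`, so it has to be rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"{name} must be a number but was {type(value).__name__}")
    if not 0.0 < value <= 1.0:
        raise ValueError(f"{name} must be in the range (0, 1] but was {value}")
    return float(value)


# Usage with the default value from the docstring.
term_frequency_threshold = check_threshold(0.4, "term_frequency_threshold")
```

Unlike an `assert`, the explicit check still runs in optimized mode, and callers can catch `ValueError` selectively.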
@@ -423,17 +432,17 @@ def filter(
                 continue

             if term_id in self._general_hpo_terms:
-                results[idx] = HpoMtcFilter.SKIPPING_GENERAL_TERM
+                results[idx] = IfHpoFilter.SKIPPING_GENERAL_TERM
                 continue

             if not self._hpo.graph.is_ancestor_of(PHENOTYPIC_ABNORMALITY, term_id):
-                results[idx] = HpoMtcFilter.SKIPPING_NON_PHENOTYPE_TERM
+                results[idx] = IfHpoFilter.SKIPPING_NON_PHENOTYPE_TERM
                 continue

             ph_clf = pheno_clfs[idx]
             contingency_matrix = counts[idx]

-            max_freq = HpoMtcFilter.get_maximum_group_observed_HPO_frequency(
+            max_freq = IfHpoFilter.get_maximum_group_observed_HPO_frequency(
                 contingency_matrix,
                 ph_clf=ph_clf,
             )
@@ -459,19 +468,19 @@ def filter(
                 results[idx] = self._not_powered_for_2_by_3
                 continue

-            if not HpoMtcFilter.some_cell_has_greater_than_one_count(
+            if not IfHpoFilter.some_cell_has_greater_than_one_count(
                 counts=contingency_matrix,
                 ph_clf=ph_clf,
             ):
-                results[idx] = HpoMtcFilter.NO_GENOTYPE_HAS_MORE_THAN_ONE_HPO
+                results[idx] = IfHpoFilter.NO_GENOTYPE_HAS_MORE_THAN_ONE_HPO
                 continue

-            elif HpoMtcFilter.one_genotype_has_zero_hpo_observations(
+            elif IfHpoFilter.one_genotype_has_zero_hpo_observations(
                 counts=contingency_matrix,
                 gt_clf=gt_clf,
             ):
                 results[idx] = (
-                    HpoMtcFilter.SKIPPING_SINCE_ONE_GENOTYPE_HAD_ZERO_OBSERVATIONS
+                    IfHpoFilter.SKIPPING_SINCE_ONE_GENOTYPE_HAD_ZERO_OBSERVATIONS
                 )
                 continue

@@ -495,7 +504,7 @@ def filter(
                     axis=None
                 ) < 1:
                     # Do not test if the count is exactly the same to the counts in the only child term.
-                    results[idx] = HpoMtcFilter.SAME_COUNT_AS_THE_ONLY_CHILD
+                    results[idx] = IfHpoFilter.SAME_COUNT_AS_THE_ONLY_CHILD
                     continue

             # ##
@@ -520,18 +529,18 @@ def possible_results(self) -> typing.Collection[PhenotypeMtcResult]:
         return (
             PhenotypeMtcFilter.OK,
             self._below_frequency_threshold,  # HMF01
-            HpoMtcFilter.NO_GENOTYPE_HAS_MORE_THAN_ONE_HPO,  # HMF02
-            HpoMtcFilter.SAME_COUNT_AS_THE_ONLY_CHILD,  # HMF03
-            HpoMtcFilter.SKIPPING_SINCE_ONE_GENOTYPE_HAD_ZERO_OBSERVATIONS,  # HMF05
+            IfHpoFilter.NO_GENOTYPE_HAS_MORE_THAN_ONE_HPO,  # HMF02
+            IfHpoFilter.SAME_COUNT_AS_THE_ONLY_CHILD,  # HMF03
+            IfHpoFilter.SKIPPING_SINCE_ONE_GENOTYPE_HAD_ZERO_OBSERVATIONS,  # HMF05
             self._not_powered_for_2_by_2,  # HMF06
             self._not_powered_for_2_by_3,  # HMF06
-            HpoMtcFilter.SKIPPING_NON_PHENOTYPE_TERM,  # HMF07
-            HpoMtcFilter.SKIPPING_GENERAL_TERM,  # HMF08
+            IfHpoFilter.SKIPPING_NON_PHENOTYPE_TERM,  # HMF07
+            IfHpoFilter.SKIPPING_GENERAL_TERM,  # HMF08
             self._below_annotation_frequency_threshold,  # HMF09
         )

     def filter_method_name(self) -> str:
-        return "HPO MTC filter"
+        return "Independent filtering HPO filter"

     @staticmethod
     def get_number_of_observed_hpo_observations(
@@ -623,3 +632,65 @@ def _get_ordered_terms(

         # now, ordered_term_list is ordered from leaves to root
         return ordered_term_list
+
+
+class HpoMtcFilter(IfHpoFilter):
+    """
+    `HpoMtcFilter` is deprecated and will be removed in `1.0.0`.
+
+    Use :class:`gpsea.analysis.mtc_filter.IfHpoFilter` instead.
+    """
+
+    @staticmethod
+    def default_filter(
+        hpo: hpotk.MinimalOntology,
+        term_frequency_threshold: float = 0.4,
+        annotation_frequency_threshold: float = 0.4,
+        phenotypic_abnormality: hpotk.TermId = PHENOTYPIC_ABNORMALITY,
+    ):
+        """
+        Args:
+            hpo: HPO
+            term_frequency_threshold: a `float` in range :math:`(0, 1]` with the minimum frequency
+                for an HPO term to have in at least one of the genotype groups
+                (e.g., 22% in missense and 3% in nonsense genotypes would be OK,
+                but not 13% missense and 10% nonsense genotypes if the threshold is 0.2).
+                The default threshold is `0.4` (40%).
+            annotation_frequency_threshold: a `float` in range :math:`(0, 1]` with the minimum frequency of
+                annotation in the cohort. For instance, if the cohort consists of 100 individuals, and
+                we have explicit observed observations for 20 and excluded for 10 individuals, then the
+                annotation frequency is `0.3`. The purpose of this threshold is to omit terms for which
+                we simply do not have much data overall. By default, we set a threshold to `0.4` (40%).
+            phenotypic_abnormality: a :class:`~hpotk.TermId` corresponding to the root of HPO phenotype hierarchy.
+                Having to specify this option should be needed very rarely, if ever.
+        """
+        warnings.warn(
+            "HpoMtcFilter has been deprecated and will be removed in 1.0.0. Use `IfHpoFilter` instead.",
+            DeprecationWarning,
+            stacklevel=2,
+        )
+        # Return the delegate's result; without `return` the factory would yield `None`.
+        return IfHpoFilter.default_filter(
+            hpo=hpo,
+            term_frequency_threshold=term_frequency_threshold,
+            annotation_frequency_threshold=annotation_frequency_threshold,
+            phenotypic_abnormality=phenotypic_abnormality,
+        )
+
+    def __init__(
+        self,
+        hpo: hpotk.MinimalOntology,
+        term_frequency_threshold: float,
+        annotation_frequency_threshold: float,
+        general_hpo_terms: typing.Iterable[hpotk.TermId],
+    ):
+        super().__init__(
+            hpo,
+            term_frequency_threshold,
+            annotation_frequency_threshold,
+            general_hpo_terms,
+        )
+        warnings.warn(
+            "HpoMtcFilter has been deprecated and will be removed in 1.0.0. Use `IfHpoFilter` instead.",
+            DeprecationWarning,
+            stacklevel=2,
+        )
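The hunk above keeps the old class name alive as a deprecated subclass that warns on use. The same pattern can be shown in isolation with two stand-in classes. `NewFilter` and `OldFilter` are hypothetical names used only for this sketch, and the `classmethod` factory is one possible design, not the one used in the commit, which overrides a static factory instead.

```python
import warnings


class NewFilter:
    """Stand-in for the class that carries the new name (hypothetical)."""

    def __init__(self, threshold: float):
        self.threshold = threshold

    @classmethod
    def default_filter(cls, threshold: float = 0.4):
        # Build the instance through `cls` and return it to the caller.
        return cls(threshold)


class OldFilter(NewFilter):
    """Deprecated alias kept so that existing imports keep working (hypothetical)."""

    def __init__(self, threshold: float):
        super().__init__(threshold)
        warnings.warn(
            "OldFilter is deprecated. Use NewFilter instead.",
            DeprecationWarning,
            stacklevel=2,
        )


with warnings.catch_warnings(record=True) as caught:
    # DeprecationWarning is often hidden by default, so enable it for the demonstration.
    warnings.simplefilter("always")
    legacy = OldFilter.default_filter(threshold=0.2)
```

Because the factory constructs through `cls`, the deprecated name still yields an instance of the deprecated class and the warning is emitted once, from `__init__`.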

src/gpsea/analysis/pcats/_config.py

Lines changed: 3 additions & 3 deletions
@@ -2,7 +2,7 @@

 import hpotk

-from ..mtc_filter import HpoMtcFilter
+from ..mtc_filter import IfHpoFilter
 from ._impl import HpoTermAnalysis
 from .stats import CountStatistic, FisherExactTest

@@ -16,13 +16,13 @@ def configure_hpo_term_analysis(
     """
     Configure HPO term analysis with default parameters.

-    The default analysis will pre-filter HPO terms with :class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`,
+    The default analysis will pre-filter HPO terms with :class:`~gpsea.analysis.mtc_filter.IfHpoFilter`,
     then compute nominal p values using `count_statistic` (default Fisher exact test),
     and apply multiple testing correction (default Benjamini/Hochberg (`fdr_bh`))
     with target `mtc_alpha` (default `0.05`).
     """
     return HpoTermAnalysis(
-        mtc_filter=IfHpoFilter.default_filter(hpo),
+        mtc_filter=IfHpoFilter.default_filter(hpo),
         count_statistic=count_statistic,
         mtc_correction=mtc_correction,
         mtc_alpha=mtc_alpha,
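The docstring above names the Fisher exact test as the default way of computing nominal p values. For readers who want to see what that test computes on a 2x2 genotype-by-phenotype table, here is a minimal sketch based on the hypergeometric distribution. It is an illustration written for this note with a hypothetical function name and made-up counts. It is not GPSEA's `FisherExactTest`, and it needs Python 3.8 or newer for `math.comb`.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables that are at most as likely as the observed one.
    # The small relative tolerance guards against floating point ties.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical counts: rows are genotype groups, columns are phenotype present/absent.
p_weak = fisher_exact_two_sided(3, 1, 1, 3)
p_strong = fisher_exact_two_sided(5, 0, 0, 5)
```

The nominal p values produced this way are what the multiple testing correction subsequently adjusts.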
