Commit 83e94b4

Merge pull request #400 from monarch-initiative/release
Release
2 parents e14538f + 48abc8b commit 83e94b4

35 files changed

+530
-114
lines changed

README.md

Lines changed: 3 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -5,10 +5,9 @@
55

66
GPSEA (Genotypes and Phenotypes - Statistical Evaluation of Associations) is a Python package for finding genotype-phenotype associations.
77

8-
9-
See the [Tutorial](https://monarch-initiative.github.io/gpsea/stable/tutorial.html)
10-
and a comprehensive [User guide](https://monarch-initiative.github.io/gpsea/stable/user-guide/index.html)
11-
for more information.
8+
See our documentation for the [setup](https://monarch-initiative.github.io/gpsea/stable/setup.html) instructions,
9+
a [tutorial](https://monarch-initiative.github.io/gpsea/stable/tutorial.html) with an end-to-end genotype-phenotype association analysis,
10+
and a comprehensive [user guide](https://monarch-initiative.github.io/gpsea/stable/user-guide/index.html) with everything else.
1211

1312
The documentation comes in two flavors:
1413

docs/conf.py

Lines changed: 1 addition & 1 deletion
@@ -63,7 +63,7 @@
6363
# The short X.Y version.
6464
version = u'0.9'
6565
# The full version, including alpha/beta/rc tags.
66-
release = u'0.9.1'
66+
release = u'0.9.2'
6767

6868
# The language for content autogenerated by Sphinx. Refer to documentation
6969
# for a list of supported languages.
-1.56 KB

docs/setup.rst

Lines changed: 2 additions & 2 deletions
@@ -4,8 +4,8 @@
44
Setup
55
#####
66

7-
Here we show how to install GPSEA and to prepare your Python environment
8-
for genotype-phenotype association analysis.
7+
Here we show how to install GPSEA and prepare your Python environment
8+
for genotype-phenotype association analyses.
99

1010

1111
.. contents:: Table of Contents

docs/tutorial.rst

Lines changed: 4 additions & 0 deletions
@@ -273,10 +273,14 @@ depending on presence of a single allele of a missense or truncating variant
273273
>>> from gpsea.analysis.clf import monoallelic_classifier
274274
>>> is_missense = variant_effect(VariantEffect.MISSENSE_VARIANT, tx_id)
275275
>>> truncating_effects = (
276+
... VariantEffect.TRANSCRIPT_ABLATION,
277+
... VariantEffect.TRANSCRIPT_TRANSLOCATION,
276278
... VariantEffect.FRAMESHIFT_VARIANT,
279+
... VariantEffect.START_LOST,
277280
... VariantEffect.STOP_GAINED,
278281
... VariantEffect.SPLICE_DONOR_VARIANT,
279282
... VariantEffect.SPLICE_ACCEPTOR_VARIANT,
283+
... # more effects could be listed here ...
280284
... )
281285
>>> is_truncating = anyof(variant_effect(e, tx_id) for e in truncating_effects)
282286
>>> gt_clf = monoallelic_classifier(

docs/user-guide/analyses/partitioning/genotype/variant_predicates.rst

Lines changed: 23 additions & 14 deletions
@@ -26,13 +26,11 @@ The predicates operate on several lines of information:
2626
+------------------------+-------------------------------------------------------------------------------------------------+
2727
| Protein data | variant is located in a region encoding a protein domain, protein feature type |
2828
+------------------------+-------------------------------------------------------------------------------------------------+
29-
| Genome | overlap with a genomic region of interest |
30-
+------------------------+-------------------------------------------------------------------------------------------------+
3129

3230

3331
The scope of the builtin predicates is fairly narrow
3432
and likely insufficient for real-life analyses.
35-
However, the predicates can be chained into a compound predicate
33+
However, several predicates can be "chained" into a compound predicate using boolean logic,
3634
to achieve more expressivity for testing complex conditions,
3735
such as "variant is a missense or synonymous variant located in exon 6 of `NM_013275.6`".
3836

@@ -41,8 +39,9 @@ such as "variant is a missense or synonymous variant located in exon 6 of `NM_01
4139
Examples
4240
********
4341

44-
Here we show examples of several simple variant predicates and
45-
how to chain them for testing complex conditions.
42+
Here we show how to use the builtin predicates for simple tests
43+
and how to build a compound predicate from the builtin predicates,
44+
for testing complex conditions.
4645

4746

4847
Load cohort
@@ -112,10 +111,10 @@ See the :mod:`gpsea.analysis.predicate` module
112111
for a complete list of the builtin predicates.
113112

114113

115-
Predicate chain
116-
===============
114+
Compound predicates
115+
===================
117116

118-
Using the builtin predicates, we can build a logical chain to test complex conditions.
117+
A compound predicate for testing complex conditions can be built from two or more predicates.
119118
For instance, we can test if the variant meets any of several conditions:
120119

121120
>>> import gpsea.analysis.predicate as vp
@@ -130,7 +129,13 @@ or *all* conditions:
130129
>>> missense_and_exon20.test(variant)
131130
True
132131

133-
All variant predicates overload Python ``&`` (AND) and ``|`` (OR) operators, to allow chaining.
132+
All variant predicates overload Python ``&`` (AND) and ``|`` (OR) operators,
133+
to combine a predicate pair into a compound predicate.
134+
135+
.. note::
136+
137+
Combining three or more predicates can be achieved with :func:`~gpsea.analysis.allof`
138+
and :func:`~gpsea.analysis.anyof` functions.
134139

135140
Therefore, there is nothing that prevents us from combining the predicates into multi-level tests,
136141
e.g. to test if the variant is a *"chromosomal deletion" or a deletion which removes at least 50 bp*:
@@ -180,12 +185,16 @@ The builtin predicates should cover majority of use cases.
180185
However, if a predicate seems to be missing,
181186
feel free to submit an issue in our
182187
`GitHub tracker <https://github.com/monarch-initiative/gpsea/issues>`_,
183-
or to implement a custom predicate
184-
by extending the :class:`~gpsea.analysis.predicate.VariantPredicate` class 😎.
188+
or implement your own predicate by following the :ref:`custom-variant-predicate`
189+
guide.
185190

186191

192+
****
193+
Next
194+
****
187195

188196
The variant predicate offers a flexible API for testing if variants meet a condition.
189-
However, the genotype phenotype correlations are done on the individual level
190-
and the variant predicates are used as a component of the genotype predicate.
191-
The next sections show how to use variant predicates to assign individuals into groups.
197+
However, the genotype phenotype correlations are studied on the level of individuals.
198+
As described in :ref:`genotype-classifiers`, GPSEA uses the :class:`~gpsea.analysis.clf.GenotypeClassifier` API
199+
to assign individuals into non-overlapping classes. Variant predicates are essential for creating such a classifier.
200+
We explain the details in the following sections.
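
The way `&`, `|`, `allof` and `anyof` compose predicates can be illustrated with a minimal, library-free sketch. The `Predicate` class, the dictionary-based "variant", and the helper implementations below are hypothetical stand-ins written for illustration only; they are not GPSEA's implementation, and the description strings they produce (nested parentheses) may differ from what GPSEA reports.

```python
import operator
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Iterable


@dataclass(frozen=True)
class Predicate:
    """Hypothetical stand-in for a variant predicate (not the GPSEA class)."""

    description: str
    check: Callable[[dict], bool]

    def test(self, variant: dict) -> bool:
        return self.check(variant)

    def __and__(self, other: "Predicate") -> "Predicate":
        # `a & b` yields a new predicate that passes only if both pass.
        return Predicate(
            f"({self.description} AND {other.description})",
            lambda v: self.test(v) and other.test(v),
        )

    def __or__(self, other: "Predicate") -> "Predicate":
        # `a | b` yields a new predicate that passes if either passes.
        return Predicate(
            f"({self.description} OR {other.description})",
            lambda v: self.test(v) or other.test(v),
        )


def allof(predicates: Iterable[Predicate]) -> Predicate:
    """Fold any number of predicates with AND (raises TypeError if empty)."""
    return reduce(operator.and_, predicates)


def anyof(predicates: Iterable[Predicate]) -> Predicate:
    """Fold any number of predicates with OR (raises TypeError if empty)."""
    return reduce(operator.or_, predicates)


# Toy predicates over a dict-based "variant" (an assumption for this sketch).
is_missense = Predicate("MISSENSE", lambda v: v["effect"] == "MISSENSE")
is_synonymous = Predicate("SYNONYMOUS", lambda v: v["effect"] == "SYNONYMOUS")
in_exon_6 = Predicate("exon 6", lambda v: v["exon"] == 6)

compound = (is_missense | is_synonymous) & in_exon_6
variant = {"effect": "MISSENSE", "exon": 6}

print(compound.description)
print(compound.test(variant))
```

Because each operator returns another predicate, compound predicates can themselves be combined, which is what makes the multi-level tests possible.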

docs/user-guide/analyses/phenotype-scores.rst

Lines changed: 24 additions & 2 deletions
@@ -121,17 +121,19 @@ In this example, the point mutation is a mutation that meets the following condi
121121
'((change length == 0 AND reference allele length == 1) AND MISSENSE_VARIANT on NM_001042681.2)'
122122

123123

124-
For the loss of function predicate, the following variant effects are considered loss of function:
124+
For the loss-of-function predicate, the following is a non-exhaustive list
125+
of variant effects considered as a loss-of-function:
125126

126127
>>> lof_effects = (
128+
... VariantEffect.TRANSCRIPT_TRANSLOCATION,
127129
... VariantEffect.TRANSCRIPT_ABLATION,
128130
... VariantEffect.FRAMESHIFT_VARIANT,
129131
... VariantEffect.START_LOST,
130132
... VariantEffect.STOP_GAINED,
131133
... )
132134
>>> lof_mutation = anyof(variant_effect(eff, tx_id) for eff in lof_effects)
133135
>>> lof_mutation.description
134-
'(TRANSCRIPT_ABLATION on NM_001042681.2 OR FRAMESHIFT_VARIANT on NM_001042681.2 OR START_LOST on NM_001042681.2 OR STOP_GAINED on NM_001042681.2)'
136+
'(TRANSCRIPT_TRANSLOCATION on NM_001042681.2 OR TRANSCRIPT_ABLATION on NM_001042681.2 OR FRAMESHIFT_VARIANT on NM_001042681.2 OR START_LOST on NM_001042681.2 OR STOP_GAINED on NM_001042681.2)'
135137

136138

137139
The genotype predicate will bin the patient into two classes: a point mutation or the loss of function:
@@ -154,6 +156,26 @@ Phenotype score
154156
This component is responsible for computing a phenotype score for an individual.
155157
As far as GPSEA framework is concerned, the phenotype score must be a floating point number
156158
or a `NaN` value if the score cannot be computed for an individual.
159+
This is the essence of the :class:`~gpsea.analysis.pscore.PhenotypeScorer` class.
160+
161+
GPSEA ships with several builtin phenotype scorers which can be used as
162+
163+
+------------------------------------------------------------+---------------------------------------------+
164+
| Name | Description |
165+
+============================================================+=============================================+
166+
| | Compute the total number of occurrences |
167+
| * :class:`~gpsea.analysis.pscore.CountingPhenotypeScorer` | of specific phenotypic features |
168+
| | (used in this section) |
169+
+------------------------------------------------------------+---------------------------------------------+
170+
| | Compute the "adapted De Vries Score" |
171+
| * :class:`~gpsea.analysis.pscore.DeVriesPhenotypeScorer` | for assessing severity |
172+
| | of intellectual disability |
173+
+------------------------------------------------------------+---------------------------------------------+
174+
175+
.. tip::
176+
177+
See :ref:`custom-phenotype-scorer` section to learn how to build a phenotype scorer from scratch.
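
The "float or `NaN`" contract described above can be sketched without GPSEA. The `count_score` function and the placeholder term labels below are hypothetical and only illustrate the idea of a counting score; they do not reproduce the behavior of `CountingPhenotypeScorer`.

```python
import math
from typing import Collection, Optional


def count_score(observed: Optional[Collection[str]], query: Collection[str]) -> float:
    """Hypothetical scorer: count the query terms observed in an individual.

    Returns NaN when no phenotype data is available (`observed is None`),
    mirroring the rule that a score is a float or NaN.
    """
    if observed is None:
        return math.nan
    return float(sum(1 for term in query if term in observed))


# Placeholder term labels (assumptions, not real ontology identifiers).
query = ("term-1", "term-2", "term-3")
individuals = {
    "A": {"term-1", "term-3"},
    "B": set(),
    "C": None,  # no phenotype data, so the score cannot be computed
}

scores = {label: count_score(obs, query) for label, obs in individuals.items()}
# Individuals with a NaN score are typically left out of downstream statistics.
usable = {label: s for label, s in scores.items() if not math.isnan(s)}
print(usable)
```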
178+
157179

158180
Here we use the :class:`~gpsea.analysis.pscore.CountingPhenotypeScorer` for scoring
159181
the individuals based on the number of structural defects
-6 Bytes

docs/user-guide/analyses/survival.rst

Lines changed: 93 additions & 0 deletions
@@ -188,3 +188,96 @@ or `None` if computing the survival was impossible (see :func:`~gpsea.analysis.t
188188
The `Survival` reports the number of days until attaining the endpoint,
189189
here defined as end stage renal disease (`is_censored=False`),
190190
or until the individual dropped out of the analysis (`is_censored=True`).
191+
192+
193+
Troubleshooting
194+
===============
195+
196+
Sometimes the survival analysis fails and an :class:`~gpsea.analysis.AnalysisException` is raised.
197+
For instance, the current Logrank test implementation reports a p value of `NaN`
198+
if the survival is the same for all individuals.
199+
This is unlikely to be an expected outcome, therefore GPSEA raises
200+
an :class:`~gpsea.analysis.AnalysisException` to force the user to troubleshoot.
201+
202+
To help with troubleshooting, the data computed prior to detecting the error is included in the exception's
203+
:attr:`~gpsea.analysis.AnalysisException.data` attribute. In survival analysis, the data should include
204+
the identifiers, genotype classes, and survivals of the tested individuals.
205+
206+
Let's show this on an example. We will create a toy cohort of 10 individuals
207+
with onset of `Lynch syndrome I <https://hpo.jax.org/browse/disease/OMIM:120435>`_
208+
(`OMIM:120435`) at 40 years.
209+
210+
>>> from gpsea.model import Cohort, Patient, Disease, Age
211+
>>> onset = Age.from_iso8601_period("P40Y")
212+
>>> individuals = [
213+
... Patient.from_raw_parts(
214+
... labels=label,
215+
... diseases=(
216+
... Disease.from_raw_parts(
217+
... term_id="OMIM:120435",
218+
... name="Lynch syndrome I",
219+
... is_observed=True,
220+
... onset=onset,
221+
... ),
222+
... ),
223+
... )
224+
... for label in "ABCDEFGHIJ" # 10 individuals
225+
... ]
226+
>>> cohort = Cohort.from_patients(individuals)
227+
228+
We will assign them into genotype classes at random, ...
229+
230+
>>> from gpsea.analysis.clf import random_classifier
231+
>>> gt_clf = random_classifier(seed=123)
232+
>>> gt_clf.description
233+
'Classify the individual into random classes'
234+
235+
... using the Lynch syndrome I diagnosis as the endpoint ...
236+
237+
>>> from gpsea.analysis.temporal.endpoint import disease_onset
238+
>>> endpoint = disease_onset(disease_id="OMIM:120435")
239+
>>> endpoint.description
240+
'Compute time until OMIM:120435 onset'
241+
242+
... and we will use Logrank test for differences in survival.
243+
244+
>>> from gpsea.analysis.temporal.stats import LogRankTest
245+
>>> survival_statistic = LogRankTest()
246+
247+
We put together the survival analysis ...
248+
249+
>>> from gpsea.analysis.temporal import SurvivalAnalysis
250+
>>> survival_analysis = SurvivalAnalysis(
251+
... statistic=survival_statistic,
252+
... )
253+
254+
... which we expect to fail with an :class:`~gpsea.analysis.AnalysisException`:
255+
256+
>>> result = survival_analysis.compare_genotype_vs_survival(
257+
... cohort=cohort,
258+
... gt_clf=gt_clf,
259+
... endpoint=endpoint,
260+
... )
261+
Traceback (most recent call last):
262+
...
263+
gpsea.analysis._base.AnalysisException: The survival values did not meet the expectation of the statistical test!
264+
265+
The genotype classes and survival values can be retrieved from the exception:
266+
267+
>>> from gpsea.analysis import AnalysisException
268+
>>> try:
269+
... result = survival_analysis.compare_genotype_vs_survival(
270+
... cohort=cohort,
271+
... gt_clf=gt_clf,
272+
... endpoint=endpoint,
273+
... )
274+
... except AnalysisException as ae:
275+
... genotypes = ae.data["genotype"]
276+
... survivals = ae.data["survival"]
277+
278+
and the values can come in handy in troubleshooting:
279+
280+
>>> genotypes[:3]
281+
(0, 0, 0)
282+
>>> survivals[:3]
283+
(Survival(value=14610.0, is_censored=False), Survival(value=14610.0, is_censored=False), Survival(value=14610.0, is_censored=False))
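
The pattern used here, an exception that carries the intermediate data for inspection, can be sketched in plain Python. `ToyAnalysisException` and `compare` below are hypothetical stand-ins, not GPSEA classes or functions. The 365.25 days per year factor is an assumption; it happens to be consistent with the 14610.0 days reported above for an onset at 40 years.

```python
from collections import Counter
from typing import Any, Mapping, Sequence

DAYS_IN_YEAR = 365.25  # assumption: 40 * 365.25 == 14610.0


class ToyAnalysisException(Exception):
    """Hypothetical exception that keeps the data computed before the failure."""

    def __init__(self, data: Mapping[str, Any], *args: Any) -> None:
        super().__init__(*args)
        self.data = data


def compare(genotypes: Sequence[int], survivals: Sequence[float]) -> str:
    """Toy analysis: refuse to run a test if all survival values are identical."""
    if len(set(survivals)) <= 1:
        raise ToyAnalysisException(
            {"genotype": tuple(genotypes), "survival": tuple(survivals)},
            "The survival values did not meet the expectation of the statistical test!",
        )
    return "ok"


onset_days = 40 * DAYS_IN_YEAR
try:
    compare((0, 0, 1, 1), (onset_days,) * 4)
except ToyAnalysisException as ae:
    # Summaries like these quickly reveal why the test could not be computed.
    genotype_counts = Counter(ae.data["genotype"])
    distinct_survivals = set(ae.data["survival"])

print(genotype_counts)
print(distinct_survivals)
```

A single distinct survival value across all genotype classes points to the cause of the failure: there is no variation in survival for the test to compare.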
