Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
1e31d7f
Next development iteration `v0.9.2-dev0`.
ielis Jan 7, 2025
b06338a
Fix lints and typing annotations.
ielis Jan 9, 2025
6d4f65a
Merge pull request #390 from monarch-initiative/minor-lint-fixes
ielis Jan 14, 2025
042df93
Merge branch 'develop' into add-support-for-translocations
ielis Jan 14, 2025
43edff3
Add an example translocation into a test phenopacket.
ielis Jan 14, 2025
3b0cb67
The fixtures' scope should be class.
ielis Jan 14, 2025
b18bb50
Predict that an imprecise translocation leads to a transcript translo…
ielis Jan 14, 2025
d571a57
Reorder `VariantEffect` enum members.
ielis Jan 14, 2025
5b6c81b
Show the transcript translocation in the docs.
ielis Jan 14, 2025
a9574ad
Merge pull request #391 from monarch-initiative/add-support-for-trans…
ielis Jan 14, 2025
e849d7c
Delete dead docs stub.
ielis Jan 14, 2025
dd3dcc4
Merge pull request #392 from monarch-initiative/delete-ar-docs
ielis Jan 14, 2025
291fb75
Improve error message when age cannot be parsed #393.
ielis Jan 15, 2025
0da3955
The median line color can be tweaked in phenotype score analysis boxp…
ielis Jan 15, 2025
8822bc1
The boxplot median line is black by default #397
ielis Jan 15, 2025
0607322
Add guide for writing custom components #396
ielis Jan 15, 2025
3699898
Add guide for writing custom `VariantPredicate` #326
ielis Jan 15, 2025
ba0487d
Include setup link in `README.md`.
ielis Jan 15, 2025
bf691ef
Add missing whitespace #388
ielis Jan 15, 2025
d0f69d0
Merge pull request #398 from monarch-initiative/fix-bugs
ielis Jan 15, 2025
36acd3a
Report partial data upon survival stats failure.
ielis Jan 16, 2025
bed8813
Add `random_classifier`.
ielis Jan 16, 2025
2721f33
Show how to troubleshoot survival analysis.
ielis Jan 16, 2025
5518116
Merge pull request #399 from monarch-initiative/report-partial-data-f…
ielis Jan 16, 2025
d8faa2d
Update doc data.
ielis Jan 16, 2025
48abc8b
Make release `0.9.2`.
ielis Jan 16, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 3 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,9 @@

GPSEA (Genotypes and Phenotypes - Statistical Evaluation of Associations) is a Python package for finding genotype-phenotype associations.


See the [Tutorial](https://monarch-initiative.github.io/gpsea/stable/tutorial.html)
and a comprehensive [User guide](https://monarch-initiative.github.io/gpsea/stable/user-guide/index.html)
for more information.
See our documentation for the [setup](https://monarch-initiative.github.io/gpsea/stable/setup.html) instructions,
a [tutorial](https://monarch-initiative.github.io/gpsea/stable/tutorial.html) with an end-to-end genotype-phenotype association analysis,
and a comprehensive [user guide](https://monarch-initiative.github.io/gpsea/stable/user-guide/index.html) with everything else.

The documentation comes in two flavors:

Expand Down
2 changes: 1 addition & 1 deletion docs/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,7 @@
# The short X.Y version.
version = u'0.9'
# The full version, including alpha/beta/rc tags.
release = u'0.9.1'
release = u'0.9.2'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
Expand Down
Binary file modified docs/img/tutorial/tbx5_protein_diagram.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
4 changes: 2 additions & 2 deletions docs/setup.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,8 @@
Setup
#####

Here we show how to install GPSEA and to prepare your Python environment
for genotype-phenotype association analysis.
Here we show how to install GPSEA and prepare your Python environment
for genotype-phenotype association analyses.


.. contents:: Table of Contents
Expand Down
4 changes: 4 additions & 0 deletions docs/tutorial.rst
Original file line number Diff line number Diff line change
Expand Up @@ -273,10 +273,14 @@ depending on presence of a single allele of a missense or truncating variant
>>> from gpsea.analysis.clf import monoallelic_classifier
>>> is_missense = variant_effect(VariantEffect.MISSENSE_VARIANT, tx_id)
>>> truncating_effects = (
... VariantEffect.TRANSCRIPT_ABLATION,
... VariantEffect.TRANSCRIPT_TRANSLOCATION,
... VariantEffect.FRAMESHIFT_VARIANT,
... VariantEffect.START_LOST,
... VariantEffect.STOP_GAINED,
... VariantEffect.SPLICE_DONOR_VARIANT,
... VariantEffect.SPLICE_ACCEPTOR_VARIANT,
... # more effects could be listed here ...
... )
>>> is_truncating = anyof(variant_effect(e, tx_id) for e in truncating_effects)
>>> gt_clf = monoallelic_classifier(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -26,13 +26,11 @@ The predicates operate on several lines of information:
+------------------------+-------------------------------------------------------------------------------------------------+
| Protein data | variant is located in a region encoding a protein domain, protein feature type |
+------------------------+-------------------------------------------------------------------------------------------------+
| Genome | overlap with a genomic region of interest |
+------------------------+-------------------------------------------------------------------------------------------------+


The scope of the builtin predicates is fairly narrow
and likely insufficient for real-life analyses.
However, the predicates can be chained into a compound predicate
However, several predicates can be "chained" into a compound predicate using a boolean logic,
to achive more expressivity for testing complex conditions,
such as "variant is a missense or synonymous variant located in exon 6 of `NM_013275.6`".

Expand All @@ -41,8 +39,9 @@ such as "variant is a missense or synonymous variant located in exon 6 of `NM_01
Examples
********

Here we show examples of several simple variant predicates and
how to chain them for testing complex conditions.
Here we show how to use the builtin predicates for simple tests
and how to build a compound predicate from the builtin predicates,
for testing complex conditions.


Load cohort
Expand Down Expand Up @@ -112,10 +111,10 @@ See the :mod:`gpsea.analysis.predicate` module
for a complete list of the builtin predicates.


Predicate chain
===============
Compound predicates
===================

Using the builtin predicates, we can build a logical chain to test complex conditions.
A compound predicate for testing complex conditions can be built from two or more predicates.
For instance, we can test if the variant meets any of several conditions:

>>> import gpsea.analysis.predicate as vp
Expand All @@ -130,7 +129,13 @@ or *all* conditions:
>>> missense_and_exon20.test(variant)
True

All variant predicates overload Python ``&`` (AND) and ``|`` (OR) operators, to allow chaining.
All variant predicates overload Python ``&`` (AND) and ``|`` (OR) operators,
to combine a predicate pair into a compound predicate.

.. note::

Combining three or or more predicates can be achieved with :func:`~gpsea.analysis.allof`
and :func:`~gpsea.analysis.anyof` functions.

Therefore, there is nothing that prevents us to combine the predicates into multi-level tests,
e.g. to test if the variant is a *"chromosomal deletion" or a deletion which removes at least 50 bp*:
Expand Down Expand Up @@ -180,12 +185,16 @@ The builtin predicates should cover majority of use cases.
However, if a predicate seems to be missing,
feel free to submit an issue in our
`GitHub tracker <https://github.com/monarch-initiative/gpsea/issues>`_,
or to implement a custom predicate
by extending the :class:`~gpsea.analysis.predicate.VariantPredicate` class 😎.
or implement your own predicate by following the :ref:`custom-variant-predicate`
guide.


****
Next
****

The variant predicate offers a flexible API for testing if variants meet a condition.
However, the genotype phenotype correlations are done on the individual level
and the variant predicates are used as a component of the genotype predicate.
The next sections show how to use variant predicates to assign individuals into groups.
However, the genotype phenotype correlations are studied on the level of individuals.
As described in :ref:`genotype-classifiers`, GPSEA uses the :class:`~gpsea.analysis.clf.GenotypeClassifier` API
to assign individuals into non-overlapping classes. Variant predicates are essential for creating such classifier.
We explain the details in the following sections.
26 changes: 24 additions & 2 deletions docs/user-guide/analyses/phenotype-scores.rst
Original file line number Diff line number Diff line change
Expand Up @@ -121,17 +121,19 @@ In this example, the point mutation is a mutation that meets the following condi
'((change length == 0 AND reference allele length == 1) AND MISSENSE_VARIANT on NM_001042681.2)'


For the loss of function predicate, the following variant effects are considered loss of function:
For the loss-of-function predicate, the following is a non-exhausting list
of variant effects considered as a loss-of-function:

>>> lof_effects = (
... VariantEffect.TRANSCRIPT_TRANSLOCATION,
... VariantEffect.TRANSCRIPT_ABLATION,
... VariantEffect.FRAMESHIFT_VARIANT,
... VariantEffect.START_LOST,
... VariantEffect.STOP_GAINED,
... )
>>> lof_mutation = anyof(variant_effect(eff, tx_id) for eff in lof_effects)
>>> lof_mutation.description
'(TRANSCRIPT_ABLATION on NM_001042681.2 OR FRAMESHIFT_VARIANT on NM_001042681.2 OR START_LOST on NM_001042681.2 OR STOP_GAINED on NM_001042681.2)'
'(TRANSCRIPT_TRANSLOCATION on NM_001042681.2 OR TRANSCRIPT_ABLATION on NM_001042681.2 OR FRAMESHIFT_VARIANT on NM_001042681.2 OR START_LOST on NM_001042681.2 OR STOP_GAINED on NM_001042681.2)'


The genotype predicate will bin the patient into two classes: a point mutation or the loss of function:
Expand All @@ -154,6 +156,26 @@ Phenotype score
This component is responsible for computing a phenotype score for an individual.
As far as GPSEA framework is concerned, the phenotype score must be a floating point number
or a `NaN` value if the score cannot be computed for an individual.
This is the essence of the :class:`~gpsea.analysis.pscore.PhenotypeScorer` class.

GPSEA ships with several builtin phenotype scorers which can be used as

+------------------------------------------------------------+---------------------------------------------+
| Name | Description |
+============================================================+=============================================+
| | Compute the total number of occurrences |
| * :class:`~gpsea.analysis.pscore.CountingPhenotypeScorer` | of specific phenotypic features |
| | (used in this section) |
+------------------------------------------------------------+---------------------------------------------+
| | Compute the "adapted De Vries Score" |
| * :class:`~gpsea.analysis.pscore.DeVriesPhenotypeScorer` | for assessing severity |
| | of intellectual disability |
+------------------------------------------------------------+---------------------------------------------+

.. tip::

See :ref:`custom-phenotype-scorer` section to learn how to build a phenotype scorer from scratch.


Here we use the :class:`~gpsea.analysis.pscore.CountingPhenotypeScorer` for scoring
the individuals based on the number of structural defects
Expand Down
Binary file modified docs/user-guide/analyses/report/rere_phenotype_score_boxplot.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
93 changes: 93 additions & 0 deletions docs/user-guide/analyses/survival.rst
Original file line number Diff line number Diff line change
Expand Up @@ -188,3 +188,96 @@ or `None` if computing the survival was impossible (see :func:`~gpsea.analysis.t
The `Survival` reports the number of days until attaining the endpoint,
here defined as end stage renal disease (`is_censored=False`),
or until the individual dropped out of the analysis (`is_censored=True`).


Troubleshooting
===============

Sometimes the survival analysis fails and an :class:`~gpsea.analysis.AnalysisException` is raised.
For instance, the current Logrank test implementation reports a p value of `NaN`
if the survival is the same for all individuals.
This is unlikely an expected outcome, therefore GPSEA raises
an :class:`~gpsea.analysis.AnalysisException` to force the user to troubleshoot.

To help with troubleshooting, the data computed prior detecting the error is included in the exception's
:attr:`~gpsea.analysis.AnalysisException.data` attribute. In survival analysis, the data should include
the identifiers, genotype classes, and survivals of the tested individuals.

Let's show this on an example. We will create a toy cohort of 10 individuals
with onset of `Lynch syndrome I <https://hpo.jax.org/browse/disease/OMIM:120435>`_
(`OMIM:120435`) at 40 years.

>>> from gpsea.model import Cohort, Patient, Disease, Age
>>> onset = Age.from_iso8601_period("P40Y")
>>> individuals = [
... Patient.from_raw_parts(
... labels=label,
... diseases=(
... Disease.from_raw_parts(
... term_id="OMIM:120435",
... name="Lynch syndrome I",
... is_observed=True,
... onset=onset,
... ),
... ),
... )
... for label in "ABCDEFGHIJ" # 10 individuals
... ]
>>> cohort = Cohort.from_patients(individuals)

We will assign them into genotype classes on random, ...

>>> from gpsea.analysis.clf import random_classifier
>>> gt_clf = random_classifier(seed=123)
>>> gt_clf.description
'Classify the individual into random classes'

... using the Lynch syndrome I diagnosis as the endpoint ...

>>> from gpsea.analysis.temporal.endpoint import disease_onset
>>> endpoint = disease_onset(disease_id="OMIM:120435")
>>> endpoint.description
'Compute time until OMIM:120435 onset'

... and we will use Logrank test for differences in survival.

>>> from gpsea.analysis.temporal.stats import LogRankTest
>>> survival_statistic = LogRankTest()

We put together the survival analysis ...

>>> from gpsea.analysis.temporal import SurvivalAnalysis
>>> survival_analysis = SurvivalAnalysis(
... statistic=survival_statistic,
... )

... which we expect to fail with an :class:`~gpsea.analysis.AnalysisException`:

>>> result = survival_analysis.compare_genotype_vs_survival(
... cohort=cohort,
... gt_clf=gt_clf,
... endpoint=endpoint,
... )
Traceback (most recent call last):
...
gpsea.analysis._base.AnalysisException: The survival values did not meet the expectation of the statistical test!

The genotype classes and survival values can be retrieved from the exception:

>>> from gpsea.analysis import AnalysisException
>>> try:
... result = survival_analysis.compare_genotype_vs_survival(
... cohort=cohort,
... gt_clf=gt_clf,
... endpoint=endpoint,
... )
... except AnalysisException as ae:
... genotypes = ae.data["genotype"]
... survivals = ae.data["survival"]

and the values can come in handy in troubleshooting:

>>> genotypes[:3]
(0, 0, 0)
>>> survivals[:3]
(Survival(value=14610.0, is_censored=False), Survival(value=14610.0, is_censored=False), Survival(value=14610.0, is_censored=False))
Loading
Loading