@@ -188,3 +188,96 @@ or `None` if computing the survival was impossible (see :func:`~gpsea.analysis.t
188188The `Survival` reports the number of days until attaining the endpoint,
189189here defined as end stage renal disease (`is_censored=False`),
190190or until the individual dropped out of the analysis (`is_censored=True`).
191+
192+
193+ Troubleshooting
194+ ===============
195+
196+ Sometimes the survival analysis fails and an :class:`~gpsea.analysis.AnalysisException` is raised.
197+ For instance, the current Logrank test implementation reports a p value of `NaN`
198+ if the survival is the same for all individuals.
199+ This is unlikely to be the expected outcome. Therefore, GPSEA raises
200+ an :class:`~gpsea.analysis.AnalysisException` to force the user to troubleshoot.
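
One plausible way to see where the `NaN` comes from is to compute a two-group logrank statistic by hand. The sketch below is an illustration only, not GPSEA's implementation: `logrank_statistic` is a hypothetical helper, it handles uncensored event times only, and it returns `NaN` explicitly where a vectorized implementation would divide zero by zero.

```python
import math


def logrank_statistic(times_a, times_b):
    """Two-group logrank chi-square statistic for uncensored event times."""
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted(set(times_a) | set(times_b)):
        # Individuals still at risk just before time t, per group.
        at_risk_a = sum(1 for x in times_a if x >= t)
        at_risk_b = sum(1 for x in times_b if x >= t)
        at_risk = at_risk_a + at_risk_b
        # Events observed exactly at time t.
        events_a = sum(1 for x in times_a if x == t)
        events = events_a + sum(1 for x in times_b if x == t)
        observed_minus_expected += events_a - events * at_risk_a / at_risk
        if at_risk > 1:
            variance += (
                events * at_risk_a * at_risk_b * (at_risk - events)
                / (at_risk ** 2 * (at_risk - 1))
            )
    # Zero variance means the statistic is 0/0, which is undefined.
    if variance == 0.0:
        return math.nan
    return observed_minus_expected ** 2 / variance


# Everybody reaches the endpoint on the same day: the statistic is undefined.
same = logrank_statistic([14610.0] * 5, [14610.0] * 5)
# Distinct event times give a finite statistic.
different = logrank_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(same, different)
```

When every individual has the same survival, both the observed-minus-expected term and the variance are zero, so no meaningful p value can be derived.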
201+
202+ To help with troubleshooting, the data computed prior to detecting the error is included in the exception's
203+ :attr:`~gpsea.analysis.AnalysisException.data` attribute. In survival analysis, the data should include
204+ the identifiers, genotype classes, and survivals of the tested individuals.
205+
206+ Let's demonstrate this with an example. We will create a toy cohort of 10 individuals
207+ with onset of `Lynch syndrome I <https://hpo.jax.org/browse/disease/OMIM:120435>`_
208+ (`OMIM:120435`) at 40 years.
209+
210+ >>> from gpsea.model import Cohort, Patient, Disease, Age
211+ >>> onset = Age.from_iso8601_period("P40Y")
212+ >>> individuals = [
213+ ...     Patient.from_raw_parts(
214+ ...         labels=label,
215+ ...         diseases=(
216+ ...             Disease.from_raw_parts(
217+ ...                 term_id="OMIM:120435",
218+ ...                 name="Lynch syndrome I",
219+ ...                 is_observed=True,
220+ ...                 onset=onset,
221+ ...             ),
222+ ...         ),
223+ ...     )
224+ ...     for label in "ABCDEFGHIJ"  # 10 individuals
225+ ... ]
226+ >>> cohort = Cohort.from_patients(individuals)
227+
228+ We will assign them into genotype classes at random, ...
229+
230+ >>> from gpsea.analysis.clf import random_classifier
231+ >>> gt_clf = random_classifier(seed=123)
232+ >>> gt_clf.description
233+ 'Classify the individual into random classes'
234+
235+ ... using the Lynch syndrome I diagnosis as the endpoint ...
236+
237+ >>> from gpsea.analysis.temporal.endpoint import disease_onset
238+ >>> endpoint = disease_onset(disease_id="OMIM:120435")
239+ >>> endpoint.description
240+ 'Compute time until OMIM:120435 onset'
241+
242+ ... and we will use the Logrank test to test for differences in survival.
243+
244+ >>> from gpsea.analysis.temporal.stats import LogRankTest
245+ >>> survival_statistic = LogRankTest()
246+
247+ We put together the survival analysis ...
248+
249+ >>> from gpsea.analysis.temporal import SurvivalAnalysis
250+ >>> survival_analysis = SurvivalAnalysis(
251+ ...     statistic=survival_statistic,
252+ ... )
253+
254+ ... which we expect to fail with an :class:`~gpsea.analysis.AnalysisException`:
255+
256+ >>> result = survival_analysis.compare_genotype_vs_survival(
257+ ...     cohort=cohort,
258+ ...     gt_clf=gt_clf,
259+ ...     endpoint=endpoint,
260+ ... )
261+ Traceback (most recent call last):
262+ ...
263+ gpsea.analysis._base.AnalysisException: The survival values did not meet the expectation of the statistical test!
264+
265+ The genotype classes and survival values can be retrieved from the exception:
266+
267+ >>> from gpsea.analysis import AnalysisException
268+ >>> try:
269+ ...     result = survival_analysis.compare_genotype_vs_survival(
270+ ...         cohort=cohort,
271+ ...         gt_clf=gt_clf,
272+ ...         endpoint=endpoint,
273+ ...     )
274+ ... except AnalysisException as ae:
275+ ...     genotypes = ae.data["genotype"]
276+ ...     survivals = ae.data["survival"]
277+
278+ The values can come in handy when troubleshooting:
279+
280+ >>> genotypes[:3]
281+ (0, 0, 0)
282+ >>> survivals[:3]
283+ (Survival(value=14610.0, is_censored=False), Survival(value=14610.0, is_censored=False), Survival(value=14610.0, is_censored=False))
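
Once the values are in hand, a quick summary usually reveals the cause of the failure. The following is a minimal sketch in plain Python, assuming the retrieved items behave like the tuples printed above. The `Survival` namedtuple is a hypothetical stand-in for GPSEA's class, and the genotype assignments beyond the first three are toy values made up for illustration.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for GPSEA's Survival with the two fields printed above.
Survival = namedtuple("Survival", ["value", "is_censored"])

# Toy values shaped like `ae.data["genotype"]` and `ae.data["survival"]`.
genotypes = (0, 0, 0, 1, 1)
survivals = tuple(Survival(14610.0, False) for _ in genotypes)

# How many individuals fell into each genotype class?
class_sizes = Counter(genotypes)

# Which distinct survival values occur in each genotype class?
values_by_class = {}
for gt, survival in zip(genotypes, survivals):
    values_by_class.setdefault(gt, set()).add(survival.value)

# A single distinct value overall means the test has nothing to compare.
all_identical = len({s.value for s in survivals}) == 1
print(class_sizes, values_by_class, all_identical)
```

If `all_identical` is true, as in the toy cohort where everybody has onset at 40 years, the survival does not differ between the genotype classes and the test cannot produce a meaningful p value.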