topFeatures(): support alternative DDoF calculation methods #71
At the moment, `topFeatures()` uses the posterior-adjusted residual degrees of freedom (`getDfPosterior()`) for the p-value calculation of fixed effects in linear mixed-effects models. In the context of `topFeatures()`, the DoF that should be used when calculating the p-value is the denominator degrees of freedom (DDoF) for a given effect (the numerator DoF should be 1 in most cases). Using the residual DoF as an approximation of the DDoF can, in some cases (many precursors in a single protein group), lead to a significant overestimation of the DDoF (10x or more) and, as a result, to a large overestimation of the effect significance (i.e. p-values that are too small).
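To illustrate the size of the effect, here is a minimal sketch in base R of how the two-sided p-value for a fixed t statistic changes when the DoF is inflated tenfold. The helper `p_two_sided()` and the numbers (t statistic of 2.571, DoF of 5 vs 50) are made up for illustration and are not taken from msqrob2 or from real data.

```r
# Two-sided p-value of a t statistic for a given denominator DoF.
# Hypothetical helper for illustration only; not part of msqrob2.
p_two_sided <- function(t_stat, ddf) {
  2 * stats::pt(-abs(t_stat), df = ddf)
}

t_stat <- 2.571                            # made-up test statistic

p_small <- p_two_sided(t_stat, ddf = 5)    # DDoF of the effect
p_large <- p_two_sided(t_stat, ddf = 50)   # 10x overestimated DoF

# The same t statistic looks considerably more significant
# when the DoF is overestimated.
print(c(ddf_5 = p_small, ddf_50 = p_large))
```

With 5 DoF the statistic sits right at the conventional 0.05 threshold, while with 50 DoF the p-value drops to between 0.01 and 0.02.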
This draft PR adds support for using alternative methods for DDoF calculation:

- `ddf.method` arg to the `topFeatures()` call (defaults to `residual` to maintain the current behavior)
- the `dof_xxx()` methods of the parameters package (via alternative `ddf.method` values)

The parameters package dependency is optional (`Suggests`): if the user-specified `ddf.method` requires the parameters package but it is not available, `topFeatures()` will fail.

The PR also ensures that the row names of the `topFeatures()` output match the names of the corresponding models (i.e. protein group IDs) after significance filtering and sorting. It also cleans up the code related to `msqrobLmer()` a bit and adds support for optionally storing the original lmerMod model output, as it is required for the `dof_xxx()` calls.

Ridge models are not yet supported (the ridge codepath modifies the original model, and I have not yet figured out how to make the `dof_xxx()` calls work with it).

Let me know what you think.
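For reference, the optional-dependency behavior described above could be sketched roughly as follows. This is not the code in the PR: `check_ddf_method()` is a hypothetical helper, and it assumes that every `ddf.method` value other than `residual` needs the parameters package.

```r
# Hypothetical helper (not the PR's actual code): fail early when the chosen
# ddf.method needs the optional 'parameters' package and it is not installed.
check_ddf_method <- function(ddf.method = "residual") {
  # Assumption: only the default "residual" method works without 'parameters'.
  needs_parameters <- !identical(ddf.method, "residual")
  if (needs_parameters && !requireNamespace("parameters", quietly = TRUE)) {
    stop("ddf.method = '", ddf.method, "' requires the 'parameters' package, ",
         "which is not installed", call. = FALSE)
  }
  invisible(ddf.method)
}

# The default never touches the optional dependency.
check_ddf_method("residual")
```

Checking with `requireNamespace()` keeps parameters out of the hard dependencies, so users who stay on the default never need to install it.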