Inconsistency in split_group_term() for add_lower_terms = FALSE #533

@fweber144

There is an inconsistency in how split_group_term() behaves when add_lower_terms = FALSE (before #532, add_lower_terms was called add_main_effects, but I refer to the state of #532 here). The inconsistency is between a group-level term that has a group-level intercept and a group-level term that does not. This is relevant only for project(), because only there is add_lower_terms set to FALSE.

Reprex:

devtools::load_all("<path_to_projpred>")
#> ℹ Loading projpred
#> This is projpred version 2.9.0.9000.
split_formula(y ~ (x + z + x:z | g), add_lower_terms = FALSE)
#> [1] "1"         "(1 | g)"   "(x | g)"   "(z | g)"   "(x:z | g)"
split_formula(y ~ (0 + x + z + x:z | g), add_lower_terms = FALSE)
#> [1] "1"                   "x + (0 + x | g)"     "z + (0 + z | g)"
#> [4] "x:z + (0 + x:z | g)"

Created on 2025-08-23 with reprex v2.1.1

I would have expected the same output for the first split_formula() call, but a different output for the second one, namely without the population-level terms x, z, and x:z added:

devtools::load_all("<path_to_projpred>")
#> ℹ Loading projpred
#> This is projpred version 2.9.0.9000.
split_formula(y ~ (x + z + x:z | g), add_lower_terms = FALSE)
#> [1] "1"         "(1 | g)"   "(x | g)"   "(z | g)"   "(x:z | g)"
split_formula(y ~ (0 + x + z + x:z | g), add_lower_terms = FALSE)
#> [1] "1"                   "(0 + x | g)"     "(0 + z | g)"
#> [4] "(0 + x:z | g)"
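
To make the difference between the two observed outputs explicit, here is a small base-R sketch. It uses only the strings printed in the reprex above; `population_part()` and `extra_terms()` are hypothetical helpers written for this illustration and are not part of projpred.

```r
# Observed output copied from the reprex above (add_lower_terms = FALSE).
with_intercept <- c("1", "(1 | g)", "(x | g)", "(z | g)", "(x:z | g)")
without_intercept <- c("1", "x + (0 + x | g)", "z + (0 + z | g)",
                       "x:z + (0 + x:z | g)")

# Hypothetical helper (not projpred code): drop the group-level part
# "( ... | ... )" of a term and return what is left, i.e. the
# population-level part of that term.
population_part <- function(term) {
  rest <- gsub("\\([^()]*\\|[^()]*\\)", "", term)
  # Remove a dangling "+" left behind after dropping the group-level part.
  trimws(gsub("\\+\\s*$", "", trimws(rest)))
}

# Population-level terms other than the intercept "1".
extra_terms <- function(terms) {
  parts <- vapply(terms, population_part, character(1), USE.NAMES = FALSE)
  setdiff(parts, c("", "1"))
}

extra_terms(with_intercept)
#> character(0)
extra_terms(without_intercept)
#> [1] "x"   "z"   "x:z"
```

Only the call without a group-level intercept carries extra population-level terms, which is the inconsistency described here.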

The reason I expect this (rather than expecting the first split_formula() call to add the population-level terms x, z, and x:z) is the way split_group_term() behaves when group_intercept == TRUE (the object group_intercept is created inside split_group_term()):

devtools::load_all("<path_to_projpred>")
#> ℹ Loading projpred
#> This is projpred version 2.9.0.9000.
split_formula(y ~ (x + z + x:z | g))
#> [1] "1"                               "(1 | g)"                        
#> [3] "x + (x | g)"                     "z + (z | g)"                    
#> [5] "x + z + x:z + (x + z + x:z | g)" "x + (1 | g)"                    
#> [7] "z + (1 | g)"                     "x + z + x:z + (1 | g)"
split_formula(y ~ (x + z + x:z | g), add_lower_terms = FALSE)
#> [1] "1"         "(1 | g)"   "(x | g)"   "(z | g)"   "(x:z | g)"

Created on 2025-08-23 with reprex v2.1.1

In lines https://github.com/fweber144/projpred/blob/c11bac4145ea4700bf54e774463834f6f90ed76d/R/formula.R#L430-L492, one can see that the group_intercept == TRUE case distinguishes between the different add_lower_terms cases. This distinction is missing in the group_intercept == FALSE case (lines https://github.com/fweber144/projpred/blob/c11bac4145ea4700bf54e774463834f6f90ed76d/R/formula.R#L493-L508), so I think it was simply forgotten there. I'll open a PR with the fix I have in mind.
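
As a rough illustration of the idea, the toy function below respects add_lower_terms in both group_intercept cases. It is a sketch under strong assumptions, not projpred's split_group_term(): `split_group_sketch()` and its arguments `slopes` and `group` are hypothetical, it handles only a single grouping factor, and its add_lower_terms = TRUE branch is deliberately simplified and does not reproduce projpred's actual output.

```r
# Hypothetical toy helper (NOT projpred's implementation). `slopes` is a
# character vector of group-level slope terms, `group` the grouping factor.
split_group_sketch <- function(slopes, group, group_intercept = TRUE,
                               add_lower_terms = TRUE) {
  # Suppress the group-level intercept with "0 + " if there is none.
  prefix <- if (group_intercept) "" else "0 + "
  group_part <- paste0("(", prefix, slopes, " | ", group, ")")
  if (add_lower_terms) {
    # Simplified: prepend only the matching population-level term.
    out <- paste(slopes, group_part, sep = " + ")
  } else {
    # Group-level part only, regardless of group_intercept.
    out <- group_part
  }
  if (group_intercept) {
    out <- c(paste0("(1 | ", group, ")"), out)
  }
  out
}

split_group_sketch(c("x", "z", "x:z"), "g",
                   group_intercept = TRUE, add_lower_terms = FALSE)
#> [1] "(1 | g)"   "(x | g)"   "(z | g)"   "(x:z | g)"
split_group_sketch(c("x", "z", "x:z"), "g",
                   group_intercept = FALSE, add_lower_terms = FALSE)
#> [1] "(0 + x | g)"   "(0 + z | g)"   "(0 + x:z | g)"
```

With add_lower_terms = FALSE, neither call adds population-level terms, which matches the group-level part of the expected output above.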
