-
Notifications
You must be signed in to change notification settings - Fork 6
Description
Ties in lx cause problems for our quantile functions, all of which use monotonic splines. Example
age = 0:10
lx = c(10,9,8,7,7,5,4,3,2,1,0)/10
splinefun(age~lx, method = "monoH.FC")(.5)
# gives warning:
[1] 5
Warning message:
In regularize.values(x, y, ties, missing(ties)) :
collapsing to unique 'x' values
In other applications that would come up using HMD lx values, warnings will also ensue.
I looked into how other quantile-stuff implementations handle these situations. They use dithering. Here's an lx-specific dither routine that I propose, based on a subset of code found in quantreg::dither(), given that for lx we need to insist on monotonicity. In essence, IFF there are ties, then we generate random uniform deviates between 0 and 10000th of the radix, sort them, add them to lx, then ensure lx has the original radix.
maybe_dither_lx <- function(lx, value = .0001){
if(any(duplicated(lx))){
N <- length(lx)
radix <- lx[1]
# random values between 0 and 1000th of radix
dith <- runif(N, 0, radix * value)
# sort to enforce monotonicity
dith <- sort(dith, decreasing = TRUE)
# apply
lx <- lx + dith
#rescale to same radix
lx <- lx * radix / lx[1]
}
lx
}
lx
}
In case there are no ties, then the same lx is spit back, otherwise we do this little perturbation to eliminate ties without compromising the shape.
For the above case, we get a reasonable result and no warning:
splinefun(age~maybe_dither_lx(lx), method = "monoH.FC")(.5)
[1] 4.999998
For the more frequent case of consecutive 0s in the highest ages of lx, the behavior is like so (USA bltper_1x1 1936). As is, we get errors, with dithering we don't.
lx = c(100000L, 94002L, 93175L, 92781L, 92517L, 92307L, 92133L, 91976L,
91834L, 91704L, 91583L, 91467L, 91353L, 91233L, 91103L, 90956L,
90788L, 90596L, 90381L, 90145L, 89891L, 89621L, 89337L, 89041L,
88735L, 88420L, 88098L, 87769L, 87434L, 87094L, 86749L, 86402L,
86046L, 85678L, 85293L, 84883L, 84446L, 83980L, 83486L, 82966L,
82425L, 81867L, 81291L, 80695L, 80074L, 79422L, 78730L, 77994L,
77210L, 76375L, 75486L, 74541L, 73538L, 72480L, 71369L, 70212L,
69013L, 67767L, 66468L, 65106L, 63671L, 62153L, 60547L, 58849L,
57054L, 55160L, 53162L, 51072L, 48905L, 46673L, 44387L, 42055L,
39667L, 37207L, 34669L, 32056L, 29476L, 26875L, 24282L, 21735L,
19275L, 17001L, 14842L, 12805L, 10902L, 9149L, 7561L, 6149L,
4912L, 3851L, 2962L, 2241L, 1671L, 1233L, 903L, 658L, 458L, 311L,
207L, 135L, 85L, 53L, 32L, 19L, 11L, 6L, 3L, 2L, 1L, 0L, 0L)
ineq_quantile(0:110,lx)
Error in splinefun(age ~ lx, method = "monoH.FC") : zero non-NA points
In addition: There were 50 or more warnings (use warnings() to see the first 50)
# succeeds
ineq_quantile(0:110,maybe_dither_lx(lx))
In this case, for all but the lowest ages, residuals are negative, observe:
# truncate for default behavior without errors:
q1 <- ineq_quantile(0:108,lx[1:109])
# apply dithering the whole thing
q2 <- ineq_quantile(0:110,maybe_dither_lx(lx,.0001))
resid <- q1 - q2[1:109]
# y scale in weeks
plot(0:108,resid*52)
So, trivially negative until the tail. One can simply set a default to a lower value:
q3 <- ineq_quantile(0:110,maybe_dither_lx(lx,.00001))
resid2 <- q1 - q3[1:109]
# y scale in weeks
plot(0:108,resid*52)
lines(0:108,resid2*52)
This reduces most age residuals, but then squeezes larger errors to a spike at the "end". No matter what value we choose, warnings and errors are eliminated, but we'd of course like to reduce the consequence. Does something like this seem reasonable to you, like to just do it automatically to remove user headaches? That final "residual" is in the second case is a mere 17.68 weeks of age, or .345 years. Any thoughts on a solution in this direction?

