Replies: 5 comments
-
Hi @cf4869, this is related to #230 and #231. Rather than using discrete valuations, try fitting valuation as an ordinal variable:

```python
model = cl.BarnettZehnwirth(formula='C(origin)+C(development)+valuation').fit(abc)
```

Here the regression knows to extrapolate beyond the end of the triangle since it sees valuation as an ordinal variable. I would like to see …
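A quick illustration of the point above (plain numpy/sklearn, not the chainladder internals, with made-up data): an ordinal predictor yields a slope that extrapolates to unseen periods, whereas a one-hot (categorical) encoding has no column for a future period at all.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical log-scale data with a constant 5% trend per valuation period.
valuation = np.arange(8).reshape(-1, 1)   # ordinal encoding: 0, 1, ..., 7
y = 1.0 + 0.05 * valuation.ravel()

# Ordinal fit: one slope coefficient, usable beyond the training range.
ordinal = LinearRegression().fit(valuation, y)
future = ordinal.predict([[10]])          # 1.0 + 0.05 * 10 = 1.5

# Categorical (one-hot) fit: one coefficient per observed period, and no
# column exists for period 10, so there is nothing to extrapolate with.
one_hot = np.eye(8)[valuation.ravel()]
categorical = LinearRegression().fit(one_hot, y)
```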
-
Hi @jbogaardt, thanks for the answer. So in that case the signal should be picked up by using valuation as one of the parameters, but we still have non-random residuals on the valuation-date graph. And if we fit only ordinal variables, we have non-random residuals on all of them. Does this make sense?
-
I'll have to dust off the paper; it's been a while and my understanding is a little hazy. Fitting features as ordinal/continuous rather than strictly categorical, you get a single regression coefficient for that axis. For example, this model has three coefficients (plus intercept):

```python
import chainladder as cl

abc = cl.load_sample('abc')
cl.BarnettZehnwirth(formula='origin+development+valuation').fit(abc).coef_
```

A single coefficient need not cover an entire axis, though. Hypothetically you could choose one coefficient for the 3 oldest origin years, another coefficient for origins 4 and 5, and a final coefficient for origin years 6 and later. This is how that would look:

```python
cl.BarnettZehnwirth(
    formula='C(np.where(origin<=2, 0, np.where(origin<5,1,2)))+development+valuation'
).fit(abc).coef_
```

Actually, getting back to your original issue, you could create a model that uses discrete valuations and just extrapolates future valuations from the last available:

```python
cl.BarnettZehnwirth(
    formula='C(origin)+C(development)+C(np.minimum(valuation, 9))'
).fit(abc).coef_
```

The point I am trying to make in all this is that the residual analysis gives you insight into how you should structure your formula; it's not guaranteed to be random for any particular formula.
-
Hi @jbogaardt, thanks for clarifying; it appears that using discrete valuations completely absorbs the signal, resulting in random residuals in all directions. However, because of the multicollinearity, the model may be overparameterized in this case. As a result, fitting fewer parameters by combining a few levels could be a viable option. So, aside from grouping levels, is it possible to fix a few parameters in the fitted model? For example, if we know the origin trend is 2% prior to 1979 and 5% after, how do we feed this information into the model so that we only need to fit development and valuation?
-
Do you mean to insert offsets for specific parameters rather than fitting the parameters from the data? Unfortunately, no. Under the hood, the regression is carried out by `sklearn.linear_model.LinearRegression`, which doesn't support offset parameters. The `statsmodels` GLM implementation does support offsets, but …
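Although `LinearRegression` has no offset argument, the usual workaround with any least-squares fitter is to subtract the known component from the response and fit the remainder. A minimal sketch with made-up data, assuming a 2% origin trend is known a priori and only the valuation trend must be estimated (this is outside chainladder's API, so it would require extracting the design matrix yourself):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
origin = rng.integers(0, 7, size=50)
valuation = rng.integers(0, 7, size=50)
y = 1.0 + 0.02 * origin + 0.05 * valuation  # noiseless "true" model

offset = 0.02 * origin                       # the known part, held fixed
model = LinearRegression().fit(valuation.reshape(-1, 1), y - offset)
# model.coef_[0] recovers the 0.05 valuation trend
```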
-
```python
import chainladder as cl

abc = cl.load_sample('abc')
len(abc.origin)
len(abc.development)
len(abc.valuation)
model = cl.BarnettZehnwirth(formula='C(origin)+C(development)+C(valuation)').fit(abc)
```