Assume k in [1..N]. 1) Train N models in parallel 2) use from 1 up to N of those trained models to do the predictions --> we only train N models in total. Currently, we train 1+2+3+...+N models!