Description
This is probably quite complicated to do without harming the performance of the "single-model" training, but it could be quite interesting.
Right now we are stacking all the models together while keeping track of the separate components, so that we can stop the fit for each of them independently.
This is very efficient as long as the fits have not yet stabilised. Then, one by one, the fits stop (by freezing the weights) until eventually the last one stops and the whole fit finishes.
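A toy sketch of this per-model stopping might look as follows. This is not the actual implementation: the "models" here are just weight vectors fitted by gradient descent, and the freezing is done by zeroing the gradient of any model whose loss has stabilised below a (hypothetical) tolerance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the stacked models: each "model" is a weight vector
# fitted to its own target by gradient descent on a quadratic loss.
n_models, n_weights = 3, 4
weights = rng.normal(size=(n_models, n_weights))
targets = rng.normal(size=(n_models, n_weights))
stopped = np.zeros(n_models, dtype=bool)  # independent stopping flags

lr, tol = 0.1, 1e-3  # hypothetical learning rate and stopping tolerance
while not stopped.all():
    grad = 2.0 * (weights - targets)  # gradient of each per-model loss
    grad[stopped] = 0.0               # "freeze" the weights of stopped models
    weights -= lr * grad
    losses = ((weights - targets) ** 2).sum(axis=1)
    stopped |= losses < tol           # each model stops on its own criterion
```

All models keep training in a single stacked update, but once a model's flag flips its weights no longer move, which is the behaviour described above.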
However, the models we use are not very heavy. In most cases (almost all in the tests we did... 3 years ago at this point...) the advantage of running on GPU only kicks in after the model is sampled: the fktable convolution and the loss. We could probably separate them completely, so that each model trains entirely by itself (meaning we would have complete freedom on the model side and would not be restricted to a single architecture).
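To illustrate the proposed separation, here is a minimal sketch: two fully independent "models" with different architectures each produce their own output, and only the heavy post-sampling step (a shared fktable-like contraction feeding the loss) is common. All names and shapes (`fktable`, `n_x`, `n_out`, the toy parametrisations) are assumptions for illustration, not the real n3fit objects.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical shapes: each model maps x-grid points to an (n_x, n_out)
# output, and the shared "fktable" contracts it down to data predictions.
n_x, n_out, n_data = 5, 3, 4
fktable = rng.normal(size=(n_data, n_x, n_out))  # shared, GPU-heavy part
data = rng.normal(size=n_data)

def predictions(pdf):
    # Shared post-sampling step: fktable contraction entering the loss.
    return np.einsum("dxo,xo->d", fktable, pdf)

# Two completely independent toy "models" with different architectures:
def model_a(params, x):   # linear in x
    return np.outer(x, params)

def model_b(params, x):   # quadratic in x
    return np.outer(x ** 2, params)

x = np.linspace(0.1, 1.0, n_x)
losses = []
for model, params in [(model_a, rng.normal(size=n_out)),
                      (model_b, rng.normal(size=n_out))]:
    pdf = model(params, x)                        # each model on its own
    losses.append(((predictions(pdf) - data) ** 2).mean())
```

Because the models only meet at the `predictions` step, nothing constrains them to share an architecture, which is the freedom the proposal is after.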