Gradient-based algorithms are the default training algorithms for ANNs. Hence, providing support for such algorithms (SGD, Adam, RMSProp, etc.) is critical in order to offer out-of-the-box benchmarking capabilities. Our suggestion is to proceed as follows:
- Create a new class in the `base` package (e.g. `class derivable : public solution {}`, inheriting from the `solution` class). `derivable` should have an array of floats (e.g. `derivable::m_df`) that represents the derivative of the `solution::fitness()` function with respect to each parameter in `solution::get_params()`. Hence, `derivable::get_df()` will have size `solution::size()`.
- Define a getter in the `derivable` class (e.g. `derivable::df()`) that, in case the solution was modified, calculates the derivative of the fitness function and stores the result in the `derivable::m_df` array. In case the solution was not modified, it simply returns `derivable::m_df`. The implementation of this method should follow the same pattern as the current `solution::fitness()` method.
- As in the case of `solution::fitness()` and `solution::calculate_fitness()`, consider an implementation of `derivable::df()` together with a protected `virtual derivable::calculate_df() = 0` method in `derivable`. It is probably a good idea not to create `derivable::m_df` before the first `derivable::df()` call, in case the derivative never gets used.
- Each child of `derivable` in the `solutions` package should re-implement its own version of `virtual derivable::calculate_df() = 0` according to its fitness function (only if the fitness function is differentiable, of course). This means that the `network` class should inherit from `derivable` instead of `solution` and implement `virtual derivable::calculate_df() = 0`.
- The `network::calculate_df()` implementation will call a `layer::backprop()` method defined in the layer class, passing the position in the `derivable::m_df` array where the layer will store the derivative of its corresponding parameters. The `layer::backprop()` method should be similar to the current `layer::prop()` method.
- Each child of `layer` in the `layers` package should re-implement its own version of `virtual layer::backprop() = 0`. Currently, a single layer, `fc` (fully connected layer), should be implemented in the library.
- Create a new class in the `algorithms` package, `class sgd : public algorithm`, that uses the derivative and the fitness function of a `derivable` solution to implement the Stochastic Gradient Descent algorithm.
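The `derivable` caching pattern from the first three points could be sketched as below. This is a minimal illustration, not the library's actual code: the `solution` stand-in, the `set()` helper, the modification flags, and the `sphere` example are assumptions made so the sketch is self-contained.

```cpp
#include <cstddef>
#include <vector>

// Minimal stand-in for the existing base class in the `base` package.
// Names follow the issue (size(), get_params(), fitness(), calculate_fitness());
// the set() helper and modification tracking are assumptions.
class solution {
public:
  explicit solution(std::size_t n) : m_params(n, 0.0f) {}
  virtual ~solution() = default;

  std::size_t size() const { return m_params.size(); }
  float* get_params() { return m_params.data(); }

  virtual void set(std::size_t i, float v) { m_params[i] = v; m_modified = true; }

  // Cached fitness: recomputed only when the solution was modified.
  float fitness() {
    if (m_modified) { m_fitness = calculate_fitness(); m_modified = false; }
    return m_fitness;
  }

protected:
  virtual float calculate_fitness() = 0;

  std::vector<float> m_params;
  float m_fitness = 0.0f;
  bool m_modified = true;
};

// Proposed derivable class: adds a lazily created gradient array m_df.
class derivable : public solution {
public:
  using solution::solution;

  void set(std::size_t i, float v) override { solution::set(i, v); m_df_stale = true; }

  // Same caching pattern as fitness(): m_df is not allocated before the
  // first df() call, in case the derivative is never used.
  const std::vector<float>& df() {
    if (m_df_stale) {
      m_df.assign(size(), 0.0f);  // created on first use, one entry per parameter
      calculate_df();
      m_df_stale = false;
    }
    return m_df;
  }

protected:
  virtual void calculate_df() = 0;  // fill m_df with d fitness / d param_i

  std::vector<float> m_df;
  bool m_df_stale = true;
};

// Example concrete solution: f(x) = sum of x_i^2, so df_i = 2 * x_i.
class sphere : public derivable {
public:
  using derivable::derivable;

protected:
  float calculate_fitness() override {
    float s = 0.0f;
    for (float x : m_params) s += x * x;
    return s;
  }
  void calculate_df() override {
    for (std::size_t i = 0; i < size(); ++i) m_df[i] = 2.0f * m_params[i];
  }
};
```

Note that only `calculate_df()` is virtual; `df()` lives once in `derivable`, mirroring the `fitness()`/`calculate_fitness()` split described above.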
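The offset-passing between `network::calculate_df()` and `layer::backprop()` could look roughly like the following. The free-function form of `calculate_df()`, the `size()` accounting, and the placeholder gradient are assumptions for illustration; a real `fc::backprop()` would compute actual gradients from the activations cached during `layer::prop()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Abstract layer interface; names follow the issue, signatures are assumed.
class layer {
public:
  virtual ~layer() = default;
  virtual std::size_t size() const = 0;  // number of trainable parameters
  // Write d fitness / d param into df[0..size()), analogous to prop().
  virtual void backprop(float* df) = 0;
};

// Fully connected layer: in*out weights plus out biases.
class fc : public layer {
public:
  fc(std::size_t in, std::size_t out) : m_in(in), m_out(out) {}
  std::size_t size() const override { return m_in * m_out + m_out; }
  void backprop(float* df) override {
    // Placeholder: a real implementation would use cached activations and
    // deltas; here we only mark the slice of m_df this layer owns.
    for (std::size_t i = 0; i < size(); ++i) df[i] = 1.0f;
  }
private:
  std::size_t m_in, m_out;
};

// What network::calculate_df() could do: walk the layers, advancing an
// offset into derivable::m_df so each layer writes its own slice.
void calculate_df(std::vector<layer*>& layers, std::vector<float>& m_df) {
  std::size_t pos = 0;
  for (layer* l : layers) {
    l->backprop(m_df.data() + pos);
    pos += l->size();
  }
  assert(pos == m_df.size());  // slices must exactly cover m_df
}
```

The key design point is that each layer never sees the whole gradient array, only the region starting at its offset, so adding new layer types does not change the bookkeeping in `network`.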
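Finally, a sketch of how `sgd` could consume `derivable::df()`. The learning-rate parameter, the `step()` method name, and the `sphere` test function are assumptions, not part of the issue; a real implementation would also replay the training batches between steps.

```cpp
#include <cstddef>
#include <vector>

// Minimal stand-in for a derivable solution; only what sgd needs.
class derivable {
public:
  explicit derivable(std::size_t n) : m_params(n, 0.0f), m_df(n, 0.0f) {}
  virtual ~derivable() = default;
  std::size_t size() const { return m_params.size(); }
  float* get_params() { return m_params.data(); }
  virtual float fitness() = 0;
  virtual const std::vector<float>& df() = 0;
protected:
  std::vector<float> m_params;
  std::vector<float> m_df;
};

// Test function: f(x) = sum of x_i^2, gradient 2 * x_i (no caching here,
// unlike the proposed library class, to keep the sketch short).
class sphere : public derivable {
public:
  using derivable::derivable;
  float fitness() override {
    float s = 0.0f;
    for (float x : m_params) s += x * x;
    return s;
  }
  const std::vector<float>& df() override {
    for (std::size_t i = 0; i < size(); ++i) m_df[i] = 2.0f * m_params[i];
    return m_df;
  }
};

// Proposed sgd: one gradient-descent update per call to step().
class sgd {
public:
  explicit sgd(float lr) : m_lr(lr) {}
  void step(derivable& s) {
    const std::vector<float>& g = s.df();
    float* p = s.get_params();
    // In the real library this should go through a setter so the fitness
    // and derivative caches are invalidated.
    for (std::size_t i = 0; i < s.size(); ++i) p[i] -= m_lr * g[i];
  }
private:
  float m_lr;
};
```

Because `sgd` only touches the `derivable` interface, the same class would drive a `network` (or any other differentiable solution) without modification.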