Gradient-based algorithms are the default training algorithms for ANNs. Hence, providing support for such algorithms (SGD, Adam, RMSProp, etc.) is critical in order to offer out-of-the-box benchmarking capabilities. Our suggestion is to proceed as follows:
- Create a new class in the `base` package (e.g. `class derivable : public solution {}`, inheriting from the `solution` class). `derivable` should have an array of floats (e.g. `derivable::m_df`) that represents the derivative of the `solution::fitness()` function with respect to each parameter in `solution::get_params()`. Hence, `derivable::get_df()` will have size `solution::size()`.
- Define a getter in the `derivable` class (e.g. `derivable::df()`) that, in case the solution was modified, calculates the derivative of the fitness function and stores the result in the `derivable::m_df` array. In case the solution was not modified, it simply returns `derivable::m_df`. The implementation of this method should follow the same pattern as the current `solution::fitness()` method.
- As in the case of `solution::fitness()` and `solution::calculate_fitness()`, consider an implementation of `derivable::df()` together with a protected `virtual derivable::calculate_df() = 0` method in `derivable`. It is probably a good idea not to create `derivable::m_df` before the first `derivable::df()` call, in case the derivative never gets used.
- Each child of `derivable` in the `solutions` package should re-implement its own version of `virtual derivable::calculate_df() = 0` according to its fitness function (only if the fitness function is differentiable, of course). This means that the `network` class should inherit from `derivable` instead of `solution` and implement `virtual derivable::calculate_df() = 0`.
- The `network::calculate_df()` implementation will call a `layer::backprop()` method defined in the layer class, passing the position in the `derivable::m_df` array where the layer will store the derivative of its corresponding parameters. The `layer::backprop()` method should be similar to the current `layer::prop()` method.
- Each child of `layer` in the `layers` package should re-implement its own version of `virtual layer::backprop() = 0`. Currently, a single layer, `fc` (fully connected layer), should be implemented in the library.
- Create a new class in the `algorithms` package, `class sgd : public algorithm`, that uses the derivative and the fitness function of a `derivable` solution to implement the Stochastic Gradient Descent algorithm.
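The `derivable` caching pattern from the first three points could be sketched as below. This is a minimal illustration, not the library's actual code: the `solution` stand-in, the `set()` helper, the modification flags, and the `sphere` example are assumptions made so the sketch is self-contained.

```cpp
#include <cstddef>
#include <vector>

// Minimal stand-in for the existing base class in the `base` package.
// Names follow the issue (size(), get_params(), fitness(), calculate_fitness());
// the set() helper and modification tracking are assumptions.
class solution {
public:
  explicit solution(std::size_t n) : m_params(n, 0.0f) {}
  virtual ~solution() = default;

  std::size_t size() const { return m_params.size(); }
  float* get_params() { return m_params.data(); }

  virtual void set(std::size_t i, float v) { m_params[i] = v; m_modified = true; }

  // Cached fitness: recomputed only when the solution was modified.
  float fitness() {
    if (m_modified) { m_fitness = calculate_fitness(); m_modified = false; }
    return m_fitness;
  }

protected:
  virtual float calculate_fitness() = 0;

  std::vector<float> m_params;
  float m_fitness = 0.0f;
  bool m_modified = true;
};

// Proposed derivable class: adds a lazily created gradient array m_df.
class derivable : public solution {
public:
  using solution::solution;

  void set(std::size_t i, float v) override { solution::set(i, v); m_df_stale = true; }

  // Same caching pattern as fitness(): m_df is not allocated before the
  // first df() call, in case the derivative is never used.
  const std::vector<float>& df() {
    if (m_df_stale) {
      m_df.assign(size(), 0.0f);  // created on first use, one entry per parameter
      calculate_df();
      m_df_stale = false;
    }
    return m_df;
  }

protected:
  virtual void calculate_df() = 0;  // fill m_df with d fitness / d param_i

  std::vector<float> m_df;
  bool m_df_stale = true;
};

// Example concrete solution: f(x) = sum of x_i^2, so df_i = 2 * x_i.
class sphere : public derivable {
public:
  using derivable::derivable;

protected:
  float calculate_fitness() override {
    float s = 0.0f;
    for (float x : m_params) s += x * x;
    return s;
  }
  void calculate_df() override {
    for (std::size_t i = 0; i < size(); ++i) m_df[i] = 2.0f * m_params[i];
  }
};
```

Note that only `calculate_df()` is virtual; `df()` lives once in `derivable`, mirroring the `fitness()`/`calculate_fitness()` split described above.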
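The offset-passing between `network::calculate_df()` and `layer::backprop()` could look roughly like the following. The free-function form of `calculate_df()`, the `size()` accounting, and the placeholder gradient are assumptions for illustration; a real `fc::backprop()` would compute actual gradients from the activations cached during `layer::prop()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Abstract layer interface; names follow the issue, signatures are assumed.
class layer {
public:
  virtual ~layer() = default;
  virtual std::size_t size() const = 0;  // number of trainable parameters
  // Write d fitness / d param into df[0..size()), analogous to prop().
  virtual void backprop(float* df) = 0;
};

// Fully connected layer: in*out weights plus out biases.
class fc : public layer {
public:
  fc(std::size_t in, std::size_t out) : m_in(in), m_out(out) {}
  std::size_t size() const override { return m_in * m_out + m_out; }
  void backprop(float* df) override {
    // Placeholder: a real implementation would use cached activations and
    // deltas; here we only mark the slice of m_df this layer owns.
    for (std::size_t i = 0; i < size(); ++i) df[i] = 1.0f;
  }
private:
  std::size_t m_in, m_out;
};

// What network::calculate_df() could do: walk the layers, advancing an
// offset into derivable::m_df so each layer writes its own slice.
void calculate_df(std::vector<layer*>& layers, std::vector<float>& m_df) {
  std::size_t pos = 0;
  for (layer* l : layers) {
    l->backprop(m_df.data() + pos);
    pos += l->size();
  }
  assert(pos == m_df.size());  // slices must exactly cover m_df
}
```

The key design point is that each layer never sees the whole gradient array, only the region starting at its offset, so adding new layer types does not change the bookkeeping in `network`.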
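Finally, a sketch of how `sgd` could consume `derivable::df()`. The learning-rate parameter, the `step()` method name, and the `sphere` test function are assumptions, not part of the issue; a real implementation would also replay the training batches between steps.

```cpp
#include <cstddef>
#include <vector>

// Minimal stand-in for a derivable solution; only what sgd needs.
class derivable {
public:
  explicit derivable(std::size_t n) : m_params(n, 0.0f), m_df(n, 0.0f) {}
  virtual ~derivable() = default;
  std::size_t size() const { return m_params.size(); }
  float* get_params() { return m_params.data(); }
  virtual float fitness() = 0;
  virtual const std::vector<float>& df() = 0;
protected:
  std::vector<float> m_params;
  std::vector<float> m_df;
};

// Test function: f(x) = sum of x_i^2, gradient 2 * x_i (no caching here,
// unlike the proposed library class, to keep the sketch short).
class sphere : public derivable {
public:
  using derivable::derivable;
  float fitness() override {
    float s = 0.0f;
    for (float x : m_params) s += x * x;
    return s;
  }
  const std::vector<float>& df() override {
    for (std::size_t i = 0; i < size(); ++i) m_df[i] = 2.0f * m_params[i];
    return m_df;
  }
};

// Proposed sgd: one gradient-descent update per call to step().
class sgd {
public:
  explicit sgd(float lr) : m_lr(lr) {}
  void step(derivable& s) {
    const std::vector<float>& g = s.df();
    float* p = s.get_params();
    // In the real library this should go through a setter so the fitness
    // and derivative caches are invalidated.
    for (std::size_t i = 0; i < s.size(); ++i) p[i] -= m_lr * g[i];
  }
private:
  float m_lr;
};
```

Because `sgd` only touches the `derivable` interface, the same class would drive a `network` (or any other differentiable solution) without modification.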