This repository explores an interesting training-dynamics observation from Keller Jordan: mean model behavior is differentiable in the learning rate. Neural network training is an unpredictable, unstructured process, in contrast to conventional structured engineering work: two models trained with the same hyperparameters and dataset can end up behaving differently. The goal is to turn deep learning into a more structured discipline.
The finding of Keller's that I try to reproduce concerns the learning rate: across repeated runs of training, the learning rate has a locally linear effect on the average behavior of the resulting networks.
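As a toy illustration of what "mean behavior across repeated runs" means, here is a minimal numpy sketch. It is not the repository's actual experiment: the one-parameter model, hyperparameters, and seeds are made up for demonstration. Individual runs end at different weights, but the average over many seeds varies smoothly with the learning rate.

```python
import numpy as np

def train(lr, seed, steps=50):
    """SGD on a noisy 1-parameter regression y = w*x (true w = 2)."""
    rng = np.random.default_rng(seed)
    w = 0.0
    for _ in range(steps):
        x = rng.normal()
        y = 2.0 * x + rng.normal(scale=0.5)  # noisy target
        grad = 2.0 * (w * x - y) * x         # d/dw of (w*x - y)^2
        w -= lr * grad
    return w

def mean_behavior(lr, n_runs=200):
    """Average final weight over many independent training runs."""
    return float(np.mean([train(lr, seed) for seed in range(n_runs)]))

# Individual runs disagree, but the run-averaged behavior changes
# smoothly (locally roughly linearly) as the learning rate changes.
m1 = mean_behavior(0.01)
m2 = mean_behavior(0.02)
m3 = mean_behavior(0.03)
print(m1, m2, m3)
```

In this toy setting a larger learning rate moves the average final weight closer to the true value within the fixed step budget, so the ordering of the three means is stable even though any single run is noisy.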
The paper "Edge of Stochastic Stability: Revisiting the Edge of Stability for SGD" by Arseniy Andreyev and Pierfrancesco Beneventano attempts to explain this phenomenon using batch-sharpness and lambda-max (largest Hessian eigenvalue) measurements.
To run the experiments, install the dependencies and submit the Slurm batch job:

pip install -r requirements.txt
sbatch run_experiment.sbatch