In today’s post we will compare five popular optimization techniques: SGD, SGD with momentum, Adagrad, Adadelta, and Adam. These are methods for finding a local optimum (a global one in the case of convex problems) of differentiable functions. In the experiments conducted later in this post, these functions will all be error functions of feed-forward neural networks of various architectures for the problem o
