# An overview of gradient descent optimization algorithms

Gradient descent is the preferred way to optimize neural networks and many other machine learning algorithms, but it is often used as a black box. This post explores how many of the most popular gradient-based optimization algorithms, such as Momentum, Adagrad, and Adam, actually work.
![An overview of gradient descent optimization algorithms](https://cdn-ak-scissors.b.st-hatena.com/image/square/4195405efaa874f2fdb7e09a1328e448784ab972/height=288;version=1;width=512/https%3A%2F%2Fwww.ruder.io%2Fcontent%2Fimages%2F2016%2F09%2Floss_function_image_tumblr.png)
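The update rules of the optimizers named above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the post: the toy quadratic objective, the hyperparameter values, and the variable names are all assumptions chosen for clarity.

```python
import numpy as np

def grad(theta):
    # Gradient of the toy objective f(theta) = 0.5 * ||theta||^2,
    # a hypothetical stand-in for a real loss function's gradient.
    return theta

lr, steps = 0.1, 100

# Vanilla gradient descent: step against the gradient.
theta = np.array([2.0, -3.0])
for _ in range(steps):
    theta = theta - lr * grad(theta)

# Momentum: accumulate an exponentially decaying average of past gradients.
theta, v, gamma = np.array([2.0, -3.0]), np.zeros(2), 0.9
for _ in range(steps):
    v = gamma * v + lr * grad(theta)
    theta = theta - v

# Adagrad: scale each parameter's step by its accumulated squared gradients.
theta, G, eps = np.array([2.0, -3.0]), np.zeros(2), 1e-8
for _ in range(steps):
    g = grad(theta)
    G = G + g ** 2
    theta = theta - lr * g / (np.sqrt(G) + eps)

# Adam: bias-corrected estimates of the first and second gradient moments.
theta, m, v = np.array([2.0, -3.0]), np.zeros(2), np.zeros(2)
beta1, beta2 = 0.9, 0.999
for t in range(1, steps + 1):
    g = grad(theta)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)   # correct bias toward zero at early steps
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
```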