Source code for the paper https://arxiv.org/abs/1606.01467

My current thoughts on hyperparameter optimization can be found in my blog post.

Abstract and motivations

Let's take a look at a video of the training of the weights of (lower-layer) deep networks: http://cs.nyu.edu/~yann/research/sparse/psd-anim.gif

Actually, the video shows sparse coding, but it is similar to the training at lower-laye