On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

Category: Technology

Source: arxiv.org

Bookmarked by 7 users

1 comment on this article


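elu_18: On generalization, I'm interested in questions like which local minimum generalizes better. There is, for example, the argument that flat solutions generalize better, but I'm not sure what one should think about in order to study this properly.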

fromTw

2017/01/17





The stochastic gradient descent (SGD) method and its variants are algorithms of choice for many Deep Learning tasks. These methods operate in a small-batch regime wherein a fraction of the training data, say 32 to 512 data points, is sampled to compute an approximation to the gradient. It has been observed in practice that when using a larger batch there is a degradation in the quality of the model…
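To make the "small-batch regime" concrete, here is a minimal sketch of how a minibatch approximates the full gradient, assuming a hypothetical one-dimensional least-squares problem (the data, the names `grad_i`, `minibatch_gradient`, and `estimate_spread`, and the batch sizes are illustrative choices, not taken from the paper). It only shows that the minibatch estimate is an unbiased but noisy stand-in for the full gradient, with noise shrinking as the batch grows; it does not reproduce the paper's experiments or its sharp-minima explanation of the generalization gap.

```python
import random
import statistics

# Hypothetical toy problem (not from the paper): 1-D least-squares regression
# with loss f(w) = (1/N) * sum_i (w * x_i - y_i)^2.
random.seed(0)
N = 10_000
xs = [random.gauss(0.0, 1.0) for _ in range(N)]
ys = [3.0 * x + random.gauss(0.0, 0.5) for x in xs]


def grad_i(w, i):
    """Gradient of the i-th term (w * x_i - y_i)^2 with respect to w."""
    return 2.0 * (w * xs[i] - ys[i]) * xs[i]


def full_gradient(w):
    """Exact gradient: average of per-example gradients over all N points."""
    return sum(grad_i(w, i) for i in range(N)) / N


def minibatch_gradient(w, batch_size, rng):
    """SGD-style estimate: average over batch_size points sampled without replacement."""
    batch = rng.sample(range(N), batch_size)
    return sum(grad_i(w, i) for i in batch) / batch_size


def estimate_spread(w, batch_size, trials=200, seed=1):
    """Standard deviation of the minibatch estimate across repeated random draws."""
    rng = random.Random(seed)
    estimates = [minibatch_gradient(w, batch_size, rng) for _ in range(trials)]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    w = 0.0
    print(f"full gradient at w={w}: {full_gradient(w):.3f}")
    for b in (32, 512, 4096):
        print(f"batch {b:5d}: std of gradient estimate = {estimate_spread(w, b):.3f}")
```

The spread of the estimate falls roughly as one over the square root of the batch size, so a batch in the 32 to 512 range gives a cheap but noisy step, while a very large batch approaches the exact gradient. The paper's observation is that this reduction in noise does not come for free in terms of the quality of the trained model.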