Bookmarks for November 16, 2017 (2 items)

  • Decoupled Weight Decay Regularization

    L$_2$ regularization and weight decay regularization are equivalent for standard stochastic gradient descent (when rescaled by the learning rate), but as we demonstrate this is \emph{not} the case for adaptive gradient algorithms, such as Adam. While common implementations of these algorithms employ L$_2$ regularization (often calling it "weight decay" in what may be misleading due to the inequivalence we expose) …

    elu_18 2017/11/16
    The reason Adam generalizes worse than SGD is that in many implementations the weight decay term is also (unintentionally) put through the adaptive normalization, which weakens its effect; with the correction, comparable generalization is obtained. Also, normalizing weight decay by batch size, dataset size, and number of epochs …
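    A minimal NumPy sketch of the distinction the comment describes (illustrative only, not the paper's reference code; the function name and hyperparameter values are assumptions): with `decoupled=False` the decay term enters the gradient and is rescaled by Adam's adaptive denominator, while with `decoupled=True` it is applied to the weights directly, in the decoupled style the paper proposes.

    ```python
    import numpy as np

    def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                  eps=1e-8, wd=1e-2, decoupled=False):
        """One Adam update on parameters w with loss gradient g (step t >= 1)."""
        if not decoupled:
            # L2 regularization: the decay term passes through the moment
            # estimates and is divided by sqrt(v_hat), so it is weakened
            # for weights with large historical gradients.
            g = g + wd * w
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        if decoupled:
            # Decoupled weight decay: applied outside the adaptive step,
            # so every weight decays at the same relative rate.
            w = w - lr * wd * w
        return w, m, v
    ```

    For plain SGD the two placements coincide up to a rescaling of the decay coefficient by the learning rate, which is why the inequivalence only surfaces with adaptive methods such as Adam.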
  • Software 2.0

    I sometimes see people refer to neural networks as just “another tool in your machine learning toolbox”. They have some pros and cons, they work here or there, and sometimes you can use them to win Kaggle competitions. Unfortunately, this interpretation completely misses the forest for the trees. Neural networks are not just another classifier, they represent the beginning of a fundamental shift in …

    elu_18 2017/11/16
    “Software 2.0” is very close to my own thinking. Going forward, deep learning and reinforcement learning will become key building blocks of software development, and in the wave of automation, software development itself will be among the first things automated. Future HW speedups will depend on parallelization/asynchronization …