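Excellent research. What matters is the parameter count; width and depth do not. I was impressed by the plots showing that the loss follows a power law in each variable. “The loss scales as a power-law with model size, dataset size, and the amount of compute used for training”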

Bookmarked by Ryobot, 2020/04/24 19:30



Scaling Laws for Neural Language Models

arxiv.org, 2020/04/24

We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, ...
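To make the power-law claim concrete: a relation of the form L(N) = (N_c / N)^alpha is a straight line on a log-log plot, so the exponent can be read off as the negative slope. The sketch below is only an illustration under stated assumptions: the functional form is a generic single-variable power law, the names `power_law_loss` and `fit_power_law` are hypothetical, and the constants are made-up values, not figures from the paper.

```python
import math


def power_law_loss(n, n_c, alpha):
    """Generic power law L(n) = (n_c / n) ** alpha (assumed form)."""
    return (n_c / n) ** alpha


def fit_power_law(ns, losses):
    """Recover (n_c, alpha) with a least-squares line in log-log space.

    Taking logs gives log L = alpha * log n_c - alpha * log n,
    so the slope is -alpha and the intercept is alpha * log n_c.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(loss) for loss in losses]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    variance = sum((x - x_mean) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_mean - slope * x_mean
    alpha = -slope
    n_c = math.exp(intercept / alpha)
    return n_c, alpha


# Illustrative constants only; they are NOT taken from the paper.
TRUE_N_C = 1.0e13
TRUE_ALPHA = 0.08

# Synthetic, noise-free "model sizes" spanning several orders of magnitude.
sizes = [10.0 ** k for k in range(5, 10)]
losses = [power_law_loss(n, TRUE_N_C, TRUE_ALPHA) for n in sizes]

fitted_n_c, fitted_alpha = fit_power_law(sizes, losses)
print(f"alpha = {fitted_alpha:.4f}, n_c = {fitted_n_c:.3e}")
```

Because the synthetic data is noise-free, the fit should recover the constants that generated it; on real measurements the points would scatter around the line and the fit would only be approximate.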

7 users bookmarked · 1 comment
