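Supervised learning can lead a language model to lie. For this reason, OpenAI has invested heavily in reinforcement-learning-based tuning (RLHF) of its GPT models, encouraging the model to decline to answer when it does not know the answer.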

Bookmarked by yamadar, 2023/04/23 23:55



Reinforcement Learning for Language Models

gist.github.com/yoavg, 2023/04/23

rl-for-llms.md Reinforcement Learning for Language Models Yoav Goldberg, April 2023. Why RL? With the release of the ChatGPT model and followup large language models (LLMs), there was a lot of disc...

19 users bookmarked this, 2 comments
