Tags

Bookmarks / gist.github.com/yoavg (1)

  • Reinforcement Learning for Language Models

    rl-for-llms.md Reinforcement Learning for Language Models Yoav Goldberg, April 2023. Why RL? With the release of the ChatGPT model and followup large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrat

    yamadar 2023/04/23
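    Supervised learning can lead a language model to lie. This is why OpenAI has put so much effort into tuning its GPT models with reinforcement learning (RLHF), which encourages the model to decline to answer when it does not know the answer.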