rl-for-llms.md Reinforcement Learning for Language Models Yoav Goldberg, April 2023. Why RL? With the release of the ChatGPT model and followup large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrat
![Reinforcement Learning for Language Models](https://cdn-ak-scissors.b.st-hatena.com/image/square/1ef26f6cb4349557952890dbe3e567f7f98dc151/height=288;version=1;width=512/https%3A%2F%2Fgithub.githubassets.com%2Fassets%2Fgist-og-image-54fd7dc0713e.png)