State–action–reward–state–action (SARSA) is an algorithm for learning a Markov decision process policy, used in the reinforcement learning area of machine learning. It was proposed by Rummery and Niranjan in a technical note[1] under the name "Modified Connectionist Q-Learning" (MCQ-L). The alternative name SARSA, proposed by Rich Sutton, was mentioned only as a footnote. This name reflects the fact that the function for updating the Q-value depends on the quintuple (S, A, R, S', A'): the current state, the action chosen there, the reward received, the resulting state, and the action chosen in that resulting state.
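As a rough illustration of the update the quintuple names, the standard SARSA rule Q(S, A) ← Q(S, A) + α[R + γ·Q(S', A') − Q(S, A)] can be sketched in Python as follows. The function name, the dictionary-based Q-table, and the default values of the learning rate α and discount factor γ are illustrative choices, not part of the original formulation:

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """Apply one SARSA update to the Q-table, in place.

    Uses the quintuple (S, A, R, S', A'): unlike Q-learning, the
    bootstrap term is the value of the action actually chosen next,
    which makes SARSA an on-policy method.
    """
    td_target = r + gamma * Q[(s_next, a_next)]   # one-step return estimate
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])  # move Q toward the target
    return Q

# Example: starting from an all-zero table, a reward of 1.0 moves
# Q(s, a) by alpha toward the target.
Q = defaultdict(float)
sarsa_update(Q, s=0, a=0, r=1.0, s_next=1, a_next=0)
print(Q[(0, 0)])  # 0.1
```

In a full agent this update would run once per step, with both actions typically drawn from an ε-greedy policy over Q.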