This document presents formulas for computing gradients and parameter updates in reinforcement learning: the gradient of a value function with respect to its parameters, the gradient of a policy based on the reward and the value, and the gradient of a parameter vector that is a weighted combination of its pre
![Playing around with a reinforcement learning algorithm called A3C](https://cdn-ak-scissors.b.st-hatena.com/image/square/8af2302706ff76d984f91ef44664d2177fd99811/height=288;version=1;width=512/https%3A%2F%2Fcdn.slidesharecdn.com%2Fss_thumbnails%2Fpfiseminar20160519-160519054901-thumbnail.jpg%3Fwidth%3D640%26height%3D640%26fit%3Dbounds)
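To make the first two gradients in the summary concrete, here is a minimal sketch in plain Python. The formulas themselves are not reproduced here, so this assumes the common advantage actor-critic form: a squared-error value loss and a log-probability policy term weighted by the return minus the value. The linear value function, the softmax policy, and the names `value_gradient` and `policy_gradient` are all illustrative assumptions, not taken from the original material.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def value_gradient(w, state, ret):
    """Gradient of the loss (ret - V(s))^2 w.r.t. w, assuming V(s) = w . s.

    This is a loss gradient, so an update would step against it.
    """
    v = sum(wi * si for wi, si in zip(w, state))
    return [-2.0 * (ret - v) * si for si in state]


def policy_gradient(theta, state, action, ret, value):
    """Gradient of log pi(action | state) * (ret - value) w.r.t. theta.

    Assumes a softmax policy with linear logits: logits[a] = theta[a] . state.
    This is an objective gradient, so an update would step along it.
    """
    logits = [sum(t * s for t, s in zip(row, state)) for row in theta]
    probs = softmax(logits)
    advantage = ret - value  # treated as a constant w.r.t. theta
    return [
        [((1.0 if a == action else 0.0) - probs[a]) * s * advantage for s in state]
        for a in range(len(theta))
    ]


# Hypothetical single-step example with made-up numbers.
state = [1.0, 2.0]
w = [0.5, -0.5]                     # value weights, so V(s) = -0.5
theta = [[0.0, 0.0], [0.0, 0.0]]    # two actions, uniform policy
ret = 1.0                           # observed (discounted) return

value = sum(wi * si for wi, si in zip(w, state))
grad_w = value_gradient(w, state, ret)
grad_theta = policy_gradient(theta, state, action=0, ret=ret, value=value)
```

With these numbers the return exceeds the value estimate, so the policy gradient raises the logit of the chosen action and lowers the other, while stepping against `grad_w` moves the value estimate toward the return. The third, truncated formula in the summary is deliberately left out of this sketch.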