## Table of Contents

- $k$-armed Bandit Problem
- Action-value Methods
  - Estimating Action Values
  - Action Selection Rule: Greedy
  - Action Selection Rule: $\epsilon$-Greedy
- Crux: Nonstationary Action Value
  1. Transience
  2. Convergence
- Action Selection Rule: Optimistic Initial Values
- Action Selection Rule: Upper-Confidence-Bound Selection
- Gradient Bandit Algorithms
- Ending Remarks

To begin, w