National Institute of Information and Communications Technology
National Institute for Materials Science
Institut NEEL, CNRS and Université Grenoble Alpes

Decision making in uncertain, dynamically changing environments is fundamentally difficult because exploration, which searches for the unknown best option, and exploitation, which uses the known best option, are in a trade-off. This problem has been known as “multi-armed bandit pr