This document summarizes recent research on applying self-attention mechanisms from Transformers to domains other than language, such as computer vision. It discusses models that use self-attention for images, including ViT, DeiT, and T2T, which apply Transformers to sequences of divided image patches. It also covers more general attention modules, such as the Perceiver, which aims to be domain-agnostic.
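The patch-based approach these models share can be sketched as follows. This is a minimal illustration only: the image size, patch size, and embedding dimension are arbitrary assumptions, and real models such as ViT additionally prepend a class token and add position embeddings before the Transformer layers.

```python
import numpy as np

def patchify(image, patch_size):
    # Split an H x W x C image into non-overlapping square patches
    # and flatten each patch into a vector.
    h, w, c = image.shape
    ph, pw = h // patch_size, w // patch_size
    patches = image.reshape(ph, patch_size, pw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(ph * pw, -1)
    return patches  # shape: (num_patches, patch_size * patch_size * c)

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))          # toy 32x32 RGB image
patches = patchify(image, patch_size=8)           # 16 patches, each of dim 192
embed = rng.standard_normal((patches.shape[1], 64))  # toy linear projection
tokens = patches @ embed                          # (16, 64) token sequence
print(tokens.shape)
```

The resulting `tokens` array is what a standard Transformer encoder would consume as its input sequence, which is the key idea that lets these models reuse the language-model architecture unchanged.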