[B! HN] mactkgのブックマーク

mactkg id:mactkg

HNに関するmactkgのブックマーク (1)

Learning from human preferences
The overall training process is a 3-step feedback cycle between the human, the agent’s understanding of the goal, and the RL training. Our AI agent starts by acting randomly in the environment. Periodically, two video clips of its behavior are given to a human, and the human decides which of the two clips is closest to fulfilling its goal—in this case, a backflip. The AI gradually builds a model o
mactkg 2017/06/14
HN

AI
リンク
1

お知らせ

もっと読む

公式Twitter

@HatenaBookmark
リリース、障害情報などのサービスのお知らせ
@hatebu
最新の人気エントリーの配信

キーボードショートカット一覧

j次のブックマーク

k前のブックマーク

lあとで読む

eコメント一覧を開く

oページを開く

設定を変更しましたx