A study-group deck on trends in applying reinforcement learning to taxi dispatching, centered on DiDi AI Lab's algorithms. A survey of reinforcement learning applications in the taxi dispatching/repositioning domain; the papers are selected mostly from DiDi AI Lab's publications.
rl-for-llms.md Reinforcement Learning for Language Models. Yoav Goldberg, April 2023. Why RL? With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations…
Illustrating Reinforcement Learning from Human Feedback (RLHF). This article has been translated to Chinese 简体中文 and Vietnamese đọc tiếng việt. Language models have shown impressive capabilities in the past few years by generating diverse and compelling text from human input prompts. However, what makes a "good" text is inherently hard to define, as it is subjective and context-dependent. There are…
Pwnagotchi: Deep Reinforcement Learning for WiFi pwning! Pwnagotchi is an A2C-based "AI" powered by bettercap and running on a Raspberry Pi Zero W that learns from its surrounding WiFi environment in order to maximize the crackable WPA key material it captures (either through passive sniffing or by performing deauthentication and association attacks). This material is collected on disk…
Infrastructure for Contextual Bandits and Reinforcement Learning — theme of the ML Platform meetup hosted at Netflix, Los Gatos on Sep 12, 2019. Contextual and multi-armed bandits enable faster, adaptive alternatives to traditional A/B testing: rapid learning and better decision-making for product rollouts. Broadly speaking, these approaches can be seen as a stepping stone to full-on…
Build your own video game bots, using classic and cutting-edge algorithms.
Large language models (LLMs) have been increasingly used to interact with external environments (e.g., games, compilers, APIs) as goal-driven agents. However, it remains challenging for these language agents to quickly and efficiently learn from trial and error, as traditional reinforcement learning methods require extensive training samples and expensive model fine-tuning. We propose Reflexion, a…
AWS Machine Learning Blog: Training a reinforcement learning agent with Unity and Amazon SageMaker RL. Unity is one of the most popular game engines, adopted not only for video game development but also by industries such as film and automotive. Unity offers tools to create virtual simulated environments with customizable physics, landscapes, and characters. The Unity Machine Learning Agents Toolkit…
What is RLHF (Reinforcement Learning from Human Feedback)? — AI & Machine Learning Glossary. Explains the term "RLHF": a technique for fine-tuning an AI model with reinforcement learning based on human feedback. In OpenAI's ChatGPT/InstructGPT, the language model is fine-tuned with RLHF so that it aligns with human values. Term explanation: RLHF (Reinforcement Learning from Human Feedback) is, as the name suggests, a technique for fine-tuning an AI (language) model with reinforcement learning, using human feedback, so that the model conforms to human values. Reinforcement learning itself is a method of learning from feedback (rewards and penalties). R…
OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games. OpenSpiel supports n-player (single- and multi-agent) zero-sum, cooperative, and general-sum games; one-shot and sequential, strictly turn-taking and simultaneous-move, perfect- and imperfect-information games; as well as traditional multiagent environments such as (partially…
Finding good data and a good policy corresponds to optimizing the lower bound, $F(\theta, q)$, with respect to the policy parameters and the experience. One common approach to maximizing the lower bound is coordinate ascent on its arguments, alternating between optimizing the data distribution and optimizing the policy. Optimizing the Policy: When optimizing the lower bound with respect to the policy…
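The alternating scheme in that excerpt can be sketched on a toy problem. This is a minimal illustration, not the article's actual objective: all names (`f`, `xs`, the entropy-regularized bound `F`) are hypothetical. We maximize F(θ, q) = Σᵢ qᵢ f(θ, xᵢ) + H(q) by alternating a closed-form update of the data distribution q (a softmax over per-sample scores, which exactly maximizes F in q) with a small gradient-ascent step on the policy parameter θ:

```python
import math

xs = [0.0, 1.0, 4.0]                  # toy "experience" (hypothetical data)

def f(theta, x):                      # per-sample score of the policy on x
    return -(theta - x) ** 2

def F(theta, q):                      # entropy-regularized lower bound
    return sum(qi * f(theta, x) for qi, x in zip(q, xs)) \
         - sum(qi * math.log(qi) for qi in q if qi > 0)

def update_q(theta):                  # coordinate step in q: softmax of f, the
    w = [math.exp(f(theta, x)) for x in xs]   # exact maximizer of F over q
    z = sum(w)
    return [wi / z for wi in w]

def update_theta(theta, q, lr=0.1):   # coordinate step in theta: one small
    grad = sum(qi * 2 * (x - theta) for qi, x in zip(q, xs))  # gradient step
    return theta + lr * grad

theta, q = -2.0, [1.0 / 3] * 3
history = [F(theta, q)]
for _ in range(50):                   # alternate the two coordinate updates
    q = update_q(theta)
    theta = update_theta(theta, q)
    history.append(F(theta, q))
```

Because each q-step is an exact maximization and the θ-step uses a step size below the curvature bound of the quadratic scores, the bound `F` is non-decreasing across iterations, which is the defining property of coordinate ascent.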
The MIT Press. Marc G. Bellemare, Will Dabney, and Mark Rowland. This textbook aims to provide an introduction to the developing field of distributional reinforcement learning. The book is available on The MIT Press website (including an open-access version). The version provided below is a draft, licensed under a Creative Commons license; see terms and conditions for details. Table of Contents…
Value-Aided Conditional Supervised Learning for Offline RL. Jeonghye Kim, Suyoung Lee, Woojun Kim, and Youngchul Sung. arXiv, 2024. Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning. Lanqing Li, Hai Zhang, Xinyu Zhang, Shatong Zhu, Junqiao Zhao, and Pheng-Ann Heng. arXiv, 2024. DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory…
Several days ago, AlphaTensor was introduced by DeepMind in Nature. I think this is the third time that DeepMind's Reinforcement Learning (RL) research has hit Nature (AlphaGo, AlphaFold, and now AlphaTensor). Although RL is powerful, it is harder to jump into because there are fewer systematic resources on this topic. I guess the situation is better now, but this is how I felt when…
Pwnagotchi is an A2C-based "AI" leveraging bettercap that learns from its surrounding WiFi environment to maximize the crackable WPA key material it captures (either passively, or by performing deauthentication and association attacks). This material is collected as PCAP files containing any form of handshake supported by hashcat, including PMKIDs and full and half WPA handshakes. Instead of merely playing…
We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model. Algorithm Distillation treats learning to reinforcement learn as an across-episode sequential prediction problem. A dataset of learning histories is generated by a source RL algorithm, and then a causal transformer…
The goal of this tutorial is to introduce you to Reinforce.jl, a reinforcement learning library written in Julia by Tom Breloff. This is a library written mostly by a single person, and my theory is that Julia is what helps someone smart like Tom be this productive. So we're gonna be doing some GitHub archaeology and try to figure out how everything in Reinforce.jl fits together…
Research. AlphaStar: Grandmaster level in StarCraft II using multi-agent reinforcement learning. Published 30 October 2019. Authors: The AlphaStar team. TL;DR: AlphaStar is the first AI to reach the top league of a widely popular esport without any game restrictions. This January, a preliminary version of AlphaStar challenged two of the world's top players in StarCraft II, one of the most enduring and…