Materials on Self-Supervised Learning. Created January 24, 2023. 岡本直樹 (Naoki Okamoto; Chubu University, Machine Perception and Robotics Group)
The first high-performance self-supervised algorithm that works for speech, vision, and text
Self-supervised learning, where machines learn by directly observing the environment rather than being explicitly taught through labeled images, text, audio, and other data sources, has powered many significant recent advances in AI. But while people appear to learn in a similar way regardless of how the
Self-supervised learning, dubbed the dark matter of intelligence, is a promising path to advance machine learning. Yet, much like cooking, training SSL methods is a delicate art with a high barrier to entry. While many components are familiar, successfully training an SSL method involves a dizzying set of choices from the pretext tasks to training hyper-parameters. Our goal is to lower the barrier
In recent years, the AI field has made tremendous progress in developing AI systems that can learn from massive amounts of carefully labeled data. This paradigm of supervised learning has a proven track record for training specialist models that perform extremely well on the task they were trained to do. Unfortunately, there’s a limit to how far the field of AI can go with supervised learning alon
The primary aim of single-image super-resolution is to construct a high-resolution (HR) image from a corresponding low-resolution (LR) input. In previous approaches, which have generally been supervised, the training objective typically measures a pixel-wise average distance between the super-resolved (SR) and HR images. Optimizing such metrics often leads to blurring, especially in high variance
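4.
■ What counts as a good feature representation here = discriminative
- A feature representation of the data that is effective for any task one wants to solve (target task)
- (Acquired by solving a pseudo task (pretext task) beforehand)
- Other qualities, such as disentanglement, are not considered
■ Self-Supervised Learning (SSL)
- Defines the pretext task using supervisory signals that can be generated automatically
- Images, video, audio, multimodal (the main focus of this material)
■ Other than SSL (Unsupervised)
- Learns a model that represents the data distribution (there is no supervision)
What is self-supervised learning? For data without labels, it creates the supervision itself and acquires a good feature representation on that problem. It uses CNNs with images, video, audio, or their integration as self-supervision.
5.
■ Mainly two steps: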
This article explains DINO, a self-supervised learning method for image models proposed by researchers at Facebook AI Research. Caron, Mathilde, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. 2021. “Emerging Properties in Self-Supervised Vision Transformers.” arXiv [cs.CV]. http://arxiv.org/abs/2104.14294. (cf.) Facebook blog, GitHub, Yannic Kilcher's explanatory video. Key point: in image models (e.g. ResNet, Vision transformers), without labels
Data2vec 2.0: Highly efficient self-supervised learning for vision, speech and text
Many recent breakthroughs in AI have been powered by self-supervised learning, which enables machines to learn without relying on labeled data. But current algorithms have several significant limitations, often including being specialized for a single modality (such as images or text) and requiring lots of computat
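What is Self-Supervised Learning (SSL)?: AI and machine learning glossary
An explanation of the term "self-supervised learning": a learning method that uses a large unlabeled dataset for pre-training by solving a pretext task (a substitute task whose pseudo-labels are generated automatically). The pre-trained model is then fine-tuned on a separate (small) dataset to solve the target task. In machine learning (strictly speaking, neural networks), Self-Supervised Learning (SSL) is a learning method that uses a large dataset without supervised labels for pre-training by solving what is called a pretext task, "a substitute task whose pseudo-labels are generated automatically". After that, the original objective, the target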
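To make "pseudo-labels generated automatically" concrete, here is a minimal sketch of one possible pretext task. The choice of rotation prediction is an assumption made for illustration; the glossary entry above does not name a specific task, and the function names are hypothetical. Each unlabeled image is rotated by a random multiple of 90 degrees, and the number of turns becomes the label, so no human annotation is needed.

```python
import random


def rotate90(image):
    """Rotate a 2D list (rows of pixels) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def make_rotation_example(image, k):
    """Return (rotated image, pseudo-label), where the label is k mod 4 turns."""
    rotated = [list(row) for row in image]
    for _ in range(k % 4):
        rotated = rotate90(rotated)
    return rotated, k % 4


def build_pretext_dataset(unlabeled_images, seed=0):
    """Turn unlabeled images into (input, pseudo-label) pairs automatically."""
    rng = random.Random(seed)
    return [make_rotation_example(img, rng.randrange(4)) for img in unlabeled_images]


# Illustrative 2x2 "image"; real inputs would be much larger.
image = [[1, 2],
         [3, 4]]
rotated, label = make_rotation_example(image, 1)
print(rotated, label)
```

A model pre-trained to predict `label` from `rotated` could then be fine-tuned on a small labeled dataset for the target task, which is the two-stage flow the entry describes.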
An overview of self-supervised pretext tasks in Natural Language Processing
While Computer Vision has made amazing progress on self-supervised learning only in the last few years, self-supervised learning has been a first-class citizen in NLP research for quite a while. Language Models have existed since the 90s, even before the phrase “self-supervised learning” was coined. The Word2Vec paper fro
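As a rough illustration of how an NLP pretext task derives its labels from raw text, the sketch below builds (center word, context word) training pairs in the style of Word2Vec's skip-gram formulation. The excerpt above is cut off before it describes any task, so the choice of skip-gram, the function name `skipgram_pairs`, and the `window` parameter are assumptions made for this example.

```python
def skipgram_pairs(tokens, window=2):
    """Build (center, context) pairs; the context word acts as the label."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Illustrative sentence; any unlabeled text would do.
tokens = "the cat sat on the mat".split()
pairs = skipgram_pairs(tokens, window=1)
print(pairs[:3])
```

The supervision comes entirely from word co-occurrence in the text, which is why such tasks need no manual annotation.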
This article is part of Demystifying AI, a series of posts that (try to) disambiguate the jargon and myths surrounding AI. Despite the huge contributions of deep learning to the field of artificial intelligence, there’s something very wrong with it: It requires huge amounts of data. This is one thing that both the pioneers and critics of deep learning agree on. In fact,
Self-supervised approaches for speech representation learning are challenged by three unique problems: (1) there are multiple sound units in each input utterance, (2) there is no lexicon of input sound units during the pre-training phase, and (3) sound units have variable lengths with no explicit segmentation. To deal with these three problems, we propose the Hidden-Unit BERT (HuBERT) approach for
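The Predictor predicts the Encoder's output for the paired image, but as training progresses it comes to predict an average output. As a result the Encoder's output also approaches that average output, so the Backbone ends up having learned average (general) features. That, at least, is how the author understands the flow. The stop-grad in the figure stops the gradient computation, and this apparently prevents "Collapsing Solutions". To a layman it seems it would be quicker to drop the Projector layer and connect the Backbone directly to the Predictor, but the Projector is inserted to project once into a space used for computing the loss. This point appears to be discussed in the paper on SimCLR, a prior work. Implementation: there is an official implementation, and for those who prefer PyTorch, using it would seem to settle the matter, but the author prefers tf.keras and so has to implement it independently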
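The excerpt above does not name the method or state its loss, so the following is only a sketch under assumptions: it takes the Predictor / stop-grad setup to be SimSiam-style and uses a symmetric negative cosine similarity between the Predictor output for one view and the Projector output for the other. The names `p1`, `z1`, `p2`, `z2` are hypothetical. Plain Python has no automatic differentiation, so stop-grad is only marked in comments; in a real framework it would be an operation such as `tf.stop_gradient` applied to the target.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def negative_cosine(p, z):
    """Negative cosine similarity between prediction p and target z.

    z plays the role of the stop-grad branch: it is treated as a constant
    target, so in an autograd framework no gradient would flow through it.
    """
    p, z = l2_normalize(p), l2_normalize(z)
    return -sum(a * b for a, b in zip(p, z))


def symmetric_loss(p1, z1, p2, z2):
    """Average the loss over both directions (view 1 -> 2 and view 2 -> 1)."""
    return 0.5 * negative_cosine(p1, z2) + 0.5 * negative_cosine(p2, z1)


# Illustrative vectors: z* stand for Projector outputs, p* for Predictor outputs.
z1, z2 = [1.0, 0.0], [0.8, 0.6]
p1, p2 = [0.9, 0.5], [1.0, 0.1]
print(symmetric_loss(p1, z1, p2, z2))
```

The loss reaches its minimum of -1 when each prediction points in the same direction as the other view's target.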
A visual introduction to self-supervised learning methods for visual representations. I first got introduced to self-supervised learning in a talk by Yann LeCun, where he introduced the “cake analogy” to illustrate the importance of self-supervised learning. In the talk, he said: “If intelligence is a cake, the bulk of the cake is self-supervised learning, the icing on the cake is supervised learn
Data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Abstract: While the general idea of self-supervised learning is identical across modalities, the actual algorithms and objectives differ widely because they were developed with a single modality in mind. To get us closer to general self-supervised learning, we present data2vec, a framework that uses the same learning