
Ryobot's bookmarks on Attention (6)

  • Adaptive Attention Span in Transformers

    Ryobot 2019/07/11
    If each head attends over a different span anyway, the natural move is to make the span learnable from the start. The span is controlled by a parameter z, and z is added to the loss. Performance close to Transformer-XL. Same authors as MemN2N.
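    A minimal NumPy sketch of that idea (function names and the ramp parameterization are mine, not the paper's): each head gets a learnable span z; a soft mask is 1 within the span, decays linearly over a ramp, and z itself is penalized in the loss.

      import numpy as np

      def span_mask(dist, z, ramp=32.0):
          # Soft mask: 1 for distances within the span z, linear decay
          # over `ramp` positions, 0 beyond. Differentiable in z.
          return np.clip((ramp + z - dist) / ramp, 0.0, 1.0)

      def adaptive_span_head(scores, z, ramp=32.0):
          # scores: (T, T) raw attention logits for one head.
          T = scores.shape[0]
          dist = np.abs(np.arange(T)[:, None] - np.arange(T)[None, :])
          w = np.exp(scores - scores.max(axis=-1, keepdims=True))
          w = w * span_mask(dist, z, ramp)
          return w / w.sum(axis=-1, keepdims=True)

      # Training would add the span penalty to the task loss, roughly:
      # loss = task_loss + span_coeff * sum(z_h over all heads h)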
  • [1810.13409] You May Not Need Attention

    In NMT, how far can we get without attention and without separate encoding and decoding? To answer that question, we introduce a recurrent neural translation model that does not use attention and does not have a separate encoder and decoder. Our eager translation model is low-latency, writing target tokens as soon as it reads the first source token, and uses constant memory during decoding.

    Ryobot 2018/11/09
    Does away with the encoder-decoder split, and with attention. Target words are emitted as source words are read in, so the structure resembles a language model. Word alignments are estimated in advance, and sentence lengths are equalized with special symbols. Better on long sentences.
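    A toy sketch of the eager scheme under my own simplifications (a single tanh RNN cell; E, Wh, Wx, Wo are hypothetical embedding, recurrent, input, and output matrices; the pre-estimated alignments and padding symbols are omitted): one target token is emitted per source token read, so decoding keeps only the fixed-size recurrent state.

      import numpy as np

      def eager_translate(src_ids, E, Wh, Wx, Wo):
          # Read one source token, update the single recurrent state,
          # and immediately emit one target token: constant memory.
          h = np.zeros(Wh.shape[0])
          target = []
          for s in src_ids:
              h = np.tanh(Wh @ h + Wx @ E[s])
              target.append(int(np.argmax(Wo @ h)))
          return target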
  • Image Transformer

    Image generation has been successfully cast as an autoregressive sequence generation or transformation problem. Recent work has shown that self-attention is an effective way of modeling textual sequences. In this work, we generalize a recently proposed model architecture based on self-attention, the Transformer, to a sequence modeling formulation of image generation with a tractable likelihood.

    Ryobot 2018/02/20
    Renames restricted self-attention, which Attention Is All You Need only mentioned as an idea, to local self-attention and applies it to PixelCNN-style image generation. I would have liked to see this done with a GAN.
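    A minimal 1-D sketch of local self-attention over raster-ordered pixels (the paper also uses 2-D query blocks; the window size and weight names here are mine): each position attends only to a fixed window of preceding positions, keeping the cost linear in sequence length.

      import numpy as np

      def local_self_attention(x, Wq, Wk, Wv, window=8):
          # x: (T, d) flattened pixel representations in raster order.
          # Each position attends to at most `window` previous positions.
          T, d = x.shape
          q, k, v = x @ Wq, x @ Wk, x @ Wv
          out = np.zeros_like(v)
          for t in range(T):
              lo = max(0, t - window)
              s = q[t] @ k[lo:t + 1].T / np.sqrt(d)
              a = np.exp(s - s.max())
              a /= a.sum()
              out[t] = a @ v[lo:t + 1]      # blend of the local memory block
          return out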
  • Squeeze-and-Excitation Networks

    The central building block of convolutional neural networks (CNNs) is the convolution operator, which enables networks to construct informative features by fusing both spatial and channel-wise information within local receptive fields at each layer. A broad range of prior research has investigated the spatial component of this relationship, seeking to strengthen the representational power of a CNN.
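    A minimal NumPy sketch of the squeeze-and-excitation block the abstract describes (W1, W2 are the bottleneck weights, with the reduction ratio folded into their shapes; names are mine): per-channel statistics are pooled, passed through a small gating MLP, and used to rescale the channels.

      import numpy as np

      def se_block(x, W1, W2):
          # x: (C, H, W) feature map.
          z = x.mean(axis=(1, 2))                  # squeeze: (C,) channel stats
          a = np.maximum(W1 @ z, 0.0)              # bottleneck ReLU, (C // r,)
          s = 1.0 / (1.0 + np.exp(-(W2 @ a)))      # excitation gate in (0, 1)
          return x * s[:, None, None]              # recalibrate each channel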

  • Residual Attention Network for Image Classification

    In this work, we propose the "Residual Attention Network", a convolutional neural network using an attention mechanism that can be incorporated into state-of-the-art feed-forward network architectures in an end-to-end training fashion. Our Residual Attention Network is built by stacking Attention Modules which generate attention-aware features. The attention-aware features from different modules change adaptively.
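    A minimal sketch of the paper's attention-residual rule, H(x) = (1 + M(x)) * F(x), with hypothetical stand-ins for the two branches: the soft mask modulates trunk features without cutting off the identity signal, which is what lets many modules stack.

      import numpy as np

      def attention_module(x, trunk, mask):
          # trunk(x): ordinary feature transform F(x).
          # mask(x): soft mask M(x) in [0, 1] from a bottom-up/top-down branch.
          return (1.0 + mask(x)) * trunk(x)

      # Toy usage with stand-in branches:
      x = np.random.randn(4, 4)
      y = attention_module(x, trunk=np.tanh,
                           mask=lambda t: 1.0 / (1.0 + np.exp(-t)))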

  • Recurrent Neural Networks Augmented with Attention

    Contents: Neural Turing Machines (source code), attention interfaces, Adaptive Computation Time (code), Neural Programmer (source code), overall outlook, references. This article is a translation of Attention and Augmented Recurrent Neural Networks, published with the original author's permission. Recurrent neural networks are one of the key components of deep learning, letting neural networks handle ordered data such as text, audio, and video. With an RNN you can abstractly capture the patterns that appear in a sequence, annotate them, and even generate a sequence entirely from scratch! A simple RNN design struggles with long sequences, but "long short-term memory"…
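    A minimal sketch of the content-based attention interface the post builds on (names are mine): the controller RNN emits a query, every memory entry is scored against it, and a softmax-weighted blend is read back, so the whole read is differentiable.

      import numpy as np

      def attend(query, memory):
          # memory: (N, d) vectors; query: (d,).
          scores = memory @ query                  # similarity per entry
          w = np.exp(scores - scores.max())
          w /= w.sum()                             # soft addressing weights
          return w @ memory                        # differentiable read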
