Hazy Research: New Entries and Ratings - Hatena Bookmark

Hazy Research

Zoology (Blogpost 2): Simple, Input-Dependent, and Sub-Quadratic Sequence Mixers
4 users
hazyresearch.stanford.edu

Table 1: Perplexity of 355-million-parameter models trained for 10 billion tokens on the Pile. Yet some sub-quadratic gated convolutions match attention on the non-AR slice! Can we capture the strengths of both gated convolutions and attention in one purely sub-quadratic architecture? We find the AR gap is because gated convolution models (e.g. Hyena, H3, RWKV, RetNet) need model dimension that sc…
- Technology
- 2023/12/15 13:31
