サクサク読めて、アプリ限定の機能も多数!
トップへ戻る
大谷翔平
google-research.github.io
A Large Language Model That Can Speak and Listen |paper| Paul Rubenstein*, Chulayuth Asawaroengchai*, Duc Dung Nguyen*, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov, Hannah Muckenhirn, Dirk Padfield, James Qin, Danny Rozenberg, Tara Sainath, Johan Schalkwyk, Matt Sharifi, Michelle Tadmor Ramanovich, Marco Tagliasacchi, Alexandru Tudor
SoundStorm:Efficient Parallel Audio Generation [paper] Zalán Borsos, Matt Sharifi, Damien Vincent, Eugene Kharitonov, Neil Zeghidour, Marco Tagliasacchi Google Research Abstract. We present SoundStorm, a model for efficient, non-autoregressive audio generation. SoundStorm receives as input the semantic tokens of AudioLM, and relies on bidirectional attention and confidence-based parallel decoding
MusicLM: Generating Music From Text |paper|dataset| Andrea Agostinelli, Timo I. Denk, Zalán Borsos, Jesse Engel, Mauro Verzetti, Antoine Caillon, Qingqing Huang, Aren Jansen, Adam Roberts, Marco Tagliasacchi, Matt Sharifi, Neil Zeghidour, Christian Frank Google Research Abstract We introduce MusicLM, a model generating high-fidelity music from text descriptions such as "a calming violin melody bac
A Language Modeling Approach to Audio Generation |paper| blog post| Zalán Borsos, Raphaël Marinier, Damien Vincent, Eugene Kharitonov, Olivier Pietquin, Matt Sharifi, Olivier Teboul, David Grangier, Marco Tagliasacchi, Neil Zeghidour Google Research Abstract. We introduce AudioLM, a framework for high-quality audio generation with long-term consistency. AudioLM maps the input audio to a sequence o
このページを最初にブックマークしてみませんか?
『google-research.github.io』の新着エントリーを見る
j次のブックマーク
k前のブックマーク
lあとで読む
eコメント一覧を開く
oページを開く