[B! tokenization][nlp] manboubird's bookmarks

manboubird id:manboubird

manboubird's bookmarks tagged tokenization and nlp (1)

Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
- 2 users
- arxiv.org
- Learning
What are the units of text that we want to model? From bytes to multi-word expressions, text can be analyzed and generated at many granularities. Until recently, most natural language processing (NLP) models operated over words, treating those as discrete and atomic tokens, but starting with byte-pair encoding (BPE), subword-based approaches have become dominant in many areas, enabling small vocab
manboubird 2021/12/22
paper

ontology

nlp

knowledgeGraph

knowledgeBase

dictionary

tokenization
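The abstract credits byte-pair encoding (BPE) with starting the shift from word-level to subword tokenization. The sketch below is a minimal, illustrative version of the classic BPE merge-learning loop. It is not code from the bookmarked survey. Each word starts as a sequence of characters, and the loop repeatedly merges the most frequent adjacent symbol pair into a new symbol. The toy word counts and the `</w>` end-of-word marker are assumptions made for illustration, and ties between equally frequent pairs go to the pair seen first.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by each word's frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every adjacent occurrence of `pair` with its concatenation."""
    a, b = pair
    merged = {}
    for word, freq in vocab.items():
        symbols = word.split()
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)  # fuse the pair into one new symbol
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[" ".join(out)] = freq
    return merged


def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` BPE merges from a {word: count} dict."""
    # Split words into characters and append an (assumed) end-of-word marker
    # so that merges never span two words.
    vocab = {" ".join(word) + " </w>": count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair; first seen wins ties
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


if __name__ == "__main__":
    toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    merges, vocab = learn_bpe(toy_counts, num_merges=3)
    print(merges)  # [('e', 's'), ('es', 't'), ('est', '</w>')]
```

The learned merge list is an ordered set of rules, so the small vocabulary mentioned in the abstract emerges from frequent character sequences: "est</w>" becomes a single unit shared by "newest" and "widest", and rare words still decompose into known smaller symbols.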
