TL;DR

This post introduces konoha (formerly tiny_tokenizer), a library for tokenizing sentences. You can use it like the snippet below. Hope you like it!

```python
from konoha import WordTokenizer

sentence = '自然言語処理を勉強しています'

tokenizer = WordTokenizer('MeCab')
print(tokenizer.tokenize(sentence))
# -> [自然, 言語, 処理, を, 勉強, し, て, い, ます]

tokenizer = WordTokenizer('Kytea')
print(tokenizer.tokenize(sentence))
# -> [自然, 言語, 処理, を, 勉強, し, て, い, ま, す]

# Sentencepiece additionally needs a trained model; the path below is a placeholder.
tokenizer = WordTokenizer('Sentencepiece', model_path='data/model.spm')
print(tokenizer.tokenize(sentence))
```
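Because every backend sits behind the same `WordTokenizer` interface, switching tokenizers is a one-argument change. Here's a minimal sketch comparing backends side by side, assuming the MeCab and KyTea backends are installed locally:

```python
from konoha import WordTokenizer

sentence = '自然言語処理を勉強しています'

# Swap backends by name; the rest of the pipeline stays untouched.
for backend in ['MeCab', 'Kytea']:
    tokenizer = WordTokenizer(backend)
    tokens = tokenizer.tokenize(sentence)
    print(backend, [str(token) for token in tokens])
```

This uniform interface is the point of the library: trying a different tokenizer on your corpus shouldn't require rewriting your preprocessing code.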
*(Article card: "I built konoha, a library for switching between tokenizers seamlessly", on Qiita)*