Over the past several years there's been a welcome trend in NLP projects to be broadly multi-lingual. However, even when many languages are supported, there's a few that tend to be left out. One of these is Japanese. Japanese is written without spaces, and deciding where one word ends and another begins is not trivial. While highly accurate tokenizers are available, they can be hard to use, and En