Understanding the Mixture of Softmaxes (MoS)

November 19, 2017

In this post we'll be pulling apart Breaking the Softmax Bottleneck: A High-Rank RNN Language Model by Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, and William W. Cohen.

The softmax operation is fundamentally important for many tasks in machine learning. The softmax allows you to produce a probability distribution over a set of class