Publications

Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion

We present a machine learning technique for driving 3D facial animation by audio input in real time and with low latency. Our deep neural network learns a mapping from input waveforms to the 3D vertex coordinates of a face model, and simultaneously discovers a compact, latent code that disambiguates the var