We consider the task of reconstructing an image of a person’s face from a short input audio segment of speech. We show several results of our method on VoxCeleb dataset. Our model takes only an audio waveform as input (the true faces are shown just for reference). Note that our goal is not to reconstruct an accurate image of the person, but rather to recover characteristic physical features that a