Neural Discrete Representation Learning

All samples on this page are from a VQ-VAE learned in an unsupervised way from unaligned data. More details are in the paper.

Reconstructions

These samples are reconstructions from a VQ-VAE that compresses the audio input by a factor of more than 64 into discrete latent codes (see figure below). Both the VQ-VAE and latent space are trained end-to-end without relying on phone