tl;drRNNS work great for text but convolutions can do it fasterAny part of a sentence can influence the semantics of a word. For that reason we want our network to see the entire input at onceGetting that big a receptive can make gradients vanish and our networks failWe can solve the vanishing gradient problem with DenseNets or Dilated ConvolutionsSometimes we need to generate text. We can use “de
![Convolutional Methods for Text](https://cdn-ak-scissors.b.st-hatena.com/image/square/f3d4eb2e3e73069365611dfac68ebf54dbf2f00f/height=288;version=1;width=512/https%3A%2F%2Fmiro.medium.com%2Fv2%2Fda%3Atrue%2Fresize%3Afit%3A800%2F1%2AJkjF4jAZTpnbHHJJLkfNRQ.gif)