This tutorial builds an encoder/decoder connected by attention. While this architecture is somewhat outdated, it is still a very useful project to work through to get a deeper understanding of sequence-to-sequence models and attention mechanisms (before going on to Transformers). This example assumes some knowledge of TensorFlow fundamentals below the level of a Keras layer: working with tensors directly.
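
To give a feel for the architecture before diving in, here is a minimal sketch of an encoder and decoder connected by attention. This is an illustrative outline, not the tutorial's actual code: the layer sizes, class names, and the use of `tf.keras.layers.AdditiveAttention` (a Bahdanau-style attention layer) are assumptions chosen for brevity.

```python
import tensorflow as tf

class Encoder(tf.keras.layers.Layer):
    def __init__(self, vocab_size, units):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, units)
        self.rnn = tf.keras.layers.GRU(units, return_sequences=True, return_state=True)

    def call(self, tokens):
        x = self.embedding(tokens)               # (batch, src_len, units)
        seq, state = self.rnn(x)                 # per-token outputs + final state
        return seq, state

class Decoder(tf.keras.layers.Layer):
    def __init__(self, vocab_size, units):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, units)
        self.rnn = tf.keras.layers.GRU(units, return_sequences=True, return_state=True)
        self.attention = tf.keras.layers.AdditiveAttention()  # Bahdanau-style
        self.out = tf.keras.layers.Dense(vocab_size)

    def call(self, tokens, enc_seq, state):
        x = self.embedding(tokens)               # (batch, tgt_len, units)
        dec_seq, state = self.rnn(x, initial_state=state)
        # Each decoder step attends over all encoder outputs.
        context = self.attention([dec_seq, enc_seq])
        logits = self.out(tf.concat([dec_seq, context], axis=-1))
        return logits, state

# Toy usage with made-up vocab size and batch shapes:
enc = Encoder(vocab_size=1000, units=64)
dec = Decoder(vocab_size=1000, units=64)
src = tf.random.uniform((2, 7), maxval=1000, dtype=tf.int32)
tgt = tf.random.uniform((2, 5), maxval=1000, dtype=tf.int32)
enc_seq, enc_state = enc(src)
logits, _ = dec(tgt, enc_seq, enc_state)
print(logits.shape)  # (2, 5, 1000)
```

The key idea this sketch illustrates is the connection between the two halves: the decoder does not see only the encoder's final state, it attends over the encoder's full per-token output sequence at every decoding step, which is what the rest of the tutorial develops in detail.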