*As of August 2021, the code is no longer maintained. It is preserved here in archival form for people who wish to continue to use it.*

🎉 1T or bust my dudes 🎉

An implementation of model- and data-parallel GPT-3-like models using the mesh-tensorflow library.

If you're just here to play with our pre-trained models, we strongly recommend you try out the HuggingFace Transformers integration.

Training and inf