# GPT-2B-001

## Model Description

GPT-2B-001 is a transformer-based language model. GPT refers to a class of transformer decoder-only models similar to GPT-2 and GPT-3, while 2B refers to the total trainable parameter count (2 billion) [1, 2].

This model was trained on 1.1T tokens with NeMo.

## Model Architecture Improvements

- The model uses the SwiGLU activation function [4] (sketched below)
- Rotary positional embeddings (RoPE) (see the second sketch below)
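To make the first item concrete, here is a minimal sketch of a SwiGLU feed-forward block in PyTorch. The class name, the hidden width `d_ff`, and the bias-free linear layers are illustrative choices for this sketch, not values read from the released checkpoint or the NeMo implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    """Gated feed-forward block: out = W_out( SiLU(W_gate x) * (W_up x) )."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)  # gate projection
        self.w_up = nn.Linear(d_model, d_ff, bias=False)    # value projection
        self.w_out = nn.Linear(d_ff, d_model, bias=False)   # back to model width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SiLU (Swish) on one projection gates the other, element-wise
        return self.w_out(F.silu(self.w_gate(x)) * self.w_up(x))
```

Compared with a plain GELU or ReLU MLP, the gating path adds one extra projection per block but typically improves quality at the same parameter budget, which is why it appears in this family of architecture changes.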
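For the second item, the following is a minimal, self-contained sketch of rotary positional embeddings. The function name `apply_rope`, the `base` constant, and the channel-pairing scheme are assumptions made for illustration; the model's actual NeMo implementation may organize the computation differently.

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate vectors by position; x has shape (..., seq_len, head_dim)."""
    seq_len, head_dim = x.shape[-2], x.shape[-1]
    # One rotation frequency per pair of channels, decreasing geometrically
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    positions = torch.arange(seq_len, dtype=torch.float32)
    angles = positions[:, None] * inv_freq[None, :]   # (seq_len, head_dim/2)
    cos, sin = angles.cos(), angles.sin()
    # Split channels into (even, odd) pairs and apply a 2-D rotation to each pair
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rotated = torch.stack(
        (x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1
    ).flatten(-2)                                     # interleave pairs back
    return rotated
```

In attention, this rotation is applied to the query and key vectors before the dot product, so the relative offset between two positions is encoded directly in their inner product rather than added to the token embeddings.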