Introducing StarCoder

StarCoder and StarCoderBase are Large Language Models for Code (Code LLMs) trained on permissively licensed data from GitHub, including 80+ programming languages, Git commits, GitHub issues, and Jupyter notebooks. Similar to LLaMA, we trained a ~15B parameter model for 1 trillion tokens. We fine-tuned the StarCoderBase model for 35B Python tokens, resulting in a new model th
![StarCoder: A State-of-the-Art LLM for Code](https://cdn-ak-scissors.b.st-hatena.com/image/square/8e8d7e662ffa10234a75495ee5ef0cfbf1d9b36c/height=288;version=1;width=512/https%3A%2F%2Fhuggingface.co%2Fblog%2Fassets%2F141_starcoder%2Fstarcoder_thumbnail.png)