Today, we’re releasing LLaMA-2-7B-32K, a 32K-context model built using Position Interpolation together with Together AI’s data recipe and system optimizations, including FlashAttention-2. You can fine-tune the model for targeted long-context tasks, such as multi-document understanding, summarization, and QA, and run inference and fine-tuning at 32K context with up to 3x speedup.
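As a rough illustration of the core idea behind Position Interpolation: instead of extrapolating rotary position embeddings (RoPE) past the trained context length, positions in the extended window are linearly scaled down so they fall back inside the original range. The sketch below is a minimal NumPy illustration under assumed parameters (original context 4096, extended context 32768, head dimension 128); it is not the production implementation.

```python
import numpy as np

def rope_angles(positions, dim=128, base=10000.0, scale=1.0):
    # Standard RoPE angle computation: one inverse frequency per
    # pair of dimensions, multiplied by the (optionally scaled) position.
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    pos = np.asarray(positions, dtype=np.float64) * scale
    return np.outer(pos, inv_freq)  # shape: (len(positions), dim // 2)

# Position Interpolation: squeeze the 32K window into the trained 4K range.
scale = 4096 / 32768  # = 0.125 (assumed original/extended lengths)

# The last position of the extended window (32767), interpolated...
angles_interp = rope_angles([32767], scale=scale)
# ...yields the same angles as an in-range position of the original model,
# so the model never sees rotation angles beyond what it was trained on.
angles_orig = rope_angles([32767 * scale])
```

Because the interpolated angles stay within the distribution seen during pre-training, a comparatively short fine-tuning run suffices to adapt the model to the longer window.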
*Preparing for the era of 32K context: Early learnings and explorations*