I recently saw @VictorTaelin's tweet about increasing the effective context window for GPT-* by asking the LLM to compress a prompt, which is then fed into another instance of the same model. This seemed like a neat trick, but in practice it presents some issues: the compression can be lossy, crucial instructions can be lost, and fewer characters != fewer tokens. I set out to build a more usable version of this idea.
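To make the trick concrete, here's a minimal sketch of the idea, assuming the `openai` (v0.x) Python bindings and `tiktoken` for token counting. The compression prompt wording is my own illustration, not Victor Taelin's exact phrasing.

```python
# A minimal sketch of the prompt-compression trick: ask the model to
# compress a prompt, then feed the compressed version to a fresh
# instance of the same model. Assumes OPENAI_API_KEY is set.
import openai
import tiktoken

# Hypothetical compression instruction, for illustration only.
COMPRESS_PROMPT = (
    "Compress the following text so that another instance of yourself "
    "can reconstruct its full meaning. Use as few tokens as possible; "
    "the result does not need to be human-readable:\n\n"
)


def chat(prompt: str) -> str:
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


def count_tokens(text: str) -> int:
    enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
    return len(enc.encode(text))


original = "...some long prompt..."
compressed = chat(COMPRESS_PROMPT + original)

# Fewer characters != fewer tokens: compare both measures.
print(f"chars: {len(original)} -> {len(compressed)}")
print(f"tokens: {count_tokens(original)} -> {count_tokens(compressed)}")

# Feed the compressed prompt to another instance of the same model.
answer = chat(compressed)
```

Note the two failure modes this exposes: the round-trip is lossy (the second instance may drop crucial instructions), and a "shorter" compressed string can tokenize to just as many tokens as the original.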
![CompressGPT: Decrease Token Usage 70%](https://musings.yasyf.com/content/images/2023/04/d2__1_-removebg-preview.png)