I recently saw @VictorTaelin's tweet about increasing the effective context window of GPT-* by asking the LLM to compress a prompt, which is then fed into another instance of the same model. This seemed like a neat trick, but in practice it presents some issues: the compression can be lossy, crucial instructions can be dropped, and fewer characters != fewer tokens. I set out to build a more usable version of