Most large language models (LLMs) are too big to be fine-tuned on consumer hardware. For instance, fully fine-tuning a 65-billion-parameter model requires more than 780 GB of GPU memory, the equivalent of ten A100 80 GB GPUs. In other words, you would need cloud computing to fine-tune your models. Now, with QLoRA (Dettmers et al., 2023), you can do it with a single A100. In this blog post, I will
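To see where the 780 GB figure comes from, here is a back-of-the-envelope sketch. It assumes a common breakdown for standard full fine-tuning with Adam: fp16 weights, fp16 gradients, and two fp32 optimizer states per parameter, i.e. 12 bytes per parameter (the exact number depends on the precision and optimizer actually used):

```python
# Rough GPU memory estimate for full fine-tuning of a 65B model.
# Assumed breakdown (not from the QLoRA paper itself):
#   2 bytes  fp16 weights
#   2 bytes  fp16 gradients
#   8 bytes  Adam optimizer states (two fp32 moments per parameter)
params = 65e9  # 65 billion parameters

bytes_per_param = 2 + 2 + 8  # 12 bytes per parameter in total

total_gb = params * bytes_per_param / 1e9
print(f"~{total_gb:.0f} GB")  # → ~780 GB, i.e. about ten 80 GB A100s
```

This ignores activations and temporary buffers, so the real footprint is even larger, which only strengthens the case for a memory-efficient method like QLoRA.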