Sometimes humans can spot optimization opportunities that a compiler doesn’t. In this post, we start with a loop generated from C code by clang and tweak it in various ways, measuring the speedup each time.

📢 This post was on the front page of HN. You can join in the discussion there.

Disclaimer: I’m not an optimization expert by any means; in fact, my expertise is in high-level, purely-functional l