Suppose a1, b1, c1, and d1 point to heap memory, and my numerical code has the following core loop. const int n = 100000; for (int j = 0; j < n; j++) { a1[j] += b1[j]; c1[j] += d1[j]; } This loop is executed 10,000 times via another outer for loop. To speed it up, I changed the code to: for (int j = 0; j < n; j++) { a1[j] += b1[j]; } for (int j = 0; j < n; j++) { c1[j] += d1[j]; } Compiled on Micr
![Why are elementwise additions much faster in separate loops than in a combined loop?](https://cdn-ak-scissors.b.st-hatena.com/image/square/98d6f053a97a87156775f60757c60865d0f2c47d/height=288;version=1;width=512/https%3A%2F%2Fcdn.sstatic.net%2FSites%2Fstackoverflow%2FImg%2Fapple-touch-icon%402.png%3Fv%3D73d79a89bded)