Summary: Neural networks have grown dramatically in scale over the past several years, and training them can require massive amounts of data and computational resources. To provide the required compute power, we scale models to dozens of GPUs using a technique that is common in high-performance computing (HPC) but underused in deep learning. This technique, the ring allreduce, reduces the amount of time spent communicating between GPUs.
![Bringing HPC Techniques to Deep Learning - Baidu Research](http://research.baidu.com/wp-content/uploads/2017/02/RingAllreduce.png)
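To make the idea concrete, here is a minimal NumPy sketch that simulates the ring allreduce on a single machine. Each "worker" holds one gradient array; the array is split into N chunks, a scatter-reduce phase accumulates partial sums around the ring, and an allgather phase circulates the fully reduced chunks. The function name and the lock-step simulation loop are illustrative only; real systems (e.g. MPI or NCCL) run these exchanges in parallel over actual network links.

```python
import numpy as np

def ring_allreduce(arrays):
    """Simulate a ring allreduce over a list of per-worker arrays.

    Phase 1 (scatter-reduce): in N-1 steps, each worker sends one
    chunk to its right neighbor, which adds it to its own copy.
    After this phase, worker r holds the fully reduced chunk
    (r + 1) % N.

    Phase 2 (allgather): in N-1 more steps, the reduced chunks
    circulate around the ring so every worker ends with the full sum.
    """
    n = len(arrays)
    # Split each worker's array into n chunks (uneven lengths are OK).
    chunks = [np.array_split(a.astype(float), n) for a in arrays]

    # Phase 1: scatter-reduce.
    for step in range(n - 1):
        # Snapshot outgoing chunks so all "sends" in a step use
        # pre-step values, mimicking simultaneous transfers.
        sends = [(rank, (rank - step) % n,
                  chunks[rank][(rank - step) % n].copy())
                 for rank in range(n)]
        for rank, idx, payload in sends:
            dst = (rank + 1) % n          # right neighbor in the ring
            chunks[dst][idx] += payload   # accumulate partial sum

    # Phase 2: allgather.
    for step in range(n - 1):
        sends = [(rank, (rank + 1 - step) % n,
                  chunks[rank][(rank + 1 - step) % n].copy())
                 for rank in range(n)]
        for rank, idx, payload in sends:
            dst = (rank + 1) % n
            chunks[dst][idx] = payload    # overwrite with reduced chunk

    # Every worker now reconstructs the full summed array.
    return [np.concatenate(c) for c in chunks]
```

Each worker sends and receives only 2(N-1)/N of the data in total, independent of the number of workers, which is what makes the ring allreduce bandwidth-optimal compared to funneling all gradients through a single parameter server.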