In this post, we’ll share how deep learning research usually proceeds, describe the infrastructure choices we’ve made to support it, and open-source kubernetes-ec2-autoscaler, a batch-optimized scaling manager for Kubernetes. We hope you find this post useful in building your own deep learning infrastructure.

A typical deep learning advance starts out as an idea, which you test on a small problem.