Neural Information Processing Systems Workshop on Learning on Cores, Clusters, and Clouds (2010)

For large data it can be very time consuming to run gradient based optimization, for example to minimize the log-likelihood for maximum entropy models. Distributed methods are therefore appealing and a number of distributed gradient optimization strategies have been proposed including: distributed gradien