At Etsy we run thousands of Hadoop jobs over hundreds of terabytes of data every day. When operating at this scale, optimizing jobs is vital: we need to make sure that users get the results they need quickly, while also ensuring we use our cluster’s resources efficiently. Actually doing that optimization is the hard part, however. To make accurate decisions you need measurements, and so we have cr