Many people have asked me to describe the best practices that we have adopted to run a multi-petabyte data warehouse using Hadoop. Most of the details are described in a paper that we presented at SIGMOD 2010. The paper reflects our state of affairs as it was about a year ago, but it is still an interesting read. Below is the abstract of the paper. You can find the complete paper here.

ABSTRACT
Scal