Background

In 2009 I first started playing around with Hive and EC2/S3. I was blown away by the potential of the cloud. But it bothered me that the burden of sizing the cluster was put on the user. How would an analyst know how many machines were required for a given query or job? To make matters worse – one even had to decide whether to add map-reduce or HDFS nodes. Within a single user session – diffe