About a year ago I attended the Paris Hadoop Users Group (HUG) to hear Marcel Kornacker discuss Impala, Cloudera’s version of Dremel, and along with it a new HDFS file format, Parquet. The talk was great. Impala promised big improvements in query times on HDFS, and Parquet, while intended as Impala's native storage format, was designed to be execution-engine agnostic, meaning that in theory it would be easy to