Chris Olston, Benjamin Reed, Adam Silberstein, and Utkarsh Srivastava at Yahoo Research had a USENIX 2008 paper, "Automatic Optimization of Parallel Dataflow Programs" (PDF), that looks at optimizations for large-scale Hadoop clusters using Pig. The paper says that it is only attempting "to suggest some jumping-off points for [optimization] work." With much of the paper spending a fair amount of t