Apache DataFu is a collection of libraries for working with large-scale data in Hadoop. The project was inspired by the need for stable, well-tested libraries for data mining and statistics. It consists of two libraries:

- Apache DataFu Pig: a collection of user-defined functions for Apache Pig
- Apache DataFu Hourglass: an incremental processing framework for Apache Hadoop in Map