This document explains how to collect big data into Hadoop using Apache Flume and Fluentd. It describes the problems with a poor man's approach to data collection, then introduces two basic ideas, divide and conquer and streaming, that make data collection more efficient. Finally, it gives an overview of how Apache Flume and Fluentd work, including their network topologies, configurations, and plu