The Hadoop Distributed Filesystem (HDFS) is a distributed storage system for reliably storing petabytes of data on clusters of commodity hardware. This short paper examines the reliability of HDFS and makes recommendations for best practices to follow when running an HDFS installation. Overview of HDFS HDFS has three classes of node: a single name node, responsible for managing the filesys