May 15, 2014

Yesterday I wanted to start learning about how HDFS (the Hadoop Distributed File System) works internally. I knew that:

- It’s distributed, so one file may be stored across many different machines
- There’s a namenode, which keeps track of where all the files are stored
- There are data nodes, which contain the actual file data

But I wasn’t quite sure how to get started! I knew how to naviga