Author and Project: Author: Xavier Capdepon. Xavier was a student of the Data Science Bootcamp #2 (B002) - Data Science, Data Mining and Machine Learning - from June 1st to August 24th, 2015. Teachers: Andrew, Bryan, Jason, Sam, Vivian. This post is based on his final project submission, “Spark, Hadoop and Parallel computing”. The entire code in Python, Spark and R for the project is available here. I.