In this post, I describe how to parallelize computations in Ruby with the ruby-spark gem. The library uses the Apache Spark project to store and distribute data collections across a cluster.

Requirements:
- Java 7+
- Ruby 2+
- wget or curl
- MRI or JRuby

Glossary:
- Context: entry point for using Spark functionality
- RDD: Resilient Distributed Dataset
- Driver: the driver Spark instance (exists only once)
- Executo