Question (spark): How to reduce the shuffle size of a JavaPairRDD?

I have a JavaPairRDD<Integer, Integer[]> on which I want to perform a groupByKey action. The groupByKey action gives me:

org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle

which is practically an OutOfMemory error, if I am not mistaken. This occurs only on large datasets (in my case when
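
For reference, here is a minimal sketch of the setup described in the question. The data source, key distribution, and sizes are assumptions for illustration, not the asker's actual code; the point is only that groupByKey on a JavaPairRDD<Integer, Integer[]> shuffles every value across the network, which is the step that fails on large inputs.

```java
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.util.ArrayList;
import java.util.List;

public class GroupByKeyShuffleExample {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local[*]", "groupByKey-shuffle");

        // Hypothetical data: many (key, Integer[]) pairs with repeated keys,
        // so groupByKey must pull all values for a key onto one executor.
        List<Tuple2<Integer, Integer[]>> data = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            data.add(new Tuple2<>(i % 1000, new Integer[]{i, i + 1, i + 2}));
        }

        JavaPairRDD<Integer, Integer[]> pairs = sc.parallelizePairs(data);

        // groupByKey shuffles every value; on large datasets this shuffle is
        // where MetadataFetchFailedException shows up.
        JavaPairRDD<Integer, Iterable<Integer[]>> grouped = pairs.groupByKey();

        System.out.println("number of groups: " + grouped.count());
        sc.stop();
    }
}
```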