- Apache Spark is an open-source cluster computing framework for large-scale data processing. It was originally developed at the University of California, Berkeley in 2009 and is used for distributed tasks like data mining, streaming and machine learning. - Spark utilizes in-memory computing to optimize performance. It keeps data in memory across tasks to allow for faster analytics compared to dis