Since the invention of SQL and relational databases, data production has been about specifying how data should be transformed through queries. While Apache Spark can certainly be used as a general distributed SQL-like query engine, the power and granularity of Spark’s APIs allow for a fundamentally different, and far more productive, approach. This session will introduce the principles of goal-ba