Hive is a data warehouse system built on top of Hadoop. I’ve been experimenting with it for the past few days, and using its Thrift service I’ve been able to drive it from PHP. Here’s what I’ve done to get it going:

Launching a Cluster

Using the EC2 scripts, I launched a cluster of Hadoop servers on EC2. It’s straightforward to get up and running. It takes me about 5 minutes to get a cluster going, i