I want to add complete cost-based optimization to Hive, but when I began the work, I found it very difficult to do within the current Hive optimization framework. In the current Hive code, all optimizations are done after the operator DAG has been generated. It is an awful design and makes me mad. For example, the map-side optimization scans the whole operator DAG and tries to find the operators that
The goal is to run all TPC-H (http://www.tpc.org/tpch/) benchmark queries on Hive, for two reasons. First, through those queries, we would like to find the new features that we need to add to Hive so that it supports common SQL queries. Second, we would like to measure the performance of Hive to find out what Hive is not good at. We can then improve Hive based on that information. For queries