Columnar storage is a popular technique to optimize analytical workloads in parallel RDBMs. The performance and compression benefits for storing and processing large amounts of data are well documented in academic literature as well as several commercial analytical databases. The goal is to keep I/O to a minimum by reading from a disk only the data required for the query. Using Parquet at Twitter,
Disclaimer: The opinions expressed here are my own and do not necessarily represent those of current or past employers.Twitter / Photos Disclaimer: The opinions expressed here are my own and do not necessarily represent those of current or past employers. Twitter / Photos Henry Robinsonによる、カラムナストレージの解説記事を翻訳しました。カラムナストレージは、Googleで開発されたデータ処理ツールであるDremelに使用されているファイルフォーマットであり、Clouderaが開発を進めるImpalaでも採用
The document describes Dremel, an interactive analysis system for web-scale datasets. Dremel uses a columnar data storage model and tree-based query serving architecture to enable interactive analysis of trillion record datasets distributed across thousands of nodes. It provides an SQL-like interface and can process queries orders of magnitude faster than traditional MapReduce systems by avoiding
リリース、障害情報などのサービスのお知らせ
最新の人気エントリーの配信
処理を実行中です
j次のブックマーク
k前のブックマーク
lあとで読む
eコメント一覧を開く
oページを開く