[B! apacheArrow][columnarFileFormat] manboubird's bookmarks

manboubird id:manboubird

manboubird's bookmarks tagged apacheArrow and columnarFileFormat (2)

Apache Arrow vs. Parquet and ORC: Do we really need a third Apache project for columnar data representation?
Apache Parquet and Apache ORC have become popular file formats for storing data in the Hadoop ecosystem. Their primary value proposition revolves around their “columnar data representation format”. To quickly explain what this means: many people model their data in a set of two-dimensional tables where each row corresponds to an entity, and each column an attribute about that entity. However, st
manboubird 2018/06/12
Tags: comparison, apacheArrow, parquet, orcFile, columnarFileFormat
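The row-versus-column distinction described in the excerpt above can be illustrated with a minimal Python sketch. It assumes plain in-memory lists stand in for real storage; the sample table, column names, and the helpers `to_columnar` and `column_sum` are hypothetical and are not part of the Parquet, ORC, or Arrow APIs. The point is only that once data is laid out per column, an aggregate over one attribute reads just that attribute's values.

```python
from typing import Any

# Hypothetical sample table (row-oriented): each row is one entity,
# each column one attribute of that entity.
ROWS: list[tuple[Any, ...]] = [
    (1, "alice", 34),
    (2, "bob", 28),
    (3, "carol", 41),
]
COLUMNS = ("id", "name", "age")


def to_columnar(rows, column_names):
    """Pivot a row-oriented table into a dict of per-column lists (hypothetical helper)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(column_names)}


def column_sum(columnar, name):
    """Aggregate one attribute; only that column's list is touched (hypothetical helper)."""
    return sum(columnar[name])


table = to_columnar(ROWS, COLUMNS)
print(table["age"])               # [34, 28, 41]
print(column_sum(table, "age"))   # 103
```

Storing each column contiguously also groups values of the same type together, which is part of why columnar formats tend to compress and scan well; real formats add encodings, metadata, and typed buffers that this sketch omits.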
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Analytics
1) Columnar formats like Parquet, Kudu and Arrow provide more efficient data storage and querying by organizing data by column rather than row. 2) Parquet provides an immutable columnar format well-suited for storage, while Kudu allows for mutable updates but is optimized for scans. Arrow provides an in-memory colu
manboubird 2016/10/29
Tags: slide, columnarFileFormat, kudo, parquet, apacheArrow