[B! Avro] wlbhiro's bookmarks

wlbhiro id:wlbhiro

wlbhiro's bookmarks about Avro (11)

GitHub - confluentinc/confluent-kafka-python: Confluent's Kafka Python Client
wlbhiro 2018/07/20
Confluent

confluent-kafka-python

Python

Kafka

Avro

producer

consumer
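For context on the Avro producer/consumer support in this client: messages serialized through Confluent Schema Registry use a documented wire format. It is a 1-byte magic byte (0), a 4-byte big-endian schema ID, and then the Avro binary payload. The sketch below uses only the standard library to show that framing. `frame` and `unframe` are hypothetical helper names, not part of confluent-kafka-python, and the payload bytes are assumed to be already Avro-encoded.

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0  # Confluent wire-format marker byte
HEADER = struct.Struct(">BI")  # 1 unsigned byte + 4-byte big-endian schema ID


def frame(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix an already-encoded Avro payload with the Confluent header."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + avro_payload


def unframe(message: bytes) -> Tuple[int, bytes]:
    """Split a framed message into (schema_id, avro_payload)."""
    if len(message) < HEADER.size:
        raise ValueError("message too short for Confluent header")
    magic, schema_id = HEADER.unpack(message[:HEADER.size])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unknown magic byte: {magic}")
    return schema_id, message[HEADER.size:]
```

In normal use the client's Avro serializer and deserializer add and strip this header for you. The sketch only illustrates what ends up on the wire, for example when debugging raw messages.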
Why Avro for Kafka Data? | Confluent
wlbhiro 2018/05/08
Reasons to use Avro

Avro

Confluent

Confluent Platform
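Avro's compact binary encoding is one commonly cited reason to use it. Per the Avro specification, `int` and `long` values are written with variable-length zig-zag coding, so small magnitudes take a single byte. Strings are written as a `long` byte length followed by UTF-8 bytes. Field names are not written at all. The stdlib-only sketch below illustrates those two rules. The function names are hypothetical and not part of any Avro library.

```python
from typing import Tuple


def encode_long(n: int) -> bytes:
    """Avro long: zig-zag map to unsigned, then 7-bit groups, low-order first."""
    if not -(1 << 63) <= n < (1 << 63):
        raise ValueError("Avro long must fit in 64 bits")
    # Zig-zag: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4
    z = (n << 1) if n >= 0 else ((-n) << 1) - 1
    out = bytearray()
    while z >= 0x80:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Inverse of encode_long; returns (value, position after the value)."""
    z = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos  # undo zig-zag


def encode_string(s: str) -> bytes:
    """Avro string: byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data
```

These byte sequences match the examples in the specification. For instance, `encode_long(-64)` is `b"\x7f"`, `encode_long(64)` is `b"\x80\x01"`, and `encode_string("foo")` is `b"\x06foo"`.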
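HDFS file formats, picked by my own dogma and bias - サナギわさわさ.json
Choosing a file format for HDFS is, I suspect, something everyone agonizes over about as much as naming the hero of an RPG. I was no exception, so I have written up what I found. To give the conclusion up front: in most cases I think using ORC with Zlib compression is fine. Criticism is welcome. *Added 2017/01/21: AWS Support advised me that on EMR 5.0 and later there are cases where Hive + ORC becomes slower. If you use EMR, it may be worth benchmarking against Parquet. Candidate file formats: the candidates are roughly ORC, Apache Parquet, Apache Avro, SequenceFile, and TextFile. Characteristics of each format: I won't give a detailed explanation of each file format here, and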
wlbhiro 2017/05/12
HDFS

Hive

ORC

Parquet

Avro

SEQUENCEFILE

TEXT

Comparison

Compare
Putting Apache Kafka To Use: A Practical Guide to Building an Event Streaming Platform (Part 2) | Confluent
This is the second part of our guide on streaming data and Apache Kafka. In part one I talked about the uses for real-time data streams and explained the concept of an event streaming platform. The remainder of this guide will contain specific advice on how to go about building an event streaming platform in your organization. This advice is drawn from our experience building and implementing Kafk
wlbhiro 2017/03/10
Kafka

Avro
File Format Benchmark - Avro, JSON, ORC & Parquet
This document summarizes a benchmark study of file formats for Hadoop, including Avro, JSON, ORC, and Parquet. It found that ORC with zlib compression generally performed best for full table scans. However, Avro with Snappy compression worked better for datasets with many shared strings. The document recommends experimenting with the benchmarks, as performance can vary based on data characteristic
wlbhiro 2016/12/30
Parquet

JSON

ORC

Avro

format

Hadoop

Spark

Compression

Hive
org.apache.avro.SchemaBuilder Java Examples
wlbhiro 2016/12/04
Avro

Schema

create
AvroSerDe - Apache Hive - Apache Software Foundation
Overview – Working with Avro from Hive: The AvroSerde allows users to read or write Avro data as Hive tables. The AvroSerde's bullet points: Infers the schema of the Hive table from the Avro schema. Starting in Hive 0.14, the Avro schema can be inferred from the Hive table schema. Reads all Avro files within a table against a specified schema, taking advantage of Avro's backwards compatibility abilit
wlbhiro 2016/11/30
Hive

Avro
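The AvroSerDe excerpt mentions reading every Avro file in a table against one specified schema and relying on Avro's compatibility rules. The Avro specification's record-level resolution rule works like this: fields are matched by name, fields present only in the writer's schema are ignored, and fields present only in the reader's schema take their declared default. The sketch below is a simplified illustration of that rule on already-decoded dicts. `resolve_record` is a hypothetical helper, not how AvroSerde is implemented. It deliberately ignores type promotion, aliases, unions, and nested records.

```python
from typing import Any, Dict


def resolve_record(writer_schema: Dict[str, Any],
                   reader_schema: Dict[str, Any],
                   datum: Dict[str, Any]) -> Dict[str, Any]:
    """Project a record written with writer_schema onto reader_schema."""
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    result: Dict[str, Any] = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            result[name] = datum[name]        # present in both: keep value
        elif "default" in field:
            result[name] = field["default"]   # reader-only: use default
        else:
            raise ValueError(
                f"reader field {name!r} has no default and is not in the writer schema")
    # Writer-only fields are never copied, so they are silently dropped.
    return result
```

This is why adding a field with a default (or removing one) lets old files stay readable under a newer table schema, while adding a field without a default does not.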
Research project report, first half of FY2010: Building a log analysis system with Hive. March 2010, New Development Bureau, System Creative Group, 福田一郎. Overview: Hive is a subproject of the Apache Hadoop project
wlbhiro 2016/08/03
Avro

Hadoop

CyberAgent

MR
File Format Benchmarks - Avro, JSON, ORC, & Parquet
File Format Benchmarks - Avro, JSON, ORC, & Parquet Hadoop Summit June 2016 The landscape for storing your big data is quite complex, with several competing formats and different implementations of each format. Understanding your use of the data is critical for picking the format. Depending on your use case, the different formats perform very differently. Although you can use a hammer to drive a s
wlbhiro 2016/08/03
Hadoop

HDP

HortonWorks

Compare

Avro

JSON

ORCFILE

ORC

Parquet
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon 2015
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon 2015 At the StampedeCon 2015 Big Data Conference: Picking your distribution and platform is just the first decision of many you need to make in order to create a successful data ecosystem. In addition to things like replication factor and node configuration, the choice of file format can have a profound impact on cluster
wlbhiro 2016/08/03
performance / speed comparison / feature comparison

Hive

SEQUENCEFILE

Avro

performance
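Seven things I tried for Hive performance tuning - Qiita
This is the Day 10 article of the Spark, SQL on Hadoop etc. Advent Calendar 2014 - Qiita. It summarizes seven things we did for performance tuning on a certain project. The content is a reworked version of a talk I gave at Cloudera World Tokyo 2014. Slides: Building an ad analytics platform with Hadoop. Talk report: I spoke at one of Japan's largest Hadoop-related conferences! 1. Changing the resources available to YARN: unlike MR1, YARN manages resources as containers rather than slots. I used the following parameters to change the amount of memory and the number of CPUs the NodeManager makes available to containers: yarn.nodemanager.resource.memory-mb yarn.nodemanager.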
wlbhiro 2016/07/19
Hive

Hadoop

type

Avro

SEQUENCEFILE

RCFile

Parquet

YARN