[B! hadoop] imai-factory's bookmarks

A system for storing logs in S3 and Redshift

Ken Morishita: Data Scientist, Machine Learning Engineer, Software Developer

imai-factory 2013/05/23

Hive - External Table With Partitions

By default, when a data file is loaded, /user/${USER}/warehouse/user is created automatically. For me, it's /user/chris/warehouse/user. Here "user" is the table name, and all of the user table's data files are located in this folder. Now we can freely use SQL to analyze the data. But what if we want to process the data with some ETL programs and load the result data into Hive, but we don't want to load them ma

imai-factory 2013/01/17

hadoop
hive
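
The excerpt is cut off, but the title points to the answer: an external table. Hive only records where the data lives, and each directory an ETL job writes is registered as a partition. The sketch below only generates the HiveQL. It assumes a hypothetical table `user_log` with a single string partition key `dt`, stored under a hypothetical `/data/user_log`. The `key=value` subdirectory naming follows Hive's partition layout convention.

```python
# Minimal sketch: generate HiveQL for an external, partitioned table whose
# files are written by an ETL job instead of being loaded by Hive.
# The table name, columns, and paths used below are hypothetical.

def create_external_table(table, columns, partition_key, location):
    """Return a CREATE EXTERNAL TABLE statement with one string partition key."""
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols}) "
        f"PARTITIONED BY ({partition_key} STRING) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' "
        f"LOCATION '{location}';"
    )


def partition_dir(location, partition_key, value):
    """One subdirectory per partition, named key=value, under the table location."""
    return f"{location.rstrip('/')}/{partition_key}={value}"


def add_partition(table, location, partition_key, value):
    """Register a directory the ETL job has already filled as a new partition."""
    path = partition_dir(location, partition_key, value)
    return (
        f"ALTER TABLE {table} ADD PARTITION ({partition_key}='{value}') "
        f"LOCATION '{path}';"
    )


if __name__ == "__main__":
    print(create_external_table("user_log", [("id", "INT"), ("name", "STRING")],
                                "dt", "/data/user_log"))
    print(add_partition("user_log", "/data/user_log", "dt", "2013-01-17"))
```

Because the table is external, dropping it later removes only Hive's metadata and leaves the ETL output files in place.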

On GedowFather putting Impala into production

外道父 | Noko @GedowFather I threw Impala into our production environment. I measured a 10x difference in processing time on a typical aggregation query, and 30x on a plain count. 2012-11-16 11:25:12
外道父 | Noko @GedowFather Impala test data: 17MB, 45,000 rows. GROUP & ORDER BY: Hive 63s vs. Impala 7s. COUNT: Hive 34s vs. Impala 1s. 2012-11-16 11:28:33

imai-factory 2012/11/20

Already in production!?
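
The timings in the tweets (63s vs. 7s and 34s vs. 1s) can be turned directly into speedup ratios. This small sketch assumes the ratio is simply Hive time divided by Impala time. The tweets don't say how the times were measured or whether they are single runs.

```python
# Speedups implied by the tweeted timings: (Hive seconds, Impala seconds).
timings = {
    "GROUP & ORDER BY": (63, 7),
    "COUNT": (34, 1),
}


def speedup(hive_s, impala_s):
    """Ratio of Hive to Impala wall-clock time; > 1 means Impala was faster."""
    return hive_s / impala_s


for query, (hive_s, impala_s) in timings.items():
    print(f"{query}: {speedup(hive_s, impala_s):.1f}x")
```

This gives 9.0x for GROUP & ORDER BY and 34.0x for COUNT, in line with the rounded "10x" and "30x" in the first tweet. Note that both figures come from a 17MB, 45,000-row dataset.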

GitHub - fluent/fluent-plugin-webhdfs: Hadoop WebHDFS output plugin for Fluentd

imai-factory 2012/11/08

hadoop
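
The plugin writes to HDFS through the WebHDFS REST API. In that API, every operation is an HTTP request of the form http://HOST:PORT/webhdfs/v1/PATH?op=OPERATION. The function below is a small illustration of building such a URL, not the plugin's own code. The host, the port 50070, the path, and the user name are hypothetical values.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS v1 REST URL: http://HOST:PORT/webhdfs/v1/PATH?op=OP&..."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # op comes first, followed by any extra query parameters (e.g. user.name).
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


if __name__ == "__main__":
    # Hypothetical NameNode and log path, appending as user "fluentd".
    print(webhdfs_url("namenode.example.com", 50070, "/log/access.log",
                      "APPEND", **{"user.name": "fluentd"}))
```

For CREATE and APPEND, the NameNode responds with a redirect to a DataNode. The data itself is then sent in a second request to that redirected location.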

Cloudera Impala has been released - 科学と非科学の迷宮 (The Labyrinth of Science and Non-Science)

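(Update 2012/10/25 15:48) An introductory article on Impala has been published on the official Cloudera blog. It completely supersedes this post, so please refer to that article instead: "Cloudera Impala: True Real-Time Queries on Apache Hadoop | The leading company in Hadoop and big data solutions | Cloudera Japan". Cloudera has released "Impala", a real-time query engine for data scientists. Its query language is fully compatible with Hive, and it can process queries more than 10 times faster than Hive. Overview and download here! http://www.cloudera.com/content/cloudera/en/products/cloudera-enterprise-core/clouder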

imai-factory 2012/10/27

hadoop

Using splittable LZO compression with Hadoop

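Twitter apparently LZO-compresses basically all of its files, and says the benefits include 3-4x storage savings, splittable files, only light CPU usage, and a 3-4x performance gain for IO-bound jobs. There was no reason not to use it, so I tried it out. I'm following this Cloudera blog post. There is also code.google.com/p/hadoop-gpl-compression, but I'll use the splittable version published by Twitter (http://github.com/kevinweil/hadoop-lzo). The environment this time is based on a Cloudera AMI (cloudera-ec2-hadoop-images/cloudera-hadoop-fedora-20090623-x86_64, ami-2359bf4). It's CDH3, and the Hadoop version is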
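
"Splittable" is the key point of the excerpt. An LZO stream cannot be decompressed starting from an arbitrary byte, so without extra help a whole .lzo file would go to a single mapper. hadoop-lzo addresses this by indexing the byte offset at which each compressed block starts, then snapping input splits to those offsets. Below is a simplified Python sketch of that alignment idea, not hadoop-lzo's actual Java code. The function name `align_splits` and the sample offsets are hypothetical.

```python
from bisect import bisect_left


def align_splits(block_offsets, file_length, split_size):
    """Align naive byte-range splits to compressed-block boundaries.

    block_offsets: sorted byte offsets where each compressed block starts
    (the information an index file records). Returns (start, end) pairs in
    which every start is a block offset, so each split can be decompressed
    on its own. Consecutive splits share boundaries, so no bytes are lost.
    """
    def next_boundary(pos):
        # First block start at or after pos; end of file if none remains.
        i = bisect_left(block_offsets, pos)
        return block_offsets[i] if i < len(block_offsets) else file_length

    splits = []
    for naive_start in range(0, file_length, split_size):
        naive_end = min(naive_start + split_size, file_length)
        start, end = next_boundary(naive_start), next_boundary(naive_end)
        if start < end:  # drop splits that contain no block start
            splits.append((start, end))
    return splits


if __name__ == "__main__":
    print(align_splits([40, 300, 650, 900], 1200, 500))
```

Take blocks starting at offsets 40, 300, 650 and 900 in a 1,200-byte file, with a 500-byte split size. The sketch yields (40, 650) and (650, 1200). The third naive split (1000 to 1200) contains no block start, so it is dropped.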