[B! hadoop] yamaz's bookmarks

HDFS.pdf

yamaz 2011/12/01

HDFS explained as comics

hadoop

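InfoQ: MapR Releases a Commercial Distribution Based on Hadoop

Practical Guide to Building an API Back End with Spring Boot, 2nd Edition. Thousands of developers have learned the basics of building REST APIs with Spring Boot from the InfoQ mini-book "Practical Guide to Building an API Back End with Spring Boot". The book uses Spring Boot 2, the version that was newly released at the time of publication. However, Spring Boot 3 was recently released, and important ...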

yamaz 2011/11/29

hadoop

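Noting down some material I found on Hadoop troubleshooting - wyukawa's diary

I found the slides that someone from Cloudera presented at Hadoop World 2011, so I'm posting them here. Hadoop Troubleshooting 101 - Kate Ting - Cloudera. They seem to be packed with the know-how of Cloudera's support team. The content overlaps somewhat with Chapter 10, "Tuning for Better Performance", of Hadoop徹底入門, but it is a useful reference. It covers keeping io.sort.mb < mapred.child.java.opts (although, while you might increase mapred.child.java.opts, I wonder whether io.sort.mb is something you would actually touch), adjusting the number of processes and file descriptors, adjusting the threads for map output, not using Jetty 6.1.26, and ...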
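The io.sort.mb < mapred.child.java.opts rule mentioned in the excerpt above can be sketched as a small check. This is only an illustration, assuming the heap limit is given as an -Xmx flag inside mapred.child.java.opts; the function names and the sample values are hypothetical and do not come from the slides.

```python
import re

# Conversion factors from the -Xmx unit suffix to megabytes.
# A bare number (no suffix) is interpreted by the JVM as bytes.
_TO_MB = {"": 1 / (1024 * 1024), "k": 1 / 1024, "m": 1, "g": 1024}


def max_heap_mb(java_opts):
    """Return the -Xmx value in MB from a JVM options string, or None if absent."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", java_opts)
    if match is None:
        return None
    return int(match.group(1)) * _TO_MB[match.group(2).lower()]


def sort_buffer_fits(io_sort_mb, child_java_opts):
    """True if io.sort.mb is strictly smaller than the child JVM heap limit."""
    heap = max_heap_mb(child_java_opts)
    return heap is not None and io_sort_mb < heap


if __name__ == "__main__":
    # Illustrative values only.
    print(sort_buffer_fits(100, "-Xmx200m"))
    print(sort_buffer_fits(512, "-Xmx200m"))
```

With these sample values, the first call passes the check and the second fails it, because a 512 MB sort buffer cannot fit in a 200 MB heap.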

yamaz 2011/11/12

hadoop

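Hadoop Reading Group, Chapter 6

5. 6.1 Anatomy of a MapReduce Job Run. To run MapReduce, all you have to do is call JobClient.runJob(conf)! Behind the scenes, however, a variety of processes are at work (p. 169).
6. 6.1 Anatomy of a MapReduce Job Run. The players working behind the scenes: JobClient; the jobtracker, which manages job execution and is a Java application whose main class is JobTracker; the tasktracker, which runs the tasks the job is split into and is a Java application whose main class is TaskTracker; and the distributed FS (HDFS, etc.), which is used to share job files among the processes. We will explain step by step how a job is executed.
7. MapReduce job execution flow diagram: MapReduce program JobClient Job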

yamaz 2011/09/14

hadoop

AmazonS3 - HADOOP2 - Apache Software Foundation

S3 Support in Apache Hadoop. Apache Hadoop ships with a connector to S3 called "S3A", with the URL prefix "s3a:"; its previous connectors, "s3" and "s3n", are deprecated and/or deleted from recent Hadoop versions. Consult the latest Hadoop documentation for the specifics on using the S3A connector. For Hadoop 2.x releases, the latest troubleshooting documentation. For Hadoop 3.x releases, the la

yamaz 2011/07/09

riccomini - hadoop pig documentation

It is sometimes difficult for SQL users to learn Pig because their minds are used to working in SQL. In this tutorial, examples of various SQL statements are shown, and then translated into Pig statements. For more detailed documentation, please see the official Pig

yamaz 2011/02/12

A cookbook comparing SQL and Pig Latin

hadoop
pig

HDFS Architecture Guide

yamaz 2010/12/17

hadoop

The Hadoop Distributed File System: Architecture and Design, by Dhruba Borthakur

Table of contents: 1 Introduction; 2 Assumptions and Goals; 2.1 Hardware Failure; ...

yamaz 2010/12/17

hadoop

tomo🐧@learning on Twitter: "Thanks RT @yutuki_r: HDFS basics. The Hadoop Distributed File System:Architecture and Design http://tinyurl.com/3a7y8r3 #hadoopreading"

Thanks RT @yutuki_r: HDFS basics. The Hadoop Distributed File System:Architecture and Design http://tinyurl.com/3a7y8r3 #hadoopreading

yamaz 2010/12/17

HDFS basics

hadoop

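NTT Data's Hadoop report was amazing - 科学と非科学の迷宮

I have joined Cloudera, the industry-leading enterprise Hadoop company http://www.cloudera.co.jp/ This June, the Ministry of Economy, Trade and Industry (METI) published a set of documents titled "FY2009 Industry-Academia Collaborative Software Engineering Practice Project Report". One of them was a report on a Hadoop proof-of-concept experiment commissioned to NTT Data, so I decided to read it, belatedly. People in the Hadoop community may well have read it long ago. http://www.meti.go.jp/policy/mono_info_service/joho/downloadfiles/2010software_research/clou_dist_software.pdf It is titled "Software Development for Realizing a Highly Reliable Cloud (Demonstration Project for Improving Data Center Reliability with Distributed Control Processing Technology, etc.)" ...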

yamaz 2010/09/29

hadoop

Cloudera Blog

In an era where artificial intelligence (AI) is reshaping enterprises across the globe (be it in healthcare, finance, or manufacturing), it’s hard to overstate the transformation that AI has had on businesses, regardless of industry or size. At Cloudera, we recognize the urgent need for bold steps to harness this potential and dramatically accelerate the time to […] Read blog post

yamaz 2009/12/24

hadoop

Tech Reports | EECS at UC Berkeley

Tyson Condie and Neil Conway and Peter Alvaro and Joseph M. Hellerstein and Khaled Elmeleegy and Russell Sears EECS Department, University of California, Berkeley Technical Report No. UCB/EECS-2009-136 October 9, 2009 http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-136.pdf MapReduce is a popular framework for data-intensive distributed computing of batch jobs. To simplify fault toleranc

yamaz 2009/11/07

Cloudera Blog

It’s hard to believe it’s been 15 years since the global financial crisis of 2007/2008. While this might be a blast from the past we’d rather leave in the proverbial rear-view mirror, in March of 2023 we were back to the future with the collapse of Silicon Valley Bank (SVB), the largest US bank to […] Read blog post

yamaz 2009/09/12

Hmm. Having to supply the split information yourself already leaves me unimpressed.

hadoop

Mass-scale computing: Why Hadoop is hot but Java is not

With the massive amount of data proliferating the Web, companies such as Google and many others are building new technologies to sort it all. Core to that movement is something called MapReduce, a software technique that breaks down huge amounts of data into smaller bits. Operating on the smaller bits, and then piecing results together to form the big picture again has proven extremely successful.

yamaz 2009/08/21

hadoop

Welcome to Apache Pig!

Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets. At the present time, Pig's infrastructure l