[B! hadoop] kiszk's bookmarks

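Computing a Matrix Product with Hadoop MapReduce (Case 2) (Dense Matrix Multiplication with Hadoop MapReduce: Case2) - tetsuya_odaka's Diary

The previous post presented a matrix multiplication program as Case 1. However, multiplying two 5000-by-5000 matrices took more than six hours, which makes it unusable for exploratory analysis of "big data". In the analysis from Economist magazine (June 4 issue) that I have cited repeatedly, the number of variables was 300 and the sample size was 51 (as discussed in an earlier post), so in terms of order of magnitude the Case 1 program may still be fast enough. However, the "Yahoo! JAPAN Economic Index" published in the same magazine examines correlations between 600,000 words (600,000 variables) and the CI. In Case 2, we aim to improve execution speed by improving the Case 1 program. As noted in the scalability evaluation of Case 1, the execution time of Case 1 cannot be explained by "the increase in the number of simple arithmetic operations". MapRed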

kiszk 2013/10/08
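
As a rough illustration of the workload the excerpt above describes, the following Python sketch expresses a dense matrix product as a map step that emits one partial product per (i, k, j) triple and a reduce step that sums them per output cell. It runs in a single process and is not the Case 1 or Case 2 program from the post; the function names (`map_partial_products`, `reduce_sums`, `matmul_mapreduce`, `partial_product_count`) are hypothetical.

```python
from collections import defaultdict


def map_partial_products(a, b):
    """Map step (illustrative): emit ((i, j), a[i][k] * b[k][j]) for every triple."""
    for i in range(len(a)):
        for k in range(len(b)):
            for j in range(len(b[0])):
                yield (i, j), a[i][k] * b[k][j]


def reduce_sums(pairs, rows, cols):
    """Reduce step (illustrative): sum all partial products that share an output cell."""
    sums = defaultdict(int)
    for key, value in pairs:
        sums[key] += value
    return [[sums[(i, j)] for j in range(cols)] for i in range(rows)]


def matmul_mapreduce(a, b):
    """Multiply matrices a (n x m) and b (m x p) via the map and reduce steps above."""
    return reduce_sums(map_partial_products(a, b), len(a), len(b[0]))


def partial_product_count(n, m, p):
    """Number of intermediate records the naive map step emits."""
    return n * m * p


if __name__ == "__main__":
    print(matmul_mapreduce([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    print(partial_product_count(5000, 5000, 5000))
```

For two 5000-by-5000 matrices, `partial_product_count(5000, 5000, 5000)` gives 125,000,000,000, which is the number of intermediate records this naive formulation would have to emit and group.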

https://docs.cloudera.com/documentation/enterprise/latest.html

kiszk 2013/09/21

hadoop

Install and Run Hadoop YARN in 10 Easy Steps - Practical Cloud Computing

Preamble If you’re interested in playing with Apache Hadoop’s MRv2 (a.k.a. YARN), you’ve probably looked for ways to set it up on a single-node. On the Apache Hadoop Yarn Home Page, you will find instructions for setting up a Single Node cluster. Unfortunately, there are some pre-setup assumptions (e.g. that you have installed hadoop-common/hadoop-hdfs and exported various environment variables) t

kiszk 2013/09/21

hadoop

Apache Hadoop 3.4.0 – Hadoop Cluster Setup

General Overview Single Node Setup Cluster Setup Commands Reference FileSystem Shell Compatibility Specification Downstream Developer's Guide Admin Compatibility Guide Interface Classification FileSystem Specification Common CLI Mini Cluster Fair Call Queue Native Libraries Proxy User Rack Awareness Secure Mode Service Level Authorization HTTP Authentication Credential Provider API Hadoop KMS Trac

kiszk 2013/09/21

hadoop

Runtime error - Meta Search

kiszk 2013/09/21

hadoop

Open Source & Open Standards | Cloudera

Cloudera Data Platform The only hybrid data platform for modern data architectures with data anywhere.

kiszk 2011/08/01

hadoop

The Next Generation of Apache Hadoop MapReduce · Yahoo! Hadoop Blog

Overview In the Big Data business running fewer larger clusters is cheaper than running more small clusters. Larger clusters also process larger data sets and support more jobs and users. The Apache Hadoop MapReduce framework has hit a scalability limit around 4,000 machines. We are developing the next generation of Apache Hadoop MapReduce that factors the framework into a generic resource schedu

kiszk 2011/02/20

hadoop

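Dissecting a Hadoop MapReduce Program

MapReduce jobs for the open-source software "Hadoop" are written in Java by default (other options include Pig, Hive, and JAQL). However, the author feels they are surprisingly hard for beginners to understand. This article uses sample MapReduce job code to explain the meaning of the code as accurately as possible and to provide an entry point to MapReduce. The API used to write MapReduce in Hadoop was replaced with a new one in the move from 0.19 to 0.20. In fact, at present even the Hadoop project itself does not provide samples that use the new API. This article explains the topic using samples the author rewrote with the new API, so note that these samples do not run on Hadoop 0.19 or earlier. This article was verified against, and describes, 0.20.2.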

kiszk 2010/12/02

hadoop

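NTT Data's Hadoop Report Was Impressive - The Labyrinth of Science and Non-Science

I have joined Cloudera, the industry-leading enterprise Hadoop company http://www.cloudera.co.jp/ This June, the Ministry of Economy, Trade and Industry published a set of documents titled "FY2009 Industry-Academia Collaborative Software Engineering Practice Project Reports". One of them was a report on a Hadoop proof-of-concept experiment commissioned to NTT Data, so I decided to read it, belatedly. People in the Hadoop community may all have read it long ago. http://www.meti.go.jp/policy/mono_info_service/joho/downloadfiles/2010software_research/clou_dist_software.pdf "Software Development for Realizing Highly Reliable Clouds (Demonstration Project for Making Data Centers Highly Reliable, Related to Distributed Control Processing Technology and the Like)"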

kiszk 2010/10/30

hadoop

Data Applications and Infrastructure at LinkedIn__HadoopSummit2010

[This is work presented at SIGMOD'13.] The use of large-scale data mining and machine learning has proliferated through the adoption of technologies such as Hadoop, with its simple programming semantics and rich and active ecosystem. This paper presents LinkedIn's Hadoop-based analytics stack, which allows data scientists and machine learning researchers to extract insights and build product featu

kiszk 2010/08/24

Hadoop Summit 2010 - Agenda

Big Data and the Power of Hadoop [ video ] Blake Irving, Executive Vice President and Chief Products Officer, Yahoo!

kiszk 2010/08/24

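LinkedIn's Data Infrastructure

Practical Guide to Building an API Back End with Spring Boot, Second Edition: Thousands of developers have learned the basics of building REST APIs with Spring Boot from InfoQ's mini-book "Practical Guide to Building an API Back End with Spring Boot". The book uses Spring Boot 2, the version that was newly released at the time of publication. However, Spring Boot 3 was recently released, and important cha...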

kiszk 2010/08/24

Cloud9: A Library for Hadoop

Cloud9 was designed to serve as both a teaching tool and to support research in text processing. It was used in "cloud computing" courses at the University of Maryland in Spring 2008 and Fall 2008. The library itself is available via anonymous Subversion checkout. Like Hadoop itself, Cloud9 is distributed under the Apache License. Starting Points Subversion access: https://subversion.u

kiszk 2010/01/23

hadoop

http://www.docstoc.com/docs/2996433/Hadoop-and-HBase-vs-RDBMS

kiszk 2010/01/15

Practical Problem Solving with Apache Hadoop & Pig

Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...

kiszk 2009/12/30

hadoop

Cloudera Blog

It’s hard to believe it’s been 15 years since the global financial crisis of 2007/2008. While this might be a blast from the past we’d rather leave in the proverbial rear-view mirror, in March of 2023 we were back to the future with the collapse of Silicon Valley Bank (SVB), the largest US bank to […]

kiszk 2009/12/29

hadoop

MapReduce and Parallel DBMSs: Friends or Foes? – Communications of the ACM

MapReduce complements DBMSs since databases are not designed for extract-transform-load tasks, a MapReduce specialty. The MapReduce [7] (MR) paradigm has been hailed as a revolutionary new platform for large-scale, massively parallel data access [16]. Some proponents claim the extreme scalability of MR will relegate relational database management systems (DBMS) to the status of legacy technology. At lea

kiszk 2009/12/22

Place Soccer Bets Online | Soccer Betting Without Blocking

Sbobet is certainly one of the online gambling servers most popular among people who enjoy online gambling. Sbobet online via sbobet mobile. Play Online Games Sbobet Mobile.

kiszk 2009/11/13

hadoop

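Latest Hadoop Trends from the "Hadoop World:NY 2009" Materials (Part 2)

Hadoop is Java-based software developed to provide an open-source implementation of MapReduce, the technology Google uses for its large-scale distributed systems. It is a cloud-ready application that is attracting attention as a way to analyze large volumes of data, reaching several terabytes, quickly and at low cost. In Part 2, we look through the materials from the afternoon sessions of "Hadoop World: NY 2009", held in New York on October 2, and introduce the interesting points. The afternoon was split into three tracks, with as many as 30 sessions. This article is a continuation of "Latest Hadoop Trends from the "Hadoop World:NY 2009" Materials (Part 1)". Highlights from the afternoon session materials: Azza Abouzeid and Kamil Bajda-Pawlikowski of Yale University, Hadoop and parallel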

kiszk 2009/11/04

hadoop

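Latest Hadoop Trends from the "Hadoop World:NY 2009" Materials (Part 1)

Hadoop is Java-based software developed to provide an open-source implementation of MapReduce, the technology Google uses for its large-scale distributed systems. Development began around 2005 and has been led by Doug Cutting, who was then at Yahoo! and is now at Cloudera. The MapReduce processing that Hadoop implements is, put simply, a distributed processing method in which a large amount of data is split into small pieces and assigned to many nodes (the Map step), and once each node has done its processing, the results are aggregated to produce the output (the Reduce step). It is attracting attention as a way to perform distributed processing of large volumes of data, reaching several terabytes, quickly and at low cost. Hadoop World held in New York: the Hadoop conference "Hadoop World: NY 2009", on October 2 in New York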

kiszk 2009/11/04

hadoop
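
The Map and Reduce steps described in the excerpt above can be sketched in a few lines of Python. This is a single-process illustration of the idea using word counting as the example task, not Hadoop's API; the function names (`split_input`, `map_chunk`, `shuffle`, `reduce_groups`, `word_count`) and the round-robin splitting are assumptions made for this sketch.

```python
from collections import defaultdict


def split_input(lines, num_chunks):
    """Split the input into small pieces, one per (simulated) node."""
    return [lines[i::num_chunks] for i in range(num_chunks)]


def map_chunk(chunk):
    """Map step: each node turns its piece into (word, 1) pairs."""
    return [(word, 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Group the values emitted by all nodes under their key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_groups(groups):
    """Reduce step: aggregate the grouped values into one result per key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines, num_chunks=2):
    """Run split, map, shuffle, and reduce in sequence."""
    mapped = [map_chunk(chunk) for chunk in split_input(lines, num_chunks)]
    return reduce_groups(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["hadoop map reduce", "map reduce", "reduce"]))
```

In a real cluster the pieces would be processed on separate machines in parallel; here the list comprehension in `word_count` simply visits them one after another.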

kiszk's bookmarks about hadoop (67)