[B! hadoop] suu-g's bookmarks

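I held a Hadoop study session

Hadoop is honestly outside my specialty; in fact, I had never even touched it until I took a training course in December. As a result, preparing the materials took about two full days. I don't think I have spent this much time on materials in a long while. There may be mistakes, so please let me know if you find any. Still, it seems IPv6 is the unwanted child even in Hadoop. You would think a large number of servers and an abundant address space would go well together.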

suu-g 2015/01/08

hadoop

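I wrote a minimal sample for Hadoop 0.23 YARN - ひしだまの変更履歴 (Hishidama's change log)

Update history of the Hishidama home page: mainly source-material collections for TRPG replays, programming notes and homemade software, and favorite games and music. Hadoop 0.23.0 is out, so I tried writing a minimal sample for YARN, the rumored MapReduce 2.0. Well, I had underestimated YARN ^^; YARN no longer has anything (directly) to do with MapReduce, so it is better not to call it MapReduce 2.0. With conventional Hadoop, saying "I'm going to write a MapReduce program" brings to mind implementing a Mapper and a Reducer. The JobTracker and TaskTrackers then split the work into tasks and run them automatically, and if a failure occurs they rerun the task on another node. Saying "I'm going to write a YARN program", however, amounts to coding the JobTracker, the TaskTrackers, and the failure handling yourself!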
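
To make the contrast concrete, here is a minimal sketch of what "implementing a Mapper and a Reducer" means in the classic model. It is plain Python rather than the Hadoop Java API, and the names `mapper`, `reducer`, and `run_job` are hypothetical; `run_job` is only a single-process stand-in for the splitting, shuffling, and scheduling that the JobTracker and TaskTrackers would do on a cluster.

```python
from collections import defaultdict


def mapper(line):
    """Emit (word, 1) for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def reducer(word, counts):
    """Sum all counts emitted for one word."""
    return word, sum(counts)


def run_job(lines):
    """Hypothetical stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


result = run_job(["Hadoop YARN", "hadoop MapReduce"])
print(result)
```

In this picture the application author writes only `mapper` and `reducer`; per the post above, a YARN application would also have to supply the equivalent of `run_job`, including retries on failure.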

suu-g 2011/11/23

hadoop
yarn

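Hadoop 0.23 YARN memo (Hishidama's Hadoop0.23 YARN Memo)

Overview: YARN is the name of the job execution framework in Hadoop 0.23. Hadoop before 0.23 was (a framework based on) the MapReduce algorithm, so YARN is also called MapReduce 2.0 (MRv2) in the sense of next-generation MapReduce; since it is actually no longer MapReduce, it was presumably given a different name. In YARN, an application is executed through the following steps. (For the relationships among the ResourceManager (RM), the ApplicationMaster (App Mstr, AM), and so on, see YARN Architecture.) The Client asks the ResourceManager to run (submit) a program (the ApplicationMaster). The ResourceManager, on some node, ApplicationMaste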

suu-g 2011/11/23

hadoop
yarn

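An introduction to BSP, Piccolo, and Spark, distributed processing platforms other than MapReduce - Preferred Networks Research & Development

Hello, this is Nakagawa; I actually joined the development team this year. I did not have a cute dog photo, so here is a picture of a cute mascot instead. Lately we hear a lot about MapReduce and its implementation, Hadoop. I think that reflects how much demand there is to somehow process large amounts of data. Obviously, though, MapReduce is not a silver bullet. So I would like to briefly summarize what I presented at an internal TechTalk about distributed processing platforms that have caught my attention recently and take a different approach from MapReduce. Bulk Synchronous Parallel: this algorithm itself dates from 1990. The name is long, so I will write BSP. Now, can MapReduce be used to find shortest paths in a graph? Given that a paper like this has been published, it is not that it cannot be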
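
As an illustration of the BSP idea mentioned in this snippet, the following is a minimal single-process sketch of single-source shortest paths organized as supersteps. It is not taken from the linked article; the function name `bsp_shortest_paths` and the adjacency-list format are assumptions, and edge weights are assumed non-negative. Each loop iteration plays the role of one superstep: vertices process the messages they received, send new ones, and the messages become visible only after the barrier.

```python
def bsp_shortest_paths(graph, source):
    """graph: {vertex: [(neighbor, weight), ...]}; returns {vertex: distance}."""
    dist = {v: float("inf") for v in graph}
    inbox = {source: [0]}  # messages waiting to be processed in this superstep
    while inbox:  # one iteration corresponds to one superstep
        outbox = {}
        for v, messages in inbox.items():
            best = min(messages)
            if best < dist[v]:
                # The vertex improved its distance, so it notifies its neighbors.
                dist[v] = best
                for neighbor, weight in graph[v]:
                    outbox.setdefault(neighbor, []).append(best + weight)
        inbox = outbox  # barrier: messages are delivered in the next superstep
    return dist


graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
distances = bsp_shortest_paths(graph, "a")
print(distances)
```

The computation stops when a superstep produces no messages. An iterative algorithm of this shape needs one full job per round when expressed in MapReduce, which is the kind of mismatch that motivates looking at other platforms.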

suu-g 2011/11/23

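I read NTT Data's Hadoop report - wyukawa's diary

"NTT Data's Hadoop report was amazing - 科学と非科学の迷宮": I knew this post had been getting attention, but since it was unrelated to my work I had not read the report until now. I started doing Hadoop work about a month ago, so I gave it a read. Well, I currently get work from NTT Data, so maybe I should write a puff piece, lol. The table of contents looks like this. The report runs to 375 pages in total, but for an application developer the chapter to read first is Chapter 2. To dig a little deeper, add the related Chapter 8. Incidentally, it may be better to print it out; I read it on an iPad, and I flipped back and forth quite a bit in Chapter 2. Chapter 2 uses a traffic-congestion analysis application as a case study to describe how to design and implement a MapReduce application, which is very helpful. In fact, information this well organized, even in the elephant book or Hadoo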

suu-g 2011/06/12

hadoop

IBM's Hadoop distribution and introductory Hadoop documents | Unofficial DB2 BLOG

IBM is putting considerable effort into Hadoop. For example, it distributes its own Hadoop distribution at the link below. ("Distributing a distribution" is a slightly odd turn of phrase.) - alphaWorks : The IBM Distribution of Apache Hadoop : Overview The IBM Distribution of Apache Hadoop is a joint project between the IBM Software Group Emerging Technology team and the Information Management analytics development team. It appears to be Apache Hadoop combined with a custom installer and the IBM JDK for Linux. The FAQ contains the following statement

suu-g 2010/08/18

hadoop

Design patterns for efficient graph algorithms in MapReduce

Design Patterns for Efficient Graph Algorithms in MapReduce Jimmy Lin and Michael Schatz University of Maryland, College Park {jimmylin,mschatz}@umd.edu ABSTRACT Graphs are analyzed in many important contexts, including ranking search results based on the hyperlink structure of the world wide web, module detection of protein-protein interaction networks, and privacy analysis of social network

suu-g 2010/07/18

Design Patterns for Efficient Graph Algorithms in MapReduce__HadoopSummit2010

GraphChi (Michael Leznik, Head of BI - London, King) GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel parallel sliding windows method, GraphChi is able to execute several advanced data mining, graph mining, and machine learning algorithms on very large graphs, using just a singl

suu-g 2010/07/18

A roundup of everyone's reviews of HadoopSummit2010, I suppose

Agile Cat @Agile_Cat Hadoop is so popular, lol RT @xxkickerxx: I'll watch it too. RT @Agile_Cat: I'll watch it later ♪ RT @ryu_kobayashi: Hadoop2010: Hadoop Security in Detail. http://goo.gl/le4w 2010-07-17 16:12:55 御徒町@Serializable @okachimachiorz HadoopSecurity, 0.20 line: (1) introducing Kerberos, (2) access to HDFS (ticket), (3) ACL settings for MR. This lets independent users each use Hadoop securely. http://bit.ly/93T45E @myen 2010-07-17 17:04:15

suu-g 2010/07/18

XXL Graph Algorithms__HadoopSummit2010

This document summarizes two graph algorithms for analyzing large graphs: connected components and clustering coefficient. For connected components, it describes a two step approach: 1) partition the graph and summarize connectivity on each partition, reducing data size, and 2) recombine the summaries to find the overall connected components. This approach works for other problems like finding min
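
The two-step approach for connected components can be sketched as follows. This is an illustrative single-process version, not the slides' actual implementation: the union-find helpers and the names `summarize` and `connected_components` are assumptions, and each inner list of edges stands in for one partition of the graph.

```python
def find(parent, x):
    """Return the representative of x, compressing the path as we go."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    """Merge the sets containing a and b; the smaller root wins."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def summarize(edges):
    """Step 1: reduce one partition's edges to (vertex, representative) pairs."""
    parent = {}
    for a, b in edges:
        union(parent, a, b)
    return [(v, find(parent, v)) for v in list(parent)]


def connected_components(partitions):
    """Step 2: recombine the per-partition summaries into global components."""
    parent = {}
    for edges in partitions:
        for vertex, representative in summarize(edges):
            union(parent, vertex, representative)
    return {v: find(parent, v) for v in list(parent)}


partitions = [[(1, 2), (3, 4)], [(2, 3), (5, 6)]]
components = connected_components(partitions)
print(components)
```

Each summary holds at most one pair per vertex in the partition regardless of how many edges it contains, which is where the data-size reduction in step 1 comes from.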

suu-g 2010/07/18

Hypertable vs. HBase Performance Evaluation Test Setup

suu-g 2010/06/25

hadoop

HDFS scalability

Everyone, today Yahoo! posted an article/blog entry containing a variety of information on what problems arise when HDFS is deployed in a large-scale environment (thousands to tens of thousands of nodes). - HDFS Scalability (note: PDF) -- http://www.usenix.org/publications/login/2010-04/openpdfs/shvachko.pdf - Scalability of the Hadoop Distributed File System -- http://devel

suu-g 2010/06/17

hadoop

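银娱优越会·(中国) site login

银娱优越会·(中国) site login 404 Not Found. Site: 银娱优越会 - Company profile - Products - News - 银娱geg优越会7171156 - Message board. Notice: you may have entered an incorrect URL, or the page has been deleted or moved! XML map | Sitemap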

suu-g 2010/05/25

hadoop

Data-Intensive Information Processing Applications (Spring 2010) | Home

Data-Intensive Information Processing Applications (Spring 2010) Course: INFM718G/CMSC838G Time: Tuesday, 2:00-4:45pm Location: HBK 2119 Instructors: Jimmy Lin and Nitin Madnani This course is about scalable approaches to processing large amounts of information (terabytes and even petabytes). We focus mostly on MapReduce, which is presently the most accessible and practical means of co

suu-g 2010/05/16

hadoop

Introduction to Hadoop and using the cloud

EDF2012 Kostas Tzouma - Linking and analyzing bigdata - Stratosphere

suu-g 2010/05/16

Use for data mining

hadoop

Database dump progress

If you are reading this on Wikimedia servers, please note that we have rate limited downloaders and we are capping the number of per-ip connections to 2. This will help to ensure that everyone can access the files with reasonable download times. Clients that try to evade these limits may be blocked. Our mirror sites do not have this cap. Data downloads The Wikimedia Foundation is requesting help t

suu-g 2010/05/11

mediawiki data

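HBase: the fastest way to master the basic syntax - Stay Hungry. Stay Foolish.

"Fastest way to master the basic syntax" posts seem to be in vogue, so I am jumping on the bandwagon and, as a way of studying, summarizing the basic operations of HBase. Perl基礎文法最速マスター - Perl入門ゼミ; はてな的プログラミング言語人気ランキング - Life like a clown. Reading this may give you a rough understanding of the basic operations of HBase, a clone of Google's BigTable. Like the other posts in the series, it doubles as a quick reference, so please point out anything that is missing. It targets HBase 0.20.3, the latest version as of 2010-02-01. For installation, see the previous article: Installing HBase on Windows using Cygwin - Stay Hungry. Stay Foolish. Running the interactive shell, basics: HBase has HBase Shell, an interactive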

suu-g 2010/05/11

hadoop
hbase

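Performance comparison of Cassandra, Hadoop HBase, and others

The results of the "Cassandra versus HBase performance study" were interesting, so here are some notes. It seems employees of Yahoo! Inc. benchmarked the following cloud services: Cassandra 0.4 and 0.5, MySQL, HBase, Sherpa. The results are as shown in the PDF, but my personal impressions are as follows. Cassandra looks good for real-time processing, and since performance has improved considerably between versions, it seems to have a promising future. Are MySQL-based systems hard to scale? That is the current state, and it may improve. Was HBase designed with real-time processing in mind? I feel Cassandra and HBase should not be compared on the same footing. By "real-time processing" I mean being accessed by many users simultaneously, and the reque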

suu-g 2010/05/10

hadoop
HBase

cassandra-user@incubator.apache.org - Cassandra versus HBase performance study

Hi folks, We have been conducting a performance study comparing Cassandra and HBase (and Yahoo! PNUTS and MySQL) on identical hardware under identical workloads. Our focus has been on serving workloads (e.g. read and write individual records, rather than scan a whole table for MapReduce.) This is part of a larger effort to develop a benchmark for these kinds of systems (which we are calling YCSB,

suu-g 2010/05/10

Performance measurement

hadoop
HBase
