[B! algorithm][google] j0hn's Bookmarks

GitHub - google/diff-match-patch: Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.

j0hn 2007/06/11

Technoblog: MapReduce for Ruby: Ridiculously Easy Distributed Programming

Ruby on Rails, Io, Lisp, JavaScript, Dynamic Languages, Prototype-based programming and more... Wednesday, August 16, 2006. I am very happy to announce that Google's MapReduce is now available for Ruby (via gem install starfish). MapReduce is the technique used by Google to do monstrous distributed programming over 30 terabyte files. I have

j0hn 2006/08/22

An Explanation of the Original Google Algorithm - GIGAZINE

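Google has become a powerful search engine that now shapes the online world. In Japan its influence is smaller, since Yahoo! still has far more users, but the basic ideas behind the algorithms are similar, so the two produce similar results. In other words, understanding the very first Google search algorithm, which became the foundation of existing search engines, should also help with search engine optimization. This article therefore tries to explain the original Google algorithm as clearly as possible. Unlike existing explanations on other sites, it is based on the actual formulas from the original Google. Details follow. The Anatomy of a Search Engine http://www-db.stanford.edu/~backrub/google.html You have probably heard that Google's groundbreaking ranking method is PageRank, computed fully automatically from a formula, but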
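For reference, the ranking formula as stated in the linked paper (The Anatomy of a Search Engine) can be written as below. The notation follows the paper: T_1 ... T_n are the pages that link to page A, C(T) is the number of links going out of page T, and d is a damping factor between 0 and 1. This is only the formula itself, not a reconstruction of how the bookmarked article goes on to explain it.

```latex
PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)
```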

j0hn 2006/04/12

Google's MapReduce Is a Very Useful Technology - llamerada's Diary

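Google's MapReduce is a very useful technology (not that I can use it). It is especially handy for extracting the names of all text files in which a given word (for example, Google) appears. With 10,000 files this task is easy to solve; a one-liner is enough. In Ruby, for example, it would look something like this: ruby -rfind -renumerator -e "Find.to_enum(:find, '/tmp/textdir/').each{|fn| \ File.file?(fn) and open(fn).read =~ /google/ and puts fn}" With one billion files, however, the same task suddenly becomes very hard, because it requires parallel processing. Even at 10 KB per file, one billion files add up to 10 TB. Handling data of this size requires parallel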
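The file-search task described in this entry can be split into a map step and a reduce step. The following is a minimal single-machine sketch, not Google's MapReduce or the post author's code: the names `map_shard`, `reduce_results` and `parallel_grep` are hypothetical, and Ruby threads stand in for the separate machines a real cluster would use.

```ruby
require "find"

# Map step: scan one shard of paths and keep the files whose content matches.
# binread avoids encoding errors on files that are not valid UTF-8.
def map_shard(paths, pattern)
  paths.select { |fn| File.file?(fn) && File.binread(fn).match?(pattern) }
end

# Reduce step: merge the per-shard result lists into one sorted list.
def reduce_results(partials)
  partials.flatten.sort
end

# Hypothetical driver: split the file list into shards, run the map step
# on each shard in its own thread, then reduce the partial results.
def parallel_grep(dir, pattern, workers: 4)
  files = Find.find(dir).select { |fn| File.file?(fn) }
  return [] if files.empty?

  shard_size = (files.size / workers.to_f).ceil
  threads = files.each_slice(shard_size).map do |shard|
    Thread.new { map_shard(shard, pattern) }
  end
  reduce_results(threads.map(&:value))
end
```

Calling `parallel_grep('/tmp/textdir/', /google/)` would return the same file names as the one-liner, in sorted order. The point of the split is that each shard is processed independently, so the map step could in principle run on different machines.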