[B! IR] fcicq's Bookmarks

fcicq id:fcicq

fcicq's bookmarks on IR (11)

Efficient Query Processing Infrastructures
This document provides an overview of efficient query processing infrastructures for web search engines. It discusses how search engines use distributed architectures across many servers to efficiently process queries at large scale. It also describes how search engines employ various techniques like index compression, skipping, dynamic pruning, and learning to rank to efficiently evaluate queries
fcicq 2018/07/11
IR

presentation
Link
Modern Information Retrieval - Home
Information about the Second Edition of the book on Information Retrieval by Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Published by Addison-Wesley-Longman
fcicq 2012/07/05
book

IR
Link
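Large-Scale Image Recognition and Related Topics
2. Contents: examples of what large-scale image data makes possible; an introduction to generic object recognition; the trend toward larger scale and recent methods; large-scale generic object recognition competitions; areas that combine with other fields, etc. 3. The era of large-scale image data: posting images to web services is part of everyday life; Flickr: 6 billion images (2011); Facebook: 3 billion images posted every year; YouTube: roughly 8 years' worth of video uploaded every day; some kind of metadata is often attached (tags, comments, EXIF, location information, ...); using this large volume of data has produced a variety of applications that were previously unimaginable. 4. Image completion: Scene completion using millions of photographs [Hays et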
fcicq 2012/04/16
have read

machinelearning

IR

presentation
Link
[IR] Inverted Indexes and Top-k Queries - tsubosaka's Diary
I have summarized, as far as I know them, the methods for retrieving the top k documents from an inverted index. Slides: Inverted Indexes and Top-k Queries (tsubosaka). For textbook coverage of this topic, Chapter 5 of Information Retrieval: Implementing and Evaluating Search Engines (Stefan Buettcher, Charles L. A. Clarke, Gordon V. Cormack; The MIT Press, 2010/07/23) includes pseudocode as well, so it should be a useful reference.
fcicq 2012/02/13
have read. ranking with inverted index.

search

ir

presentation
Link
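A Comparison of Locality Sensitive Binary Codes for Shift Invariant Kernels and Spectral Hashing - Yasuo Tabei's Diary
Locality Sensitive Hashing (LSH) is a technique that takes a set of data points represented as vectors and projects them onto a set of strings compared by Hamming distance, while preserving the distance between any two points. Many methods have been proposed to date, including ones based on cosine distance [1] and Euclidean distance [2], as well as ones that apply machine learning, such as semantic hashing [3], spectral hashing [4], kernelized LSH [5], and others [6][7][8]. The background to this is that, ever since Google showed that LSH, which had been proposed long before, could be used in a news article recommendation system [9], it has been used in many systems and research settings, such as recommendation systems, image search, and document clustering [10]. In the sense that it has a theoretical guarantee of convergence, the original cosine-distance-based method [1] is good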
fcicq 2011/12/01
algorithms

IR

hash
Link
http://www.cs.unc.edu/~lazebnik/fall09/large_scale_search.pptx
fcicq 2011/12/01
Locality Sensitive Hashing and Large Scale Image Search, Spectral Hashing

algorithms

presentation

IR

hash

search
Link
Ivory: A Hadoop toolkit for web-scale information retrieval research
A Hadoop toolkit for web-scale information retrieval research Ivory is a Hadoop toolkit for web-scale information retrieval research that features a retrieval engine based on Markov Random Fields, appropriately named SMRF (Searching with Markov Random Fields). Ivory takes full advantage of the Hadoop distributed environment (the MapReduce programming model and the underlying distributed file syste
fcicq 2011/03/25
hadoop

ir
Link
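I Will Present on CSA at the "3rd Natural Language Processing Study Group @ Tokyo" - EchizenBlog-Zwei
Thanks to the kindness of @nokuno, I will be presenting on Compressed Suffix Arrays at the "3rd Natural Language Processing Study Group @ Tokyo". For reference, I am posting my presentation materials below. I would be glad if those attending, and of course anyone else who is interested, would take a look. 3rd Natural Language Processing Study Group @ Tokyo : ATND; "We Are Holding the 3rd Natural Language Processing Study Group @ Tokyo" - nokuno's Diary. These materials benefited from the advice of the people listed below. Thank you (in particular, @overlast spent four to five hours with me, and the quality of the slides improved greatly as a result; many thanks). @overlast, @tamago_donburi, @tsubosaka, @machy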
fcicq 2011/03/19
presentation

nlp

algorithms

ir
Link
boilerpipe - Project Hosting on Google Code
fcicq 2011/02/27
in java

library

scraping

ir
Link
Pattern | CLiPS
Pattern is a web mining module for the Python programming language. It has tools for data mining (Google, Twitter and Wikipedia API, a web crawler, a HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, SVM), network analysis and <canvas> visualization. The module is free, well-document
fcicq 2011/02/24
python

ir

nlp

library

dom
Link
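I Read "Large Scale Learning to Rank" - 射撃しつつ前転改
I had actually planned to write at least one proper entry during the first three days of the New Year, but that turned out to be a bit too much. So this is effectively my first entry of the year. I read Large Scale Learning to Rank (D. Sculley, NIPS Workshop on Advances in Ranking, 2009) (pdf), so I would like to introduce this paper as my first entry. Let's get straight to the point. In learning to rank, doing pairwise learning naively incurs a training cost of n^2. That is on the expensive side in terms of computation time. If you are wondering what learning to rank is in the first place, please see the WWW2009 tutorial (pdf) or similar. Bottou et al. showed that the generalization ability of SGD does not depend on the size of the data set and is instead determined by how many stochastic steps were executed. So, Sc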
fcicq 2011/02/14
PFI guys...

machinelearning

algorithms

IR
Link