[B! full-text search] tomoemon's bookmarks

tomoemon id:tomoemon

tomoemon's bookmarks about full-text search (30)

AI search that understands
Algolia is in the 2024 Gartner® Magic Quadrant™ for Search and Product Discovery positioned furthest for Completeness of Vision. Learn more by downloading a copy of the report.
tomoemon 2018/05/09
full-text search
Building a complete Tweet index
Today, we are pleased to announce that Twitter now indexes every public Tweet since 2006. Since that first simple Tweet over eight years ago, hundreds of billions of Tweets have captured everyday human experiences and major historical events. Our search engine excelled at surfacing breaking news and events in real time, and our search index infrastructure reflected this strong emphasis on recency.
tomoemon 2014/11/19
full-text search

twitter
Elasticsearch, which powers niconico search // Speaker Deck
A talk on running Elasticsearch at niconico, from the 7th elasticsearch study group. #elasticsearch #elasticsearchjp elasticsearch.doorkeeper.jp/events/16837 Event report blog post: http://ch.nic…
tomoemon 2014/11/19
elasticsearch

niconico

full-text search
Download Microsoft Search Server 2010 Express from Official Microsoft Download Center
tomoemon 2013/07/14
full-text search

microsoft

windows
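Part 5: Fast, server-free full-text search with Ruby! - An introduction to rroonga | gihyo.jp
In the previous installment, a case study of Milkode, we introduced an example of a source-code search engine implemented in Ruby with rroonga. By embedding a full-text search engine, Milkode achieves fast search even across a large number of files. It is one of the representative applications built on rroonga, and since it is a very handy tool for programmers, please give it a try. Last time we introduced rroonga from the user's point of view; this time we approach it from a different angle and explain rroonga's history and what it values. When you evaluate a product to use in your application, do you consider the direction in which that product is being developed? If the product values the same things your application values, the two may be a good fit. Now, as for whether rroonga values what you want to value…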
tomoemon 2013/06/05
full-text search

ruby
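Solr vs ElasticSearch - minghai's diary
A translation of the "Solr vs ElasticSearch" series published on the Sematext blog. Six parts exist so far, and I have translated all of them: Part 1 – Overview, Part 2 – Indexing and Language Handling, Part 3 – Searching, Part 4 – Faceting, Part 5 – Management API Capabilities, Part 6 – User and Developer Communities Compared. All of the original articles can be reached starting from Part 1 here: http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/ The series should continue, but I don't know when Part 7 will come out, or whether I will be able to keep translating when it does. Note that the translator does not know much about either Solr or ElasticSearch. If you spot any mistranslations, please co…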
tomoemon 2013/05/11
elasticsearch

full-text search
Exploring AI-Powered Insights | Squirro Blog
tomoemon 2013/05/08
full-text search

elasticsearch
http://sleepyheads.jp/docs/prob_ir.pdf
tomoemon 2013/03/27
full-text search

string processing
Solr vs Elasticsearch: Performance Differences & More - Sematext
“Solr or Elasticsearch?”…well, at least that is the common question we hear from Sematext’s consulting services clients and prospects. Which one is better, Solr or Elasticsearch? Which one is faster? Which one scales better? Which one is easier to manage? Which one should we use? Is there any advantage to migrating from Solr to Elasticsearch? – and the list goes on. These are all great questions,
tomoemon 2013/03/26
solr

lucene

full-text search
GitHub - hatena/solr-tutorial: Introductory material for Solr, focused on LAMP setups.
tomoemon 2012/05/03
full-text search

Hatena
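Adopting a cache for the index-term dictionary
Looking at the experimental results, when the cache is tuned so that its hit rate exceeds 90%, the index terms it holds amount to less than 1% of all index terms. In other words, even if a space-inefficient data structure is used for the cache, it has almost no effect on the size of the index-term dictionary. On the other hand, using a time-efficient data structure can substantially shorten index construction time. For example, with the hit rate tuned to 90%, even if the cache needs five times as much space per index term as the main dictionary does, it still accounts for less than 5% of the total. And assuming a cache lookup takes 1/5 the time of a main-dictionary lookup, then although a cache miss means consulting both the cache and the main dictionary, the average lookup time drops to 1/5 × 90% + 6/5 × 10% = 30%. The structure of the index-term dictionary…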
tomoemon 2011/08/02
full-text search
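The cache arithmetic in the excerpt above can be reproduced with a short sketch. It assumes, as the excerpt does, that a hit costs one cache lookup and a miss costs a cache lookup plus a main-dictionary lookup, with times expressed relative to the main dictionary; the helper names `average_lookup_cost` and `cache_size_fraction` are hypothetical.

```python
def average_lookup_cost(hit_rate: float, cache_cost: float, main_cost: float = 1.0) -> float:
    """Expected lookup time when the cache is checked before the main dictionary.

    A hit pays only cache_cost; a miss pays cache_cost + main_cost,
    because both the cache and the main dictionary are consulted.
    """
    hit_part = hit_rate * cache_cost
    miss_part = (1.0 - hit_rate) * (cache_cost + main_cost)
    return hit_part + miss_part


def cache_size_fraction(term_fraction: float, size_ratio: float) -> float:
    """Cache size relative to the main dictionary.

    term_fraction: share of index terms held in the cache.
    size_ratio: cache space per term divided by main-dictionary space per term.
    """
    return term_fraction * size_ratio


# Figures from the excerpt: 90% hit rate, cache lookups at 1/5 the cost of the
# main dictionary, and a cache holding under 1% of terms at 5x the space per term
# (so the 5% result is an upper bound).
print(f"{average_lookup_cost(0.9, 1 / 5):.0%}")  # 30%
print(f"{cache_size_fraction(0.01, 5):.0%}")     # 5%
```

Raising the hit rate or making cache lookups cheaper both pull the average down, which is why the excerpt favors a time-efficient cache structure even at some cost in space.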
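Attended the 5th Solr study group #SolrJp - nokuno's diary
So I attended the 5th Solr study group, though I arrived partway through. The venue was ECnavi, the same as for #TokyoNLP. Solr is an open-source full-text search engine that has been gaining momentum lately, and it apparently supports Japanese search as well, for example via MeCab. 5th Solr study group : ATND / Welcome to Solr / Comparing various tokenizers, by @haruyama (joined partway through): there is a morphological analyzer called Igo; N-gram vs. morphological analysis: N-gram is faster; version 3.1.0 vs. 1.4.1: almost no difference; popular Solr-girl power (rest omitted); leaving ECnavi as of today, looking for a new job! / How libraries use Solr, by @nabeta (Kosuke Tanabe): about Project Next-L / Project Next-L Official Page / Next-L Enju: open source…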
tomoemon 2011/05/19
lucene

solr

full-text search
5th Solr study group
HARUYAMA Seigo @haruyama 春山征吾のくけー: 4 tips for polishing your popular-Solr-girl power #SolrJP - livedoor Blog - http://icio.us/PuJUAa (cobbled this together)
tomoemon 2011/05/19
togetter

solr

lucene

full-text search
Apache Nutch™
Nutch is a highly extensible, highly scalable, mature, production-ready Web crawler which enables fine-grained configuration and accommodates a wide variety of data acquisition tasks. Scalable: Relying on Apache Hadoop™ data structures, Nutch is great for batch processing large data volumes but can also be tailored to smaller jobs. Pluggable: Out of the box, Nutch offers powerful plugins, e.g., parsing
tomoemon 2011/04/20
lucene

full-text search
Transactions on InnoDB » Blog Archive » Only God can make random selections
tomoemon 2011/04/11
My impression: what HandlerSocket was trying to do has now been built into MySQL.

mysql
NGramTokenizer and EdgeNGramTokenFilter | Koji Sekiguchi's Lucene blog
tomoemon 2011/04/05
lucene

full-text search
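Trying N-gram full-text search with Lucene on Google App Engine - そこはかとなく書くよ。
I tried running the full-text search engine Lucene on top of slim3 on Google App Engine/Java. The index is built with N-grams. Preparation: first, get the latest version of Lucene (I used 3.0.2 this time). Copy the two jar files lib/lucene-core and contrib/contrib/analyzers/common/lucene-analyzers-3.0.2 into the project's war/WEB-INF/lib and add them to the build path. Dealing with GAE-specific issues: simply using Lucene only requires adding the jars, but GAE raises several problems of its own. Handling the index: Lucene keeps an index and searches documents against it, so the question is where and how to store that index…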
tomoemon 2011/03/28
full-text search

google
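The post above builds its Lucene index from N-grams. The underlying idea can be sketched without Lucene: a minimal in-memory inverted index over character bigrams (n = 2), assuming plain Python dictionaries in place of Lucene's index files. `ngrams`, `build_index`, and `search` are hypothetical helpers for illustration, not Lucene APIs.

```python
from collections import defaultdict


def ngrams(text: str, n: int = 2) -> list[str]:
    """Split text into overlapping character n-grams; empty if text is shorter than n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_index(docs: dict[int, str], n: int = 2) -> dict[str, set[int]]:
    """Inverted index: map each n-gram to the ids of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in ngrams(text, n):
            index[gram].add(doc_id)
    return index


def search(index: dict[str, set[int]], docs: dict[int, str], query: str, n: int = 2) -> list[int]:
    """Return sorted ids of documents that contain query as a substring."""
    grams = ngrams(query, n)
    if not grams:
        # Query shorter than n: no n-gram to look up, so fall back to a linear scan.
        return sorted(d for d, text in docs.items() if query in text)
    # Candidates must contain every n-gram of the query.
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # The n-grams may occur without being adjacent, so verify the exact substring.
    return sorted(d for d in candidates if query in docs[d])


# Japanese sample text: no spaces between words, so no word splitter is needed.
docs = {
    1: "全文検索エンジン",
    2: "検索システムの評価",
    3: "全文を検索する",
}
index = build_index(docs)
print(search(index, docs, "全文検索"))  # [1]
print(search(index, docs, "検索"))      # [1, 2, 3]
```

Intersecting posting lists only narrows the candidates; the final substring check is what guarantees the query appears contiguously. Bigrams are a common choice for Japanese text because words are not separated by spaces, so indexing works without a morphological analyzer.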
Charming Python: Functional programming in Python, Part 3
IBM Developer is your one-stop location for getting hands-on training and learning in-demand skills on relevant technologies such as generative AI, data science, AI, and open source.
tomoemon 2010/11/22
full-text search
IBM Developer
tomoemon 2010/11/22
full-text search
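Choosing a search engine and its evaluation criteria
Evaluation criteria for full-text search systems: accuracy. There are many criteria for comparing full-text search systems; here we first explain what they are. Search accuracy is the most important criterion for a search system: a system with low accuracy cannot reliably find the documents you are looking for. Two figures are commonly used to express a search system's accuracy: precision and recall. Precision is the proportion of retrieved documents that actually match the search condition. The closer this value is to 1 (100%), the less search noise the system produces. Search noise means documents that appear in the results even though they do not match the search condition. Recall is the proportion of all documents matching the search condition that were retrieved. The closer this value is to 1 (100%), the fewer documents the search system misses…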
tomoemon 2010/11/10
full-text search
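The two definitions in the excerpt above translate directly into set arithmetic. A minimal sketch, assuming the set of truly relevant documents is known for the query (the document ids below are made up for illustration) and returning 0.0 when a denominator is empty, which is a convention chosen here rather than something the article specifies.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Compute precision and recall for a single query.

    precision = |retrieved & relevant| / |retrieved|  (high means little search noise)
    recall    = |retrieved & relevant| / |relevant|   (high means few missed documents)
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


retrieved = {"d1", "d2", "d3", "d4"}  # documents the system returned (hypothetical)
relevant = {"d1", "d2", "d5"}         # documents that truly match the query (hypothetical)
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Here d3 and d4 are search noise (lowering precision), and d5 is a missed document (lowering recall).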