[B! lucene] dann's bookmarks

Scaling Lucene and Solr

dann 2011/12/07

Lucene and fadvise/madvise

While indexing, Lucene periodically merges multiple segments in the index into a single larger segment. This keeps the number of segments relatively contained (important for search performance) and also reclaims disk space for any deleted docs in those segments. However, it has a well-known problem: the merging process evicts pages from the OS's buffer cache. The eviction is ~2X the size of the m…

dann 2011/06/20

lucene

Home

dann 2011/06/10

lucene

Greplin at Lucene Revolution 2011

Greplin explaining their search architecture in a Lightning Talk at Lucene Revolution 2011.

dann 2011/05/30

lucene
solr

"About the Lucene MergePolicy We Use at Ameba"

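Hello. I'm YASUDA, and I handle search at Ameba. Today I'd like to introduce one of the new features of Lucene/Solr, the open-source search engine we use at Ameba. We benefit from Lucene/Solr all the time, so I'd be glad if this helps spread its use even a little. The feature is TieredMergePolicy, available in Lucene 3.2 and later. Below, I describe how a Lucene index is organized into Segments, give an overview of MergePolicy, and explain the characteristics of TieredMergePolicy and how we handled it at Ameba. ■ Segment structure of Lucene's inverted index: Lucene's inverted index is made up of independent units called Segments. Each time newly added documents are flushed, a new Segment with a new generation number is created (Figure 1).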

dann 2011/05/18

lucene
java
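
The Ameba post describes an index built from independent Segments, with each flush writing a new, numbered Segment. The Java class below is a toy model of that idea, not Lucene's implementation: the whitespace tokenizer, the class and method names, and the in-memory storage are assumptions made for illustration. The segment names imitate Lucene's style of an underscore followed by a base-36 counter (`_0`, `_1`, ..., `_a`, ...).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of a segmented inverted index (NOT Lucene's implementation).
// Added documents are buffered in memory; each flush() freezes the buffer
// into a new segment named from the next generation number.
class ToySegmentedIndex {
    // One flushed segment: its name plus a term -> doc-id postings map.
    record Segment(String name, Map<String, Set<Integer>> postings) {}

    private final List<Segment> segments = new ArrayList<>();
    private Map<String, Set<Integer>> buffer = new TreeMap<>();
    private int nextDocId = 0;
    private int generation = 0;

    // Hypothetical analysis step: lowercase the text and split on whitespace.
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String term : text.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                buffer.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Writes the buffered postings as a new segment. The name imitates
    // Lucene's style: "_" followed by a base-36 counter.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        String name = "_" + Integer.toString(generation++, Character.MAX_RADIX);
        segments.add(new Segment(name, buffer));
        buffer = new TreeMap<>(); // the old buffer now belongs to the segment and is never modified
    }

    // A search must visit every flushed segment; unflushed documents are not visible yet.
    Set<Integer> search(String term) {
        Set<Integer> hits = new TreeSet<>();
        for (Segment segment : segments) {
            hits.addAll(segment.postings().getOrDefault(term.toLowerCase(), Set.of()));
        }
        return hits;
    }

    List<String> segmentNames() {
        List<String> names = new ArrayList<>();
        for (Segment segment : segments) {
            names.add(segment.name());
        }
        return names;
    }
}
```

After two flushes the toy index holds segments `_0` and `_1`, and a document added after the last flush is not yet searchable. Because each search visits every segment, the segment count matters for speed, which is why Lucene relies on a MergePolicy such as TieredMergePolicy to combine segments over time.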

Lucene Utilities and Bloom Filters - Greplin:tech

As you may remember, at Greplin we have built some of our search features on top of the excellent Lucene project. As avid users, we've built a fair number of tools that help us use Lucene to the fullest. Today we're happy to announce that we'll be open-sourcing a few more of them in the greplin-lucene-utils GitHub project. Some noteworthy features include: A class that constructs BooleanQueries in…

dann 2011/04/15
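
A Bloom filter answers "might this key be in the set?" with no false negatives and a tunable false-positive rate, which makes it a cheap pre-check before a more expensive lookup. The Greplin excerpt is cut off before it describes their filter, so the Java class below is a generic sketch, not the greplin-lucene-utils API: the class name, the FNV-1a plus 31-multiplier hash pair, and the double-hashing scheme are assumptions chosen for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Minimal Bloom filter sketch (NOT the greplin-lucene-utils implementation).
// The k bit positions for a key come from double hashing: (h1 + i * h2) mod m.
class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;   // m
    private final int numHashes; // k

    SimpleBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Two base hashes over the UTF-8 bytes: FNV-1a and a 31-multiplier polynomial hash.
    private static int[] baseHashes(String key) {
        int fnv = 0x811C9DC5;
        int poly = 17;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            int unsigned = b & 0xFF;
            fnv = (fnv ^ unsigned) * 0x01000193;
            poly = 31 * poly + unsigned;
        }
        return new int[] {fnv, poly | 1}; // "| 1" ensures h2 is never 0
    }

    private int position(int[] h, int i) {
        return Math.floorMod(h[0] + i * h[1], numBits);
    }

    void add(String key) {
        int[] h = baseHashes(key);
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(h, i));
        }
    }

    // false = definitely never added; true = possibly added (false positives can occur).
    boolean mightContain(String key) {
        int[] h = baseHashes(key);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(h, i))) {
                return false;
            }
        }
        return true;
    }
}
```

A `false` from `mightContain` lets a caller skip the expensive lookup entirely; a `true` still has to be confirmed against the real data.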

Using Finite State Transducers in Lucene

FSTs are finite-state machines that map a term (byte sequence) to an arbitrary output. They also look cool, as the diagram in the post shows. The pictured FST maps the sorted words mop, moth, pop, star, stop and top to their ordinal numbers (0, 1, 2, ...). As you traverse the arcs, you sum up the outputs: stop picks up 3 on the s and 1 on the o, so its output ordinal is 4. The outputs can be arbitrary numbers or byte sequences, or combinati…

dann 2011/03/26

lucene
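
The arc-output arithmetic in the FST excerpt can be reproduced with a plain trie. The Java sketch below is a simplification, not Lucene's FST builder: it shares prefixes but not suffixes, so it is not a minimal automaton, and the class and method names are hypothetical. Each arc's output is the smallest ordinal reachable through it minus the smallest ordinal reachable from its source node, so summing outputs along a word's path yields that word's ordinal (in sorted order, a word is always smaller than its own extensions).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Trie sketch of FST-style arc outputs (NOT Lucene's FST builder, which also
// shares suffixes to build a minimal automaton). Each arc stores an int output;
// summing the outputs along a word's path gives the word's ordinal.
class OrdinalTrie {
    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        final Map<Character, Integer> arcOutputs = new TreeMap<>();
        boolean isFinal;
        int minOrdinal = Integer.MAX_VALUE; // smallest ordinal of any word at or below this node
    }

    private final Node root = new Node();

    // Ordinal = position in the input list, which must be sorted and duplicate-free.
    OrdinalTrie(List<String> sortedWords) {
        for (int ord = 0; ord < sortedWords.size(); ord++) {
            Node node = root;
            node.minOrdinal = Math.min(node.minOrdinal, ord);
            for (char c : sortedWords.get(ord).toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
                node.minOrdinal = Math.min(node.minOrdinal, ord);
            }
            node.isFinal = true;
        }
        assignOutputs(root);
    }

    // Push outputs toward the root: arc output = child's minOrdinal - parent's minOrdinal.
    private static void assignOutputs(Node parent) {
        for (Map.Entry<Character, Node> e : parent.children.entrySet()) {
            Node child = e.getValue();
            parent.arcOutputs.put(e.getKey(), child.minOrdinal - parent.minOrdinal);
            assignOutputs(child);
        }
    }

    // Sums arc outputs along the word's path; returns -1 if the word was not added.
    int lookup(String word) {
        Node node = root;
        int sum = 0;
        for (char c : word.toCharArray()) {
            Node next = node.children.get(c);
            if (next == null) {
                return -1;
            }
            sum += node.arcOutputs.get(c);
            node = next;
        }
        return node.isFinal ? sum : -1;
    }
}
```

Built from the six words in the excerpt, `lookup("stop")` returns 4: the `s` arc carries 3, the `o` arc carries 1, and the remaining arcs on that path carry 0, matching the post's example.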

Lucene's FuzzyQuery is 100 times faster in 4.0

There are many exciting improvements in Lucene's eventual 4.0 (trunk) release, but the awesome speedup to FuzzyQuery really stands out, not only for its incredible gains but also for the amazing behind-the-scenes story of how it all came to be. FuzzyQuery matches terms "close" to a specified base term: you specify an allowed maximum edit distance, and any terms within that edit distance fr…

dann 2011/03/25

lucene
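
FuzzyQuery's matching rule, keep every term within a maximum edit distance of the base term, fits in a few lines of Java. This brute-force sketch computes classic Levenshtein distance against each candidate term and illustrates only the criterion. Lucene 4.0's speedup comes from avoiding this per-term scan by using Levenshtein automata, and its FuzzyQuery may also count a transposition as a single edit, so results can differ from this sketch. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Brute-force illustration of FuzzyQuery's matching criterion (NOT Lucene's
// implementation): compare the base term against every candidate term.
class FuzzyMatchSketch {
    // Classic Levenshtein distance; insertion, deletion and substitution each cost 1.
    // Keeps only two DP rows, so extra memory is O(b.length()).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from "" to the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance from the first i chars of a to ""
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int deletion = prev[j] + 1;
                int insertion = curr[j - 1] + 1;
                curr[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Keeps every term within maxEdits of the base term, in input order.
    static List<String> fuzzyMatches(String base, List<String> terms, int maxEdits) {
        List<String> matches = new ArrayList<>();
        for (String term : terms) {
            if (levenshtein(base, term) <= maxEdits) {
                matches.add(term);
            }
        }
        return matches;
    }
}
```

Each comparison costs O(|a|·|b|), and this version repeats it for every term in the dictionary; that exhaustive work is what an automaton-based enumeration of only the matching terms avoids.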

Visualizing Lucene's segment merges

If you've ever wondered how Lucene picks segments to merge during indexing, it looks something like the video in the post, which displays segment merges while indexing the entire English Wikipedia export (29 GB of plain text), played back at ~8X real-time. Each segment is a bar whose height is the segment's size in MB (log scale). Segments on the left are largest; as new segments are flushed, they appe…

dann 2011/03/09

lucene

RoughOverviewOfLuceneSearchProcess

Rough overview of Lucene 2.9 search process

dann 2010/11/06

lucene
solr

Lucene Revolution 2010 Presentations | Lucid Imagination

dann 2010/10/30

Full-text search with Lucene - nodchip's diary

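For QMAClone, which I develop and run, I use tritonn-MySQL as the full-text search engine. Using tritonn effectively prevents me from using the MySQL packages installable via yum or apt (strictly speaking, the two should be able to coexist, but it's a hassle). As a result, maintenance and resolving conflicts with other packages become tedious. I recently heard that Twitter had adopted Lucene as its full-text search engine, so I decided to try it myself. When integrating Lucene into QMAClone, I kept the question data itself in MySQL and converted it into a Lucene index when the game server starts, to fit the existing source code. I'm posting the code I wrote here, partly as a note to myself. Indexing: first, turn the question data into Documents and convert them into index data with IndexWriter…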

dann 2010/10/12

lucene

Twitter builds its real-time search on Lucene: 50 times faster!

Twitter's real-time search is now built on the open-source Lucene, according to the Twitter Engineering blog entry "Twitter's New Search Architecture." Twitter had previously used its own MySQL-based system for real-time search, but as that system became hard to scale, the company decided six months ago to build a new one and chose Lucene, an open-source search engine. 50 times faster than before! The requirements for the search engine were extremely demanding: Our demands on the new system are immense: With over 1,000 TPS (Tweets/sec) and 12,000 QPS (queries/sec) = over 1 billion…

dann 2010/10/09

lucene
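
As a sanity check on the quoted rates, assuming the truncated "over 1 billion" refers to a daily total, a sustained per-second rate converts to a per-day figure by multiplying by the 86,400 seconds in a day. Real traffic is not uniform, so this is only an order-of-magnitude check:

```latex
\begin{aligned}
12{,}000\ \tfrac{\text{queries}}{\text{s}} \times 86{,}400\ \tfrac{\text{s}}{\text{day}} &= 1{,}036{,}800{,}000 \approx 1.04 \times 10^{9}\ \tfrac{\text{queries}}{\text{day}} \\
1{,}000\ \tfrac{\text{tweets}}{\text{s}} \times 86{,}400\ \tfrac{\text{s}}{\text{day}} &= 86{,}400{,}000 \approx 8.6 \times 10^{7}\ \tfrac{\text{tweets}}{\text{day}}
\end{aligned}
```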

Lucene: Asynchronous Index Writer for faster writing

A blog about anything related to search engines - Jakarta Lucene. I was trying to improve index-writing speed and thought of creating an asynchronous Lucene index writer. This writer provides an addDocument() method that can be called asynchronously by multiple threads. Here are a few scenarios where you can use this implementation: reading the data is slower than writing to the ind…

dann 2010/08/21
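
The asynchronous writer described in the excerpt is essentially a producer/consumer queue: many threads call `addDocument()`, which only enqueues, and a single background thread hands documents to the real writer. The Java sketch below is not the blog author's code; the class name, the bounded `LinkedBlockingQueue`, and the `Consumer<D>` sink (a hypothetical stand-in for something like Lucene's `IndexWriter.addDocument`) are assumptions.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Producer/consumer sketch of an asynchronous index writer (NOT the blog author's
// code). Any number of threads may call addDocument(); one background thread
// passes each document to the sink, a hypothetical stand-in for a real index writer.
class AsyncDocWriter<D> implements AutoCloseable {
    private static final Object POISON = new Object(); // end-of-input marker
    private final BlockingQueue<Object> queue;
    private final Thread worker;

    AsyncDocWriter(int capacity, Consumer<D> sink) {
        BlockingQueue<Object> q = new LinkedBlockingQueue<>(capacity);
        this.queue = q;
        this.worker = new Thread(() -> {
            try {
                while (true) {
                    Object item = q.take();
                    if (item == POISON) {
                        return; // every item queued before POISON has been written
                    }
                    @SuppressWarnings("unchecked")
                    D doc = (D) item; // only addDocument() enqueues non-POISON items
                    sink.accept(doc);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "async-index-writer");
        this.worker.start();
    }

    // Thread-safe; blocks only while the queue is full (back-pressure).
    // Must not be called after close(): later documents would never be written.
    void addDocument(D doc) throws InterruptedException {
        queue.put(doc);
    }

    // Signals end of input and waits until every queued document has reached the sink.
    @Override
    public void close() throws InterruptedException {
        queue.put(POISON);
        worker.join();
    }
}
```

The bounded queue provides back-pressure: when the writer falls behind, producers block in `put()` instead of exhausting memory. A production version would also need to surface exceptions thrown by the sink, because a dead worker thread would leave producers blocked on a full queue.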

http://svn.compass-project.org/svn/compass/branches/1_2/src/main/src/org/compass/gps/device/support/parallel/ConcurrentParallelIndexExecutor.java

dann 2010/08/21

http://www.cnlp.org/presentations/slides/advancedluceneeu.pdf

dann 2010/08/21

Log In - Apache Software Foundation

dann 2010/08/21
