[B! Research][Information Engineering] Kshi_Kshi's Bookmarks

Kshi_Kshi id:Kshi_Kshi

Kshi_Kshi's bookmarks on Research and Information Engineering (4)

Advanced Topics in Information Retrieval
Kshi_Kshi 2012/03/02
CS

research

Information Engineering

slide

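lecture
Link
Part 6: Comparing N-gram and Morphological Analysis | gihyo.jp
So far, we have explained how two kinds of search engine, N-gram and morphological analysis, extract index terms. This time we compare the two extraction methods and clarify what each is good and bad at. Overview of the two methods: first, let us review them. Morphological analysis: (1) run morphological analysis on the text to be searched and segment it into words; (2) build an inverted index with the segmented units as index terms; (3) search using the inverted index. N-gram: (1) split the text to be searched into string fragments of N characters; (2) build an inverted index with the fragments as index terms; (3) split the query into N-character fragments and search; (4) by using the position information of the fragments, an exact-match search with no omissions is possible. The major difference lies in the process of "how the index terms of the inverted index are created." Morphological analysis performs syntactic analysis to segment the text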
Kshi_Kshi 2012/03/02
N-grams morphological analysis

CS

research

Information Engineering
Link
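
The N-gram steps summarized in the excerpt above can be sketched roughly as follows. This is a minimal illustration, not the article's implementation: the function names `build_ngram_index` and `search`, the choice of N = 2, and the sample documents are all assumptions made for this example.

```python
from collections import defaultdict


def build_ngram_index(docs, n=2):
    """Map each n-character fragment to a list of (doc_id, position) postings."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos in range(len(text) - n + 1):
            index[text[pos:pos + n]].append((doc_id, pos))
    return index


def search(index, query, n=2):
    """Return the sorted (doc_id, start) pairs where the query occurs exactly."""
    if len(query) < n:
        raise ValueError("query must be at least n characters long")
    # Split the query into overlapping n-character fragments.
    grams = [query[i:i + n] for i in range(len(query) - n + 1)]
    # Candidate start positions come from the first fragment's postings.
    candidates = set(index.get(grams[0], []))
    for offset, gram in enumerate(grams[1:], start=1):
        postings = set(index.get(gram, []))
        # Keep a candidate only if this fragment sits at the expected offset.
        candidates = {(d, p) for (d, p) in candidates if (d, p + offset) in postings}
    return sorted(candidates)


# Hypothetical sample documents, used only for illustration.
docs = {1: "information retrieval", 2: "retrieval of information"}
index = build_ngram_index(docs, n=2)
print(search(index, "retrieval"))  # [(1, 12), (2, 0)]
```

Checking each fragment against its expected offset is what turns a bag of matching fragments into an exact-match result; without the position information, a document containing the same fragments in a different order would also be returned.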
Dice-Sørensen coefficient - Wikipedia
The Dice-Sørensen coefficient (see below for other names) is a statistic used to gauge the similarity of two samples. It was independently developed by the botanists Lee Raymond Dice[1] and Thorvald Sørensen,[2] who published in 1945 and 1948 respectively. Name: The index is known by several other names, especially Sørensen–Dice index,[3] Sørensen index and Dice's coefficient. Other variation
Kshi_Kshi 2012/03/02
Jaccard

CS

research

Information Engineering
Link
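
The excerpt above does not show the formula itself. As a minimal sketch, the commonly stated set form of the Dice-Sørensen coefficient is 2|A ∩ B| / (|A| + |B|), which could be computed as below. The function names `dice_coefficient` and `char_bigrams`, and the convention for two empty samples, are assumptions made for this illustration.

```python
def dice_coefficient(a, b):
    """Dice-Sørensen coefficient of two samples, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        # Assumption: two empty samples are treated as identical.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def char_bigrams(text):
    """Hypothetical helper: the set of 2-character fragments of a string."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


# "night" and "nacht" share only the bigram "ht" out of 4 + 4 bigrams.
print(dice_coefficient(char_bigrams("night"), char_bigrams("nacht")))  # 0.25
```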
Jaccard index - Wikipedia
Each attribute must fall into one of these four categories, meaning that M11 + M01 + M10 + M00 = n. The Jaccard similarity coefficient, J, is given as J = M11 / (M01 + M10 + M11). The Jaccard distance, dJ, is given as dJ = (M01 + M10) / (M01 + M10 + M11) = 1 - J. Statistical inference can be made based on the Jaccard similarity coefficients, and consequently related metrics.[6] Given two sample sets A and B with n attributes, a statistical test can be conducted to see if an overlap is statisticall
Kshi_Kshi 2012/03/02
Jaccard

CS

research

Information Engineering
Link
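
The binary-attribute form of the Jaccard index mentioned in the excerpt above can be sketched as follows. Here M11 counts attributes that are 1 in both samples, M10 those that are 1 only in A, and M01 those that are 1 only in B; attributes that are 0 in both do not enter the ratio. The function names and the convention for the all-zero case are assumptions made for this illustration.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity of two equal-length sequences of 0/1 attributes."""
    if len(a) != len(b):
        raise ValueError("both samples must have the same number of attributes")
    m11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    m10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    m01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    denominator = m01 + m10 + m11
    if denominator == 0:
        # Assumption: samples with no positive attribute are treated as identical.
        return 1.0
    return m11 / denominator


def jaccard_distance(a, b):
    """Jaccard distance, defined as one minus the similarity."""
    return 1.0 - jaccard_similarity(a, b)


# Hypothetical samples: M11 = 2, M10 = 1, M01 = 1, M00 = 1.
a = [1, 1, 0, 0, 1]
b = [1, 0, 1, 0, 1]
print(jaccard_similarity(a, b))  # 0.5
print(jaccard_distance(a, b))    # 0.5
```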