[B! lda] yanbe's bookmarks

yanbe id:yanbe

yanbe's bookmarks on lda (3)

Parallelizing LDA Using Hadoop Map-Reduce
PARALLELIZING LDA USING HADOOP MAP-REDUCE (CSCI 596 Project) William Chang Abijit Bej Objective: Parallelize Latent Dirichlet Allocation using Hadoop Map/Reduce framework. Project Abstract: Latent Dirichlet Allocation (LDA) is a generative model for modeling text. The LDA model hypothesizes that documents contain topics with a certain probability, and these topics contain words with a certain pro
yanbe 2011/08/19
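To cut to the conclusion, it apparently does not work well, but the reasons it fails are analyzed too, which makes this an important result in its own right.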

lda

hadoop

machine-learning
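The generative story in the excerpt above (documents contain topics with some probability, and topics contain words with some probability) can be sketched as follows. This is a minimal illustration, not the project's Hadoop code: `generate_document`, the toy vocabulary, and the fixed `theta` and `phi` values are all hypothetical, and full LDA would additionally draw these distributions from Dirichlet priors.

```python
import random


def generate_document(theta, phi, vocab, length, rng):
    """Sample one document from fixed topic and word distributions.

    theta[k]  : assumed probability of topic k in this document
    phi[k][w] : assumed probability of word w under topic k
    """
    words = []
    for _ in range(length):
        # Pick a topic for this word position according to theta.
        topic = rng.choices(range(len(theta)), weights=theta)[0]
        # Pick a word according to the chosen topic's word distribution.
        word = rng.choices(vocab, weights=phi[topic])[0]
        words.append(word)
    return words


# Hypothetical toy model: two topics over a four-word vocabulary.
vocab = ["hadoop", "cluster", "topic", "word"]
phi = [
    [0.5, 0.5, 0.0, 0.0],  # topic 0: infrastructure words
    [0.0, 0.0, 0.5, 0.5],  # topic 1: text-modeling words
]
theta = [1.0, 0.0]  # this document uses only topic 0

doc = generate_document(theta, phi, vocab, length=10, rng=random.Random(0))
print(doc)
```

Inference runs this process in reverse: given only the words, it estimates `theta` and `phi`, which is the step the project attempts to parallelize.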
PLDA+: Parallel latent dirichlet allocation with data placement and pipeline processing
PLDA+: Parallel Latent Dirichlet Allocation with Data Placement and Pipeline Processing ZHIYUAN LIU, YUZHOU ZHANG, and EDWARD Y. CHANG, Google Inc. MAOSONG SUN, Tsinghua University Previous methods of distributed Gibbs sampling for LDA run into either memory or communication bottlenecks. To improve scalability, we propose four strategies: data placement, pipeline processing, word bundling, and p
yanbe 2011/08/13
lda
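Python implementation of Latent Dirichlet Allocations - 木曜不足
LDA stands for "Latent Dirichlet Allocation", a language model that probabilistically infers the "topic" of each word in a document. It is sometimes rendered in Japanese as 「潜在的ディリクレ配分法」, but under that name more people would probably ask "what was that again?" (lol). It assumes that each word is generated from a "hidden topic" (a subject or category), and it can estimate those topics from a document collection without supervision. Its distinguishing feature is that it can (or at least is expected to) tell apart apple the fruit, apple in music, and apple in computing. To that end, each document also has its own distribution over which topics it tends to generate. Details omitted. As for reading the results: quantitatively, one looks at perplexity (generally, the smaller the better); qualitatively, one looks at which words each topic generates, nodding along at the highest-probability ones. These "words that each topic generates"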
yanbe 2011/08/05
Short!

lda
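The two ways of reading LDA results mentioned in the excerpt above, perplexity and the top words per topic, can be sketched as follows. This is a minimal illustration, not the code from the linked article: the function names `perplexity` and `top_words` and the toy `theta`, `phi`, and `docs` values are assumptions made up for the example, and a real run would take them from a fitted model.

```python
import math


def perplexity(docs, theta, phi):
    """Per-word perplexity of docs under a topic model.

    docs      : list of documents, each a list of word ids
    theta[d][k]: assumed probability of topic k in document d
    phi[k][w] : assumed probability of word w under topic k
    """
    log_likelihood = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # Word probability is a mixture over the document's topics.
            p_w = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_likelihood += math.log(p_w)
            n_tokens += 1
    return math.exp(-log_likelihood / n_tokens)


def top_words(phi, vocab, n=3):
    """Return the n highest-probability words for each topic."""
    result = []
    for row in phi:
        ranked = sorted(range(len(vocab)), key=lambda w: -row[w])
        result.append([vocab[w] for w in ranked[:n]])
    return result


# Hypothetical toy model: "apple" appears in all three topics.
vocab = ["apple", "fruit", "music", "computer"]
phi = [
    [0.5, 0.5, 0.0, 0.0],  # fruit topic
    [0.5, 0.0, 0.5, 0.0],  # music topic
    [0.5, 0.0, 0.0, 0.5],  # computer topic
]
theta = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
docs = [[0, 1], [0, 3]]  # "apple fruit", "apple computer"

print(perplexity(docs, theta, phi))
print(top_words(phi, vocab, n=2))
```

In this toy setup every token has probability 0.5 under its document's topic mixture, so the perplexity comes out at about 2; a model that spread probability uniformly over the four-word vocabulary would score 4, which is why lower values are read as better.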