yuiseki's Bookmarks - Hatena Bookmark

Google releases dataset linking strings and concepts - UMBC ebiquity
Yesterday Google announced a very interesting resource with 175M short, unique text strings that were used to refer to one of 7.6M Wikipedia articles. This should be very useful for research on information extraction from text. “We consider each individual Wikipedia article as representing a concept (an entity or an idea), identified by its URL. Text strings that refer to concepts were collected u
yuiseki 2012/05/21
linkeddata
Naive Bayes classifier in 50 lines - UMBC ebiquity
The Naive Bayes classifier is one of the most versatile machine learning algorithms that I have seen around during my meager experience as a graduate student, and I wanted to do a toy implementation for fun. At its core, the implementation is reduced to a form of counting, and the entire Python module, including a test harness, took only 50 lines of code. I haven’t really evaluated the performance,
yuiseki 2012/01/29
math
Naive Bayes
probability and statistics
machine learning
python
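The excerpt above describes Naive Bayes as "a form of counting". A minimal sketch of that idea follows. It is not the code from the bookmarked post: the class name `NaiveBayes`, its methods, the add-one smoothing, and the toy spam/ham data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Hypothetical multinomial Naive Bayes over word counts (add-one smoothing)."""

    def __init__(self):
        self.class_counts = Counter()            # label -> number of training documents
        self.word_counts = defaultdict(Counter)  # label -> word -> occurrences
        self.vocab = set()                       # every word seen during training

    def train(self, words, label):
        # Training is pure counting: one document for the label, one hit per word.
        self.class_counts[label] += 1
        for w in words:
            self.word_counts[label][w] += 1
            self.vocab.add(w)

    def predict(self, words):
        # Pick the label with the highest log prior plus summed log likelihoods.
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)
            n_words = sum(self.word_counts[label].values())
            for w in words:
                # Add-one smoothing keeps unseen words from zeroing the probability.
                count = self.word_counts[label][w] + 1
                score += math.log(count / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data, invented for this sketch.
nb = NaiveBayes()
nb.train("buy cheap pills now".split(), "spam")
nb.train("cheap pills online".split(), "spam")
nb.train("meeting at noon".split(), "ham")
nb.train("lunch meeting tomorrow".split(), "ham")

print(nb.predict("cheap pills".split()))
print(nb.predict("lunch meeting".split()))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, which is a common choice for this kind of classifier.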
Twitter Social Network Analysis - UMBC ebiquity
In the recent series of posts, we have presented a Twitter Google Maps mashup, a Twitter search and buzz tracking tool called Twitterment, and an analysis of geolocation information from the Twitter dataset. By providing a neat API, Twitter has enabled researchers to get a better understanding of microblogging. In this post, I have used the Large Graph Layout (LGL) tool to visualize the social network
yuiseki 2007/04/21
panopticon