[B! kmeans] Naruhodius's bookmarks

K-means clustering is not a free lunch

David Robinson, Director of Data Scientist at Heap, works in R. I recently came across this question on Cross Validated, and I thought it offered a great opportunity to use R and ggplot2 to explore, in depth, the assumptions underlying the k-means algorithm. The question, and my respo

Naruhodius 2015/01/26

kmeans

Clustering text documents using k-means — scikit-learn 0.14 documentation

This is an example showing how scikit-learn can be used to cluster documents by topics using a bag-of-words approach. This example uses a scipy.sparse matrix to store the features inst
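
The bag-of-words idea mentioned in this snippet can be sketched with the Python standard library alone. This is only an illustration of the representation, not the scikit-learn example itself: the function name `bag_of_words`, the whitespace tokenization, and the sample documents are all assumptions made here, and a dict per document stands in for a real sparse matrix.

```python
from collections import Counter


def bag_of_words(docs):
    """Return (vocab, rows); each row is a sparse {column index: count} dict.

    Hypothetical helper: tokenization is a naive lowercase whitespace split.
    """
    vocab = {}  # word -> column index, assigned in order of first appearance
    rows = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        row = {}
        for word, n in counts.items():
            col = vocab.setdefault(word, len(vocab))
            row[col] = n  # only non-zero entries are stored
        rows.append(row)
    return vocab, rows


# Made-up sample documents, for illustration only.
docs = ["the cat sat", "the dog sat", "stocks fell"]
vocab, rows = bag_of_words(docs)
```

Storing only the non-zero counts is what makes the representation sparse: most documents use a small fraction of the vocabulary.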

Naruhodius 2013/11/13

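Trying k-means with Mahout | mwSoft

Overview: Relying on the code in Mahout in Action, I first run k-means from Java code without using any commands. After that, I do the same thing with the bin/mahout kmeans command. I am using Mahout 0.7, so class names may differ slightly depending on the version (Kluster, for example). Data to cluster: I cluster the following simple data set into three clusters. 1,1 1,2 2,1 4,4 4,5 5,6 7,9 8,9 8,8 Running kmeans on this data in R and plotting the result gives the figure below. The number of centroids is three. x = matrix( c(1, 1, 2, 4, 4, 5, 7, 8, 8, 1, 2, 1, 4, 5, 6, 9, 9, 8), ncol=2 ) cl = kmeans(x, 3) plot(x, col=cl$clus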
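
For readers without R or Mahout at hand, the same nine points can be clustered with a short standard-library Python sketch of the classic assignment/update loop (Lloyd's algorithm). This is not the Mahout or R implementation; the function name `kmeans`, the random choice of initial centroids, and the fixed seed are assumptions made for illustration.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's algorithm (illustrative sketch). Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# The nine points from the snippet above.
points = [(1, 1), (1, 2), (2, 1), (4, 4), (4, 5), (5, 6), (7, 9), (8, 9), (8, 8)]
centroids, labels = kmeans(points, 3)
```

The result depends on the initial centroids, so different seeds can give different clusterings.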

Naruhodius 2012/12/05

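"Hard to try out": Mahout changes the conventional wisdom about machine learning

The big data era: why machine learning now? With the arrival of Apache Hadoop (hereafter "Hadoop"), it became possible to make use of data that used to be thrown away, or that was merely stored because it could not be processed. Machine learning is the technique that has recently drawn particular attention as a way to exploit that data, and Apache Mahout (hereafter "Mahout") is a library for doing machine learning easily while taking advantage of Hadoop's strengths. In this article, we pick up the basics of machine learning by actually running Mahout. What is machine learning in the first place? Machine learning is an attempt to have a computer program "learn" from a given set of data (that is, to automatically build a "model" that represents the patterns and regularities hidden in that data), so that applying the model to other data allows complex and flexible judgments much like a human's. Examples of machine learning applied in business include recommendation (users and products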

Naruhodius 2012/11/20

mahout

k-means++ - Wikipedia

In data mining, k-means++[1][2] is an algorithm for choosing the initial values (or "seeds") for the k-means clustering algorithm. It was proposed in 2007 by David Arthur and Sergei Vassilvitskii, as an approximation algorithm for the NP-hard k-means problem: a way of avoiding the sometimes poor clusterings found by the standard k-means algorithm. It is similar to the first of three seeding methods
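
The seeding idea can be sketched as follows: pick the first seed uniformly at random, then pick each further seed with probability proportional to its squared distance from the nearest seed already chosen. This is a minimal standard-library sketch, not a reference implementation; the function name `kmeanspp_seeds`, the fixed random seed, and the sample points are assumptions made for illustration.

```python
import random


def kmeanspp_seeds(points, k, seed=0):
    """Choose k initial centroids in the k-means++ style (illustrative sketch).

    Assumes the data contains at least k distinct points.
    """
    rng = random.Random(seed)
    seeds = [rng.choice(points)]  # first seed: uniform over the data
    while len(seeds) < k:
        # D(x)^2: squared distance from each point to its nearest chosen seed.
        d2 = [
            min(sum((a - b) ** 2 for a, b in zip(p, s)) for s in seeds)
            for p in points
        ]
        # Next seed: sampled with probability proportional to D(x)^2.
        seeds.append(rng.choices(points, weights=d2, k=1)[0])
    return seeds


# Made-up sample data, for illustration only.
points = [(1, 1), (1, 2), (2, 1), (4, 4), (4, 5), (5, 6), (7, 9), (8, 9), (8, 8)]
seeds = kmeanspp_seeds(points, 3)
```

Points that are already seeds have weight zero, so they are not drawn again, and far-away points are favoured, which tends to spread the seeds out.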

Naruhodius 2012/10/11

kmeans

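Basic knowledge of cluster analysis

Here I briefly explain cluster analysis, which forms the basis of my research, and fuzzy clustering, which is one kind of cluster analysis.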

Naruhodius 2012/10/01

kmeans

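I tried visualizing "K-means", the standard clustering algorithm

While reading Programming Collective Intelligence, I came across an explanation of the K-means method. K-means is apparently a standard algorithm for clustering. I knew it existed, but it never quite clicked for me, so I built a sample to understand how it works. Clicking advances it one step at a time. You can change the number of clusters and the number of points and press Restart to try whatever parameters you like. Checking each step as it runs, I could see for myself that the mechanism is surprisingly simple. (Addendum) I have also made an HTML5 version, "I tried visualizing K-means with D3.js". If your environment cannot display Flash, please see that one. What is K-means? The K-means article on Wikipedia (K平均法 - Wikipedia) covers it in detail, but put more roughly, the idea is something like this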

Naruhodius 2012/10/01

kmeans

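k-means method - 朱鷺の杜Wiki (machine learning wiki)

k-means method: A representative partitioning-optimization clustering method that minimizes the following objective function. \[\mathrm{Err}(\{X_i\})=\sum_{i=1}^{k}\;\sum_{\mathbf{x}\in X_i}\;{\|\mathbf{x} - \bar{\mathbf{x}}_i\|}^2\] Here, the data set $X$ is a set of data points $\mathbf{x}$ represented as vectors. The clusters $X_i$ are exhaustive and mutually disjoint subsets of the data set. $\bar{\mathbf{x}}_i$ is the center of mass of $X_i$ (also called the centroid). $\|\cdot\|$ is the Euclidean norm. Algorithm: The inputs are the data set $X$, the number of clusters $k$, and the maximum number of iterations maxIter. Initialization: the data set is randomly \(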
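
To make the objective concrete, here is a small standard-library Python sketch that evaluates Err for a given partition: for every cluster it computes the centroid and adds up the squared Euclidean distances of the members to it. The function name `kmeans_objective` and the toy partition are assumptions made for illustration.

```python
def kmeans_objective(clusters):
    """Sum over clusters of squared Euclidean distances to the cluster centroid."""
    total = 0.0
    for members in clusters:
        if not members:
            continue  # an empty cluster contributes nothing
        # Centroid: component-wise mean of the members.
        centroid = [sum(c) / len(members) for c in zip(*members)]
        total += sum(
            sum((a - b) ** 2 for a, b in zip(x, centroid)) for x in members
        )
    return total


# Toy partition (made up): the first centroid is (1, 0), the second is (5, 5).
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(5.0, 5.0)]]
err = kmeans_objective(clusters)
```

In the toy partition each point of the first cluster lies at distance 1 from its centroid and the single point of the second cluster coincides with its centroid, so only the first cluster contributes to the sum.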

Naruhodius 2012/10/01

kmeans

Programming Collective Intelligence in Ruby (5) - Code Court

Chapter 3, "Discovering Groups", 3.6 K-Means Clustering. This continues "clusters.rb". Here is a method that finds clusters using K-means. def kcluster(rows, k=4, calc_distance=method(:pearson)) # Find the minimum and maximum of each point # == find the minimum and maximum of each column of the data len = rows[0].length ranges = Array.new(len) do |i| r = rows.map{|row| row[i]} {:min=>r.min, :max=>r.max} end # Place k centroids at random clusters = Array.new(k) do |j| row = Array.new(len) do |i| rand(ranges[i][:max] - ranges[i][:
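
The two ingredients visible in this truncated snippet, a Pearson-based distance and centroids placed at random within each column's range, can be sketched in Python as below. This is not a translation of the article's Ruby code, whose remainder is cut off; the function names, the choice of 1 minus the correlation as the distance, and the handling of constant vectors are assumptions made for illustration.

```python
import math
import random


def pearson_distance(v1, v2):
    """1 - Pearson correlation: 0 for perfectly correlated vectors, 2 for opposite."""
    n = len(v1)
    mean1, mean2 = sum(v1) / n, sum(v2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(v1, v2))
    var1 = sum((a - mean1) ** 2 for a in v1)
    var2 = sum((b - mean2) ** 2 for b in v2)
    den = math.sqrt(var1 * var2)
    if den == 0:
        return 1.0  # assumption: a constant vector is treated as uncorrelated
    return 1.0 - cov / den


def random_centroids(rows, k, seed=0):
    """Place k centroids uniformly at random within each column's [min, max]."""
    rng = random.Random(seed)
    ranges = [(min(col), max(col)) for col in zip(*rows)]
    return [[rng.uniform(lo, hi) for lo, hi in ranges] for _ in range(k)]


# Made-up rows, for illustration only.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 15.0]]
centroids = random_centroids(rows, k=4)
```

A correlation-based distance compares the shape of two rows rather than their magnitude, which suits word-count data where documents differ greatly in length.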

Naruhodius 2012/10/01

kmeans

Building clusters with K-means - kj-ki’s blog

What we do this time: Following the hierarchical clustering in the previous post, we build K-means clusters. We continue to use "blogdata.txt" as the input data. Creating the KmeansCluster class: Here is the source for the K-means clustering. module My class KmeansCluster LOOP_MAX = 100 def initialize(word_counts, user_options = {}) @word_counts = word_counts @min_and_max = {} @centroids = {} @cluster = Hash.new { |hash, key| hash[key] = [] } @options = { :centroids => 4 }.merge(user_options) end # Initialize the centroids with random values, then assign the URLs with close correlation to them #

Naruhodius 2012/10/01

kmeans

Naruhodius's bookmarks about kmeans (10)
