
2012-02-17 Released a Count-Min Sketch library, written by Susumu Yata. Introduction: The other day I released Madoka, a library being developed for use in the groonga project. Madoka packages the Count-Min Sketch technique as a library; it can be used for purposes such as counting the frequencies of keywords in a document collection, or the frequencies of queries. s-yata/madoka - GitHub Documentation - Madoka How to use the library is covered in the documentation, so this post instead summarizes the characteristics of Count-Min Sketch and Madoka. Count-Min Sketch: If the goal is simply to count frequencies, using an associative array backed by a hash table is probably…
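To make the idea concrete, here is a minimal Count-Min Sketch in Python. This is an illustrative sketch only, not Madoka's API; the width and depth values and the md5-based per-row hashing are assumptions chosen for brevity.

import hashlib

class CountMinSketch:
    """Minimal Count-Min Sketch: approximate frequency counting in
    sublinear space. Overestimates are possible; underestimates are not."""

    def __init__(self, width=1024, depth=4):
        self.width = width    # counters per row; more width -> smaller error
        self.depth = depth    # independent rows; more depth -> lower failure probability
        self.table = [[0] * width for _ in range(depth)]

    def _indexes(self, key):
        # Derive one counter index per row from a seeded hash of the key.
        for row in range(self.depth):
            h = hashlib.md5(f"{row}:{key}".encode()).hexdigest()
            yield row, int(h, 16) % self.width

    def add(self, key, count=1):
        for row, idx in self._indexes(key):
            self.table[row][idx] += count

    def estimate(self, key):
        # Taking the minimum over rows bounds the overcount from collisions.
        return min(self.table[row][idx] for row, idx in self._indexes(key))

sketch = CountMinSketch()
for word in ["apple", "apple", "banana", "apple"]:
    sketch.add(word)
print(sketch.estimate("apple"))   # 3 (may overestimate, never underestimates)

Madoka itself is a C++ library; the point here is only the shape of the data structure: d rows of w counters, update every row on insert, and answer queries with the row-wise minimum.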
Statistical analysis and mining of huge multi-terabyte data sets is a common task nowadays, especially in areas like web analytics and Internet advertising. Analysis of such large data sets often requires powerful distributed data stores like Hadoop and heavy data processing with techniques like MapReduce. This approach often leads to heavyweight, high-latency analytical processes and poor applicability…
Matt Abrams recently pointed me to Google’s excellent paper “HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm” [UPDATE: changed the link to the paper version without typos], and I thought I’d share my take on it and explain a few points that I had trouble getting through the first time. The paper offers a few interesting improvements that are worth…
Introduction Here at AK, we’re in the business of storing huge amounts of information in the form of 64-bit keys. As shown in other blog posts and in the HLL post by Matt, one efficient way of getting an estimate of the size of the set of these keys is by using the HyperLogLog (HLL) algorithm. There are two important decisions one has to make when implementing this algorithm. The first is how many…
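That first decision, the number of registers m, directly sets the accuracy: HLL's relative standard error is roughly 1.04/sqrt(m). The minimal Python sketch below (an illustration under assumed parameters, not AK's implementation; the sha1-based hashing and p = 11 are arbitrary choices) shows where m enters.

import hashlib

def hll_cardinality(items, p=11):
    """Minimal HyperLogLog: p bits of the hash pick one of m = 2**p registers;
    each register keeps the maximum leading-zero count (+1) of the rest."""
    m = 1 << p
    registers = [0] * m
    for item in items:
        x = int(hashlib.sha1(str(item).encode()).hexdigest(), 16) & ((1 << 64) - 1)
        j = x >> (64 - p)                    # first p bits choose a register
        w = x & ((1 << (64 - p)) - 1)        # remaining 64 - p bits
        rho = (64 - p) - w.bit_length() + 1  # leading zeros + 1
        registers[j] = max(registers[j], rho)
    alpha = 0.7213 / (1 + 1.079 / m)         # bias-correction constant, large m
    z = 1.0 / sum(2.0 ** -r for r in registers)
    return alpha * m * m * z                 # raw estimate (no small-range correction)

print(hll_cardinality(range(100000)))        # ~100000, within ~1.04/sqrt(2048) = 2.3%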
Sketch of the Day: HyperLogLog — Cornerstone of a Big Data Infrastructure Intro In the Zipfian world of AK, the HyperLogLog distinct value (DV) sketch reigns supreme. This DV sketch is the workhorse behind the majority of our DV counters (and we’re not alone), and it enables us to have a real-time, in-memory data store with incredibly high throughput. HLL was conceived of by Flajolet et al. in the phenomenal…
MMDS. Workshop on Algorithms for Modern Massive Data Sets Website: We have moved to a new website; this page is no longer maintained. To visit the MMDS, go to mmds-data.org. Registration for MMDS 2014 is now open! Synopsis The Workshops on Algorithms for Modern Massive Data Sets (MMDS) address algorithmic and statistical challenges in modern large-scale data analysis. The goals of this series of workshops…
In case you have to take your mind off tomorrow's suspense-filled and technologically challenging landing of Curiosity on Mars (see 7 Minutes of Terror, a blockbuster taking place on Mars this summer), Michael Mahoney, Alex Shkolnik, Gunnar Carlsson, and Petros Drineas, the organizers of the Workshop on Algorithms for Modern Massive Data Sets (MMDS 2012), have just made available the slides of the meeting. Other…
The table shows that we can count the words with a 3% error rate using only 512 bytes of space. Compare that to a perfect count using a HashMap, which requires nearly 10 megabytes of space, and you can easily see why cardinality estimators are useful. In applications where accuracy is not paramount, which is true for most web-scale and network counting scenarios, using a probabilistic counter…
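Those figures line up with the HyperLogLog error bound. A quick back-of-the-envelope check in Python (the register count and bits-per-register below are assumptions for illustration, not the post's stated configuration):

import math

# HyperLogLog's relative standard error is ~1.04 / sqrt(m) for m registers.
m = 1024                    # registers; at ~4 bits each this is 512 bytes
print(1.04 / math.sqrt(m))  # ~0.0325, consistent with the ~3% error quoted above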
Datawocky On Teasing Patterns from Data, with Applications to Search, Social Media, and Advertising The post More Data Beats Better Algorithms generated a lot of interest and comments. Since there are too many comments to address individually, I'm addressing some of them in this post. 1. Why should we have to choose between data and algorithms? Why not have more data and better algorithms? A. There…
UPDATE - If you're reading this via a link from Google or Reddit, please go here - http://murmurhash.googlepages.com. All future updates about MurmurHash will be posted there. UPDATE UPDATE - MurmurHash is now at version 2.0. The new version uses a different mix function than the one below that is much faster and mixes better. Code is on the website linked above. OK, I'm done with this for the time being…
Algorithms for calculating variance play a major role in computational statistics. A key difficulty in the design of good algorithms for this problem is that formulas for the variance may involve sums of squares, which can lead to numerical instability as well as to arithmetic overflow when dealing with large values. A formula for calculating the variance of an entire population of size N is: σ² = (Σ xᵢ²)/N − ((Σ xᵢ)/N)². Using…
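The instability comes from subtracting two large, nearly equal quantities. Welford's online algorithm, one standard remedy covered by the article, avoids this by tracking a running mean and the sum of squared deviations. A minimal Python version:

def online_variance(data):
    """Welford's online algorithm: numerically stable single-pass mean and
    variance, avoiding the cancellation of the naive sum-of-squares formula."""
    n = 0
    mean = 0.0
    m2 = 0.0                           # sum of squared deviations from the mean
    for x in data:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)       # uses the updated mean; keeps m2 >= 0
    return mean, m2 / n                # population variance (n - 1 for sample)

# Large offset that breaks the naive formula in floating point:
print(online_variance([1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]))  # (~1e9+10, 22.5)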