タイトル「HyperLogLog」を検索 - はてなブックマーク

1 - 32 件 / 32件

新着順人気順

絞り込み

検索対象
ブックマーク数
期間
セーフサーチ

HyperLogLogの検索結果1 - 32 件 / 32件

乱択データ構造の最新事情－MinHash と HyperLogLog の最近の進歩－
- 83 users
- www.slideshare.net/slideshow
- テクノロジー
- 2014/05/29
MinHash, b-bit MinHash, HyperLogLog, Odd Sketch, HIP Estimator の解説です．Read less
RedisのHyperLogLogを使ってユニークユーザー数を推定する – Rest Term
- 45 users
- rest-term.com
- テクノロジー
- 2015/01/11
去年の内に公開することが出来ず、ずっと下書き状態だったエントリーをちょっとずつ消化していきたいと思います。ネタとして古いものも含まれていたりすると思いますがしばらくご辛抱ください。。 Redis 2.8.9から追加された HyperLogLog をちょっと触ってみました。環境 * CentOS 7.0 (x86_64) / Intel Xeon E312xx (Sandy Bridge) 2.4GHz 仮想3コア / 2GB RAM * Redis 2.8.17 * redis-py (Python 2.7.5) HyperLogLogとは HyperLogLog (以下HLL)というアルゴリズムはデータマイニング(トラフィックデータの分析等)とか自然言語処理をやってる人ならともかく、Webアプリケーション開発者にはあまり馴染みがないかもしれません。 HyperLogLog – Wiki
HyperLogLogで遊ぶ - Negative/Positive Thinking
- 41 users
- jetbead.hatenablog.com
- テクノロジー
- 2013/02/23
はじめに「さぁ、お前の罪の異なり数を数えろ！」と言われたときに使えそうな「HyperLogLog」という異なり数をカウントする方法を教えてもらったので、遊んでみた。いつもながら論文ちゃんと読んでないので、条件やコード間違ってるかも。。。 HyperLogLogとは cardinalityと呼ばれる、要素の異なり数を決定する問題かなり省メモリで精度のよい異なり数を推定できる方法要素をそのまま保存せず、ハッシュ値に変換したものをうまくレジスタに保存しておくので、レジスタサイズ程度しかメモリを使わない並列化もできて、最近のbigdataとかで注目されているまた、googleが並列計算用に改善したHyperLogLogを提案してるみたい http://blog.aggregateknowledge.com/2013/01/24/hyperloglog-googles-take-on-
- algorithm
- nlp
- programming
- Google
- あとでみる
- 研究
乱択データ構造の最新事情 −MinHash と HyperLogLog の最近の進歩− - iwiwiの日記
- 36 users
- iwiwi.hatenablog.com
- テクノロジー
- 2014/05/29
本日，PFI セミナーにて「乱択データ構造の最新事情 −MinHash と HyperLogLog の最近の進歩−」というタイトルで話をさせてもらいました．スライドは以下になります． Ustream の録画もあります． http://www.ustream.tv/recorded/48151077 内容としては，以下の操作を効率的に行うための集合に関するデータ構造 (Sketch) の最近の進歩を紹介しました．集合の類似度の推定 (Jaccard 係数) 集合異なり数の推定 (distinct counting) どちらも重要かつ基礎的な操作で，b-bit MinHash や HyperLogLog など，既に実用的な手法が提案されており，実際にも使われています．しかし，2014 年になって，Odd Sketch や HIP Estimator という，これらをさらに改善する手法が立て続
Redis new data structure: the HyperLogLog - <antirez>
- 35 users
- antirez.com
- テクノロジー
- 2014/04/01
Generally speaking, I love randomized algorithms, but there is one I love particularly since even after you understand how it works, it still remains magical from a programmer point of view. It accomplishes something that is almost illogical given how little it asks for in terms of time or space. This algorithm is called HyperLogLog, and today it is introduced as a new data structure for Redis. Co
- redis
- algorithm
- log
- programming
- HyperLogLog
Rustで作ってみよう -- HyperLogLogと並列処理で、ウィキペディア全記事のユニーク単語数を見積もる（その１） - Qiita
- 28 users
- qiita.com/tatsuya6502
- テクノロジー
- 2016/12/02
Rustで作ってみよう -- HyperLogLogと並列処理で、ウィキペディア全記事のユニーク単語数を見積もる（その１）アルゴリズムRust 今年の始め、私が Rust を習いはじめのころ、手本となるプログラムがあまり見つからないことが不満でした。GitHub で探せば、Rust で書かれた実用的なライブラリーが数多く見つかりますが、それらを読むのは入門者にとっては敷居が高過ぎます。私が欲しかったのは、学習用に書かれたプログラムで、入門者が手軽に試せて、いろいろといじれるプログラム例でした。そんなわけで、そういうプログラム例を書いてみようと思います。２回に分けて、Rust で簡単なツールを作ります。今回は乱択アルゴリズムの一種である、probability cardinarity estimatior（確率的カーディナリティ推定機）を実装します。HyperLogLog という名前のデ
PrestoとHyperLogLogで、大量ログからユニークユーザー数を高速に推定する（理論編） - Platinum Data Blog by BrainPad
- 25 users
- blog.brainpad.co.jp
- テクノロジー
- 2016/06/27
本記事は、当社オウンドメディア「Doors」に移転しました。約5秒後に自動的にリダイレクトします。皆さん、こんにちは。マーケティングプラットフォーム本部で広告系製品の開発を担当している田頭です。現在、ブレインパッドでは、DMP（※1）に蓄積されたさまざまなデータをもとに、ユーザー数を確認しながら直感的にセグメントを作成できる「DeltaCube（デルタキューブ）」という製品を提供しています。ユーザーのセグメントを作成する際、担当者がストレスなくインタラクティブにセグメントの条件を設定しユーザー数を確認するためには、蓄積された大規模なデータの中から「高速に」ユニークユーザーの数を数えなければなりません。このため、DeltaCubeでは、集計処理にPrestoと呼ばれる分散処理ミドルウェアを利用しています。今回、このPrestoを利用したユニークユーザー集計の更なる高速化について検
『Spark StreamingでHyperLogLogを実装してみた』
- 16 users
- ameblo.jp/principia-ca
- テクノロジー
- 2014/12/18
この記事は、CyberAgent エンジニア Advent Calendar 2014 の18日目の記事です。本日12/18は@sitotkfmが担当いたします。昨日は@yuichiro.nakazawa1さんの「ログ解析にNorikraを使ってみた」でした。明日は@strskさんです。さて、唐突な上に私事で大変恐縮ですが最近秋葉原オフィスの近くに引っ越しました。場所は秋葉原の少し西なので通勤途中に両国を通過します。相撲といえば今年大記録が生まれました。「巨人、大鵬、卵焼き」の中央に座する歴史的横綱大鵬と並ぶ32度目の優勝！そんな大横綱白鵬のブログが読めるのはアメブロだけ！（PCでみると、、、すごく、、、黒いです、、、）さてCyberAgentのAdvent Calendarとしての義理を果たしたところで本題です。私は秋葉原ラボに勤務しており、主にログの集積・解析シ
- spark
- algorithm
- it
HyperLoglogでcount distinctを速くする | DACエンジニアブログ：アドテクゑびす界
- 9 users
- yebisupress.dac.co.jp
- テクノロジー
- 2016/03/17
こんにちは。俺やで。 HyperLoglogについて書きます。おもしろいです。名前が。 ■1. HyperLoglogとは？ count distinctを速くするアルゴリズム以前、Minhashについて書きました。（Treasure Dataさんのブログにも載せていただきました。ありがとうございます。） HivemallでMinhash！〜似てる記事を探し出そう。〜 Build a Simple Recommendation Engine with Hivemall and Minhash HyperLoglogもMinhash同様乱択アルゴリズムを応用したものです！ビッグデータのエンジニアとかデータアナリストであれば、count distinctする機会はめちゃめちゃあると思うのですが、「おせーよ。早く結果返せよ」と思うこともめちゃめちゃあるのでは。なぜ遅いかと言うと正直にすべ
Sketch of the Day: HyperLogLog — Cornerstone of a Big Data Infrastructure –
- 9 users
- research.neustar.biz
- テクノロジー
- 2013/02/07
Sketch of the Day: HyperLogLog — Cornerstone of a Big Data Infrastructure Intro In the Zipfian world of AK, the HyperLogLog distinct value (DV) sketch reigns supreme. This DV sketch is the workhorse behind the majority of our DV counters (and we’re not alone) and enables us to have a real time, in memory data store with incredibly high throughput. HLL was conceived of by Flajolet et. al. in the ph
- hyperLogLog
- algorithm
- algorithms
- datamining
- data
HyperLogLog++: Google’s Take On Engineering HLL –
- 9 users
- research.neustar.biz
- テクノロジー
- 2013/01/25
Matt Abrams recently pointed me to Google’s excellent paper “HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm” [UPDATE: changed the link to the paper version without typos] and I thought I’d share my take on it and explain a few points that I had trouble getting through the first time. The paper offers a few interesting improvements that are w
- algorithm
- HyperLogLog
- google
- paper
- *あとで
HyperLogLog - Wikipedia
- 8 users
- en.wikipedia.org
- テクノロジー
- 2016/04/28
HyperLogLog is an algorithm for the count-distinct problem, approximating the number of distinct elements in a multiset.[1] Calculating the exact cardinality of the distinct elements of a multiset requires an amount of memory proportional to the cardinality, which is impractical for very large data sets. Probabilistic cardinality estimators, such as the HyperLogLog algorithm, use significantly les
- あとで読む
HyperLogLog in Presto: Faster cardinality estimation - Facebook Code
- 8 users
- engineering.fb.com
- テクノロジー
- 2018/12/14
HyperLogLog in Presto: A significantly faster way to handle cardinality estimation WindForce - Turbulence X Scale, by Alvaro Moreira and Particle Skull: https://vimeo.com/94061802, CC-BY-3.0. Computing the count of distinct elements in massive data sets is often necessary but computationally intensive. Say you need to determine the number of distinct people visiting Facebook in the past week using
- presto
- data
- facebook
HyperLogLog is interesting
- 7 users
- speakerdeck.com/ocadaruma
- テクノロジー
- 2019/08/31
builderscon tokyo 2019の発表資料 https://builderscon.io/tokyo/2019
- あとで読む
GitHub - axiomhq/hyperloglog: HyperLogLog with lots of sugar (Sparse, LogLog-Beta bias correction and TailCut space reduction) brought to you by Axiom
- 7 users
- github.com/axiomhq
- テクノロジー
- 2017/06/27
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. You switched accounts on another tab or window. Reload to refresh your session. Dismiss alert
HyperLogLogについて
- 7 users
- zenn.dev/masatana
- テクノロジー
- 2022/04/22
このとき、最初(左)から数えて、 o の連続して出た数を数え、その最大値に注目します。上記の例では、 Aさんが最も多く、2回ですね。さて、コイントスの表裏が出る確率がそれぞれ 50% (つまり、完全にランダム)であったと仮定した場合、連続して2回表が出る確率は以下ですよね。 (1/2)^2 = 1/4 そうすると、どれだけコイントスを試行したか（≒どれだけの人数がいたか）考えたときに、最も確率が高いのは4回です。つまり総人数は4人！こう数えることには、大きな利点があります。それは、ある人が既にカウントされたかどうかを記録しておく必要がないということです。なぜなら、 o が続いた最大数だけを覚えておけばいいので。これは、特に大きな人数集団であった場合には特に嬉しいですよね。いやいや、大雑把すぎるだろとか、色々聞こえてきそうです。それは次のセクション以降で説明していきます。ただし
- あとで読む
GitHub - aaw/hyperloglog-redis: An implementation of the HyperLogLog algorithm backed by Redis
- 6 users
- github.com/aaw
- テクノロジー
- 2013/01/15
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. You switched accounts on another tab or window. Reload to refresh your session. Dismiss alert
- algorithms
- ruby
Sketch of the Day: HyperLogLog — Cornerstone of a Big Data Infrastructure –
- 6 users
- research.neustar.biz
- テクノロジー
- 2014/05/16
Sketch of the Day: HyperLogLog — Cornerstone of a Big Data Infrastructure Intro In the Zipfian world of AK, the HyperLogLog distinct value (DV) sketch reigns supreme. This DV sketch is the workhorse behind the majority of our DV counters (and we’re not alone) and enables us to have a real time, in memory data store with incredibly high throughput. HLL was conceived of by Flajolet et. al. in the ph
HyperLogLog and MinHash - NextRoll
- 5 users
- tech.nextroll.com
- テクノロジー
- 2014/02/16
At AdRoll, we have a lot of data to deal with. As we keep accumulating all of this data, our scaling issues become more complicated, and even something as simple as counting becomes a bit of a chore. After using Bloom filters to count uniques, we eventually wanted to find something more space efficient. We started researching, and implemented a form of HyperLogLog, which gives us the ability count
RedisのBitCountとHyperLogLogを使用した超高速Unique User数集計
- 5 users
- www.slideshare.net/tmatsuura
- テクノロジー
- 2016/11/30
Redis の BitCount 機能と HyperLogLog を利用してユニークユーザ数（ Unique User Cardinality ）をミリ秒単位でカウントする。 BitCount での計測には Linear Counting のアルゴリズムを用い、HyperLogLog については、MinHash の派生である Base-2 Ranks の考え方も説明する。重複率の計算方法や AND OR の考え方、また、BigData に対してどのようなサーバ構成で処理するべきかの案も提示する。
- Redis
HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm (PDF形式)
- 5 users
- algo.inria.fr
- テクノロジー
- 2012/04/27
- web
HyperLogLogを用いた、異なり数に基づく  省リソースなk-meansの k決定アルゴリズムの提案
- 5 users
- www.slideshare.net/slideshow
- テクノロジー
- 2015/03/08
Kai Sasaki of Treasure Data presented on distributed data analysis challenges and their approaches. Some challenges of distributed analysis are high operational costs and ensuring stability and scalability. Treasure Data uses AWS services like CodeDeploy and Auto Scaling to automate deployments and scaling. They perform query simulations for load testing and scale infrastructure based on metrics t
- programming
GitHub - ekzhu/datasketch: MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
- 4 users
- github.com/ekzhu
- テクノロジー
- 2018/11/21
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. You switched accounts on another tab or window. Reload to refresh your session. Dismiss alert
GitHub - axiomhq/hyperminhash: HyperMinHash: Bringing intersections to HyperLogLog
- 4 users
- github.com/axiomhq
- テクノロジー
- 2018/01/09
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. You switched accounts on another tab or window. Reload to refresh your session. Dismiss alert
Screen6 - HyperLogLog with Cascalog
- 3 users
- screen6.github.io
- 世の中
- 2013/11/19
We’ll look briefly in how you would utilize awesomeness of both Cascalog and HyperLogLog in order to execute Hadoop M/R tasks with amounts of data too big to have them in their original form. Intro HyperLogLog Cardinality estimator allowing you to count amount of distinct values. Cascalog The main use cases for Cascalog are processing "Big Data" on top of Hadoop or doing analysis on your local com
HyperLogLog sketchは面白い - builderscon tokyo 2019
- 3 users
- builderscon.io
- 暮らし
- 2019/06/10
Abstract HyperLogLog (以下HLL)は、与えられたデータセットのdistinct element countを省メモリで高速に推定するアルゴリズムです。 RedisやPrestoなど様々なミドルウェアにHLLを利用するための機能が組み込みで用意されていますし、現在のビッグデータ解析において無くてはならないものとなっています。 HLLではデータセットをsketchとよばれる省メモリな表現に変換したのち、そこに保存された値を集計して推定値を算出しますが、このsketchには興味深い特徴があります。まず、異なるデータセットに対して作られたそれぞれのsketchをmergeして、unionのdistinct countを推定することができます。そして、HLL sketchの構築と推定値の算出は独立したステップであるため、推定に影響を与えることなくsketchのバイナリ表現を
HyperLogLog sketch in practice -異なり数の集計の悩みはほぼ解決！- | フライウィール
- 3 users
- www.flywheel.jp
- テクノロジー
- 2019/12/20
ブログ HyperLogLog sketch in practice　-異なり数の集計の悩みはほぼ解決！- データサイエンティストの川端です。10月よりフライウィールにジョインして、データ基盤開発やデータ分析・可視化、広告配信ロジックの開発などを主にやっています。 FLYWHEEL Advent Calendar 2019 の16日目は、その中で取り組んでいるデータ分析・可視化のためのデータ基盤で用いたHyperLogLogの活用について紹介できればと思います。すでにHyperLogLogについてや、その近似精度、計算コストの実験については過去に多くのブログが書かれているので、本稿ではHyperLogLogのコアであるsketchの便利さを中心にどれだけ実戦で使えるかに主眼を置いて紹介していきます。 Count-distinct problem（異なり数集計の問題）WebサイトのPV(pa
PrestoとHyperLogLogで、大量ログからユニークユーザー数を高速に推定する（実践編） - Platinum Data Blog by BrainPad
- 3 users
- blog.brainpad.co.jp
- テクノロジー
- 2017/04/20
本記事は、当社オウンドメディア「Doors」に移転しました。約5秒後に自動的にリダイレクトします。皆さん、こんにちは。マーケティングプラットフォーム本部で広告系製品の開発を担当している田頭です。前回の理論編ブログでは、大量のログから高速にユニークユーザー（以下、UU）数を推定するための背景やアルゴリズム、およびその実装部分について、「理論編」として解説させていただきました。今回は「実践編」として、実際に当社内で行ったapprox_distinct関数のパフォーマンス検証について、以下の内容を皆さんにご紹介したいと思います。検証環境検証データ検証項目検証結果小規模検証結果中規模検証結果大規模検証結果検証結果考察まとめ 1. 検証環境今回の検証は、以下の環境で実施しました。クラスタ環境 Amazon Elastic MapReduce (EMR) を利用（EMR
- あとで読む
Shipped Algorithm::HyperLogLog 0.01! - WebService::Blog->new( user => ’hide_o_55’ )
- 3 users
- hide-o-55.hatenadiary.org
- テクノロジー
- 2013/03/20
HyperLogLogアルゴリズムをPerlから使用するためのモジュール、 Algorithm::HyperLogLogをCPANにリリースしました。 XSモジュールですが、XSが使えない場合はPure Perlでも使えるように作ってあります。 HyperLogLog とは？ HyperLogLogは、ある集合の異なり数(カーディナリティ/cardinality)を少ないメモリ消費で推定するアルゴリズムです。詳しくは以下の論文を読んで下さい。 http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf なお、並列計算用に改善されたHyperLogLog++というアルゴリズムがGoogleによって開発されています。 http://static.googleusercontent.com/external_content/untrust
- Perl
- あとで読む
HyperLogLog++: Google’s Take On Engineering HLL –
- 3 users
- research.neustar.biz
- テクノロジー
- 2014/04/30
Matt Abrams recently pointed me to Google’s excellent paper “HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm” [UPDATE: changed the link to the paper version without typos] and I thought I’d share my take on it and explain a few points that I had trouble getting through the first time. The paper offers a few interesting improvements that are w
- google
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm
- 3 users
- research.google
- テクノロジー
- 2013/05/01
Philosophy We strive to create an environment conducive to many different types of research across many different time scales and levels of risk. Learn more about our Philosophy Learn more
- algorithm
- database
GitHub - Alephbet/gimel: Run your own A/B testing backend using AWS Lambda and Redis HyperLogLog
- 3 users
- github.com/Alephbet
- テクノロジー
- 2016/01/23
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. You switched accounts on another tab or window. Reload to refresh your session. Dismiss alert