[B! datamining] nfunato's bookmarks

nfunato id:nfunato

nfunato's bookmarks on datamining (88)

GitHub - aozorabunko/aozorabunko
nfunato 2018/07/15
DataMining
Amazon.co.jp: 前処理大全 (Preprocessing Compendium) [Practical SQL/R/Python Techniques for Data Analysis]: 本橋智光: Books
nfunato 2018/03/09
book

DataMining
Introduction to Algorithmic Marketing: Artificial Intelligence for Marketing Operations: Ilya Katsov: 9780692989043: Amazon.com: Books
nfunato 2018/02/14
marketing

book

DataMining
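How I ran a study group on everything in ML except the algorithms - Ramblings of someone who used to work at a data analysis company
I meant to post this entry before the end of the year, but before I knew it the new year had arrived. Back in December 2017, as the title says, I held a study group called ML Ops Study (tentative) #1, covering everything about machine learning "other than" the algorithms. Background: For the past few years I have worked close to the fields known as machine learning and deep learning, and I have come to realize that my interest lies in putting machine learning and advanced analytics into practice in society. Fortunately, thanks to the machine learning boom, many study groups and books on the algorithmic side of machine learning have appeared, and as a result many people now seem to study machine learning algorithms. On the other hand, even when an algorithm looks able to solve a problem, there is currently little knowledge or know-how about actually building it into a system and about operating that system continuously afterwards.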
nfunato 2018/01/04
machinelearning

DataMining

slideshare
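A practical introduction to data analysis with Pandas - Gunosy Data Analysis Blog
Hello, this is Ogiwara from the Data Analysis Department. Lately I have been listening to "NANIMONO (feat.米津玄師)" a lot. In this post I introduce practical techniques for Pandas, the Python data analysis library, in three categories: "data processing", "data aggregation (Group By)", and "time series processing". The basics of Pandas were already covered in a previous entry, so please have a look at that as well. data.gunosy.io Data processing: extracting data (query); applying processing based on a condition (where); applying a function to each row (apply). Data aggregation (Group By): applying a different aggregation to each column (agg); extracting the rows with the maximum or minimum value (first); applying standardization or normalization (transform). Time series processing: rounding times (round)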
nfunato 2017/05/12
pandas

DataMining
Mathematicians becoming data scientists: Should you? How to?
Mathematicians becoming data scientists: Should you? How to? I was talking the other day with a former student at UW, Sarah Rich, who’s done degrees in both math and CS and then went off to Twitter. I asked her: so what would you say to a math Ph.D. student who was wondering whether they would like being a data scientist in the tech industry? How would you know whether you might find that kind
nfunato 2017/03/05
DataMining

life
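How we caught the "rogue train" on the Circle Line | Programming | POSTD
Text: Daniel Sim. Analysis: Lee Shangqian, Daniel Sim, Clarence Ng. Over the past few months, trains on Singapore's MRT Circle Line have repeatedly stopped for no known reason, causing major confusion and worry among commuters. Like many of my colleagues, I take the Circle Line to my office at one-north. So when the team was asked on November 5 to investigate the cause of the stoppages, I volunteered without hesitation. Preliminary investigations by the rail operator SMRT and the Land Transport Authority (LTA) had already established that signal interference was causing some trains to lose their signal, and that this was triggering the incidents. Loss of signal activates the emergency brake, a train safety feature, which is why trains were stopping irregularly. However, the incidents, which first occurred in August, appeared to happen at random, so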
nfunato 2017/02/24
postdcc

DataMining
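Yahoo Japan open-sources "NGT", a fast search technology for high-dimensional data, making big data "sleeping" in companies easier to analyze
On November 24, Yahoo Japan released "NGT" (Neighborhood Graph and Tree for Indexing), a fast search technology for high-dimensional data, on GitHub as open source software (OSS) under the Apache License 2.0, which permits both commercial and non-commercial use. The company is also granting royalty-free rights to the related patents. NGT can quickly search for and identify high-dimensional data with multiple features, such as text, images, product data, and user data, within large databases. On 2 million items of language data, it reportedly searches about 4 times faster than "SASH", previously the fastest technology, and about 12.3 times faster than the mainstream "FLANN"; on 10 million items of image data, it is reportedly about 5.6 times faster than the previously fastest "product quantization" method and about 13.5 times faster than FLANN. With NGT, similar data can be matched at high speed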
nfunato 2016/11/26
DataMining

MachineLearning
Foundations of Data Science - Microsoft Research
Computer science as an academic discipline began in the 60’s. Emphasis was on programming languages, compilers, operating systems, and the mathematical theory that supported these areas. Courses in theoretical computer science covered finite automata, regular expressions, context free languages, and computability. In the 70’s, algorithms was added as an important component of theory. The emphasis w
nfunato 2016/11/04
DataMining

ebook
GitHub - ResidentMario/missingno: Missing data visualization module for Python.
nfunato 2016/08/28
DataMining

python

pandas
gpq/notebooks/contracts_intro.ipynb at master · antontarasenko/gpq
nfunato 2016/06/06
DataMining

Jupyter
Data Brewery
Data Brewery Data Brewery is a set of Python frameworks and tools for data processing and analysis. Aggregated data browsing, reporting and multidimensional modeling. Contains set of tools, OLAP HTTP server and light-weight Python framework. Overview Documentation Explore Framework for data processing (ETL) and auditing based on virtual data objects with focus on process understandability and usab
nfunato 2016/05/14
python

DataMining

statistics
Next-generation Python Big Data Tools, powered by Apache Arrow
Wes McKinney gives a presentation on the Python data ecosystem and building open source communities. He discusses his background working on Python data tools like pandas and Apache projects. McKinney emphasizes the importance of transparency, consensus building, and valuing all contributions when developing open source software. He also examines challenges in Python packaging and sees opportunitie
nfunato 2016/04/07
DataMining
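A practical introduction to machine learning with Hivemall (Part 1: Drugstore sales forecasting) - Treasure Data Blog
This series introduces practical machine learning with Hivemall, a machine learning library available in the Treasure Data environment. We work through practical problems taken from Kaggle, the data science competition site where data scientists from around the world compete. 1. Introduction: Part 1 uses the Rossmann Store Sales competition, a retail sales forecasting task. As the algorithm, we use Random Forest regression, an ensemble learning method based on decision trees*1. Rossmann is a drugstore chain operating more than 3,000 stores in 7 European countries. Each store manager is tasked with forecasting the store's sales up to 6 weeks ahead. Store sales are influenced by many factors, including promotions, competition, school and public holidays, seasonality, and locality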
nfunato 2016/03/29
DataMining

treasuredata
Cleaning data in Python | data.library.utoronto.ca
Table of Contents Set up environments Data analysis packages in Python Clean data in Python Load dataset into Spyder Subset Drop data Transform data Create new variables Rename variables Merge two datasets Handle missing values A few last words Note : pandas is a powerful open source Python data analysis library that is used for data cleaning. A complete documentation can be found here . Please
nfunato 2016/03/27
python

DataMining
Jupyter Notebook Viewer
nfunato 2016/02/23
"takashi-miyamoto-naviplus"

jupyter

DataMining
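Enjoying basic pandas operations by racing all-time best lineups in the Hakone Ekiden - Qiita
Motivation: "When it comes to New Year's pleasures in Japan, it has to be watching the Hakone Ekiden in the living room! I bet everyone got excited this New Year too, right?" ...Actually, I am not that much of a Hakone Ekiden fan. Still, while watching TV alongside my family, who follow it keenly, someone remarked that "the fast universities change every year; it is hard for one university to field ten fast runners in the same year," and that caught my interest a little. This year ○○ University (name withheld to avoid spoilers) happened to be fast, but if the all-time fastest runners from each university, the ones called "god of the mountain" or "god of something-or-other", were gathered in the same era to run an inter-university Hakone Ekiden, which university would come first? I will explore this "what if" using past data. Environment: Mac OS X 10.10.5 Python 2.7.9 (virtualenv) import sys imp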
nfunato 2016/01/09
python

DataMining

statistics
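A look back at, and supplement to, "Data Scientist Training Reader: Introduction to Machine Learning" (データサイエンティスト養成読本 機械学習入門編) - sfchaos's blog
On September 10, 技術評論社 published "Data Scientist Training Reader: Introduction to Machine Learning", and thanks to your support a reprint was decided about a month later. My deepest thanks to everyone who has read it. データサイエンティスト養成読本機械学習入門編 (Software Design plus) Authors: 比戸将平, 馬場雪乃, 里洋平, 戸嶋龍哉, 得居誠也, 福島真太朗, 加藤公一, 関喜史, 阿部厳, 熊崎宏樹 Publisher: 技術評論社 Release date: 2015/09/10 Format: large-format book See blogs that mention this product (7) In addition, on the evening of the publication date, a launch event was held at KDDI Web Communications: the "Data Scientist Training Reader: Introduction to Machine Learning" launch event. I took part as one of the authors. To everyone who attended despite the bad weather, and to KDDI Web Communications for providing the venue,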
nfunato 2015/11/02
book

DataMining
LL Ring Recursive
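Overview: 30 years to go until the technological singularity. This year, artificial intelligence technology attracted a great deal of attention. In machine learning and data analysis, the research fields underlying artificial intelligence, hardware technologies such as GPUs and FPGAs are important, but these fields are also supported by very well-made LL frameworks. The LL frameworks behind this new technology will be introduced through live coding and LTs. Speakers: 佐藤建太 (The University of Tokyo / JuliaTokyo): second-year master's student at the Graduate School of Agricultural and Life Sciences, The University of Tokyo. A heavy user of the Julia programming language and one of the founding members of the user group "JuliaTokyo". A participant in "Julia Summer of Code", the open source software project support program run by NumFOCUS. 大野健太 (Preferred Networks): 2011, master's course at the Graduate School of Mathematical Sciences, The University of Tokyo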
nfunato 2015/08/23
pfi

DataMining
http://www.longitudinalsem.com/
nfunato 2015/08/23
DataMining

R