[B! bigdata] javasox's bookmarks

javasox id:javasox

javasox's bookmarks about bigdata (23)

SankeiBiz: an economic information site for self-improvement
Notice of service termination: SankeiBiz ended its service on December 26, 2022. Thank you very much for your many years of readership. Economic news from Sankei Digital is available on "iza! Economic News" (https://www.iza.ne.jp/economy/).
javasox 2016/12/02
bigdata

aws

Spark

hdp
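Link
Estimating Ad Click-Through Rate (CTR) with Hive/Hivemall - Qiita
This is the Xmas entry (2013/12/25) of Hadoop Advent Calendar 2013. In this article, I explain how to use Hivemall, a machine learning library I am developing that runs on Hadoop/Hive, using the KDD Cup 2012, Track 2 dataset. https://github.com/myui/hivemall It is essentially a more detailed version of the KDDCup 2012 track 2 CTR prediction walkthrough on the project's Wiki. The a9a binary and news20 binary examples are simpler, so please refer to those as well. The CTR estimation task of KDD Cup 2012, Track 2: in this task, based on the given session information (user attributes and ad attributes), the search engine's ad click rate (Click-Th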
javasox 2016/09/15
read later

machine learning

bigdata
Link
CDAP
The Data Analytics Platform. A 100% open source, integrated framework that accelerates application development for data analytics. Ajai Narayanan, Albert Shau, Ali Anwar, Andreas Neumann, Bhooshan Mogal, Derek Wood, Edwin Elia, Jay Jin, Lea Cuniberti-Duran, Nitin Motgi, Poorna Chandra, Rohit Sinha, Sagar Kapare, Sreevatsan Raman, Terence Yim, Tony Hajdari, Vinisha Shah, Yaojie Feng
javasox 2016/06/14
oss

bigdata
Link
Welcome to Apache Kylin | Apache Kylin
Smarter and Faster. Kylin is a high-concurrency, high-performance and intelligent OLAP engine that provides a low-cost and ultimate data analytics experience.
javasox 2016/06/14
apache

oss

bigdata
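Link
Calculate Large Correlation Coefficient Matrix with Hadoop MapReduce - tetsuya_odaka's diary
Over my previous posts, I built a complete mechanism for obtaining a correlation matrix from an observation matrix. When I started this development, I set the following goals. Execution-time goal: using the cluster below, finish the computation within one hour for 5,000 variables with 5,000 samples per variable. Infrastructure: Amazon Elastic MapReduce; region: US Standard; instance type: m1.small; master instance group: 1 instance; core instance group: 8 instances; task instance group: 10 instances. Observation data: I used values with one decimal place generated from uniform random numbers on [0,10] (25 million values = 5000*5000). This data was generated on a PC (with a Java program). => Because there are few significant digits (= the data size is small), I think evaluating this will also be needed as future work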
javasox 2016/06/07
bigdata

emr

MapReduce

hadoop
Link
Pachyderm Challenges Hadoop with Containerized Data Lakes
javasox 2016/05/11
bigdata

parallel programming

oss
Link
Clustering Similar Images Using MapReduce Style Feature Extraction with C# and R – Data Science Central
javasox 2016/05/10
computer vision

bigdata

R
Link
DataScienceCentral.com - Big Data News and Analysis
The data platform debt you don’t see coming Saqib Jan | August 28, 2025 at 2:05 pm Data Platform Debt... Designing AI factories: Purpose-built, on-prem GPU data centers Martin Summer | August 26, 2025 at 2:39 pm Discover how purpose-built AI factories are transforming on-premises GPU data centers for high-performance AI workloads,... How diagnosis image annotation turns scans into insights Rayan P
javasox 2016/05/10
datascience

bigdata

parallel programming
Link
DARPA - Open Catalog
javasox 2016/04/26
DARPA

bigdata

oss
Link
http://tootsie-musical.com/
javasox 2016/03/16
bigdata

education
Link
Apache Beam®
Introducing Apache Beam. The Unified Apache Beam Model. The easiest way to do batch and streaming data processing. Write once, run anywhere data processing for mission-critical production workloads.
javasox 2016/03/08
Google

Beam

bigdata
Link
Falcon - Falcon - Feed management and data processing platform
Why? Establishes relationships between various data and processing elements in a Hadoop environment. Feed management services such as feed retention, replication across clusters, archival, etc. Easy to onboard new workflows/pipelines, with support for late data handling and retry policies. Integration with metastore/catalog such as Hive/HCatalog. Provide notification to end customers based on availability
javasox 2016/03/08
apache falcon

bigdata
Link
Apache OODT - Distributed Data Management
Disciplined Data Management. Apache Object Oriented Data Technology (OODT) is the smart way to integrate and archive your processes, your data, and its metadata. OODT allows you to: Generate Data, Process Data, Manage Your Data, Distribute Your Data, Analyze Your Data, allowing for the integration of data, computation, visualization and other components. Solidify Your Data Processing. Traditional process
javasox 2016/03/08
oodt

bigdata
Link
Calculations with arrays bigger than your memory (dask arrays)
javasox 2016/02/23
dask

data

bigdata

parallel programming

python
Link
Big Data in Python: out of core processing – MKTSTK
javasox 2016/02/23
bigdata

blaze

hdf5
Link
Pytables: persistent matrices using HDF5
javasox 2016/02/22
hdf5

bigdata
Link
How Coursera Manages Large-Scale ETL using AWS Data Pipeline and Dataduct | Amazon Web Services
February 2023 Update: Console access to the AWS Data Pipeline service will be removed on April 30, 2023. On this date, you will no longer be able to access AWS Data Pipeline through the console. You will continue to have access to AWS Data Pipeline through the command line interface and API. Please note that
javasox 2016/02/18
ETL

bigdata

aws
Link
Using Docker to Build an IPython-driven Spark Deployment - { lab41 }
TL;DR: Our ipython-spark-docker repo is a way to deploy an Apache Spark cluster driven by IPython notebooks, running Docker containers for each component. The project uses Bash scripts to build each node type from a common Docker image that contains all necessary packages, enables data access from a Hadoop cluster, and runs on dedicated hosts. By using IPython as the interface, you can leverage a
javasox 2016/02/17
bigdata

docker
Link
Kaseya DattoCon Miami, FL
javasox 2016/02/16
RAM

data

datascience

bigdata
Link
Visualization of large datasets with tabplot
The tableplot is a powerful visualization method to explore and analyse large multivariate datasets. In this vignette, the implementation of tableplots in R is described, and illustrated with the diamonds dataset from the ggplot2 package. Introduction: The tableplot is a visualization method that is used to explore and analyse large datasets. Tableplots are used to explore the relationships between
javasox 2016/02/04
bigdata

visualization

R
Link