[B! spark] mooz's bookmarks

hscj2019_ishizaki_public

I gave a talk titled "A look inside DataFrame and Dataset" at Hadoop / Spark Conference Japan 2019. http://hadoop.apache.jp/hcj2019-program/

mooz 2019/03/15

DataFrame, DataSet, RDD

spark

Data Actionability Platform | Unravel Data

Don’t just observe. Take action. Data observability actionability platform

mooz 2015/07/02

A consulting company for Spark tuning

spark

[SPARK-3561] Allow for pluggable execution contexts in Spark - ASF JIRA

Currently Spark provides integration with external resource-managers such as Apache Hadoop YARN, Mesos etc. Specifically in the context of YARN, the current architecture of Spark-on-YARN can be enhanced to provide significantly better utilization of cluster resources for large scale, batch and/or ETL applications when run alongside other applications (Spark and others) and services in YARN. Propos

mooz 2014/11/04

Looks like work on Spark on Tez is moving forward. There is a proof-of-concept implementation.

spark
tez

When would someone use Apache Tez instead of Apache Spark, or vice versa? Do their use cases overlap to a large extent?

Answer (1 of 11): I do not agree with the very good answer by Sandy Ryza. Though the answer is more or less correct, there is one use case where Tez can score significantly over Spark. This is the one which involves extreme scale - for instance, if you want to join a 100Terabyte table to another ...

mooz 2014/10/11

So Tez seems to be the choice when processing huge data out-of-core.

I attended the 16th Hadoop Source Code Reading | DevelopersIO

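I attended the 16th Hadoop Source Code Reading. This time the theme was Apache Spark, just ahead of its 1.0 release. Opening remarks by Hamano-san of NTT Data: the plan was to celebrate the Spark 1.0 release, but it has not been released yet (lol); this time there is no time to drink and laze around (lol). Introduction to Apache Spark (first half), Dobashi-san of NTT Data: first, Dobashi-san explained the background of Spark, looked back on Spark Summit 2013, and covered the basics of Spark. The slides are the better source for details, but if you want a quick feel for the talk, see the notes below. Dobashi-san: involved with Hadoop for six years; basically an infrastructure engineer; uses Ansible. Agenda: background of Spark, Spark Summit 2013 recap, Spark basics, RDD, scheduler, assumptions, 机上調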

mooz 2014/10/09

spark

What is Databricks, the home base of Spark development, aiming for? We ask a co-founder

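"Spark" is open source software (OSS) that performs big data analysis on distributed clusters at high speed in memory. The company at the core of its development is Databricks of the US. It is a venture company founded in 2013 as a spin-out from "AMPLab", the research organization at the University of California Berkeley (UCB) that developed Spark. Little had been known about the company's business, but around "Spark Summit 2014", the Spark event held in June 2014, it launched "Databricks Cloud", a cloud service that makes Spark easy to use (related article: Fast big data analysis in the cloud, Spark developer Databricks launches service), and struck a series of partnerships with Hadoop distribution vendors (related article: Next-generation Ha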

mooz 2014/08/27

Huh. "In the future, BlinkDB's functionality will probably be incorporated into Spark SQL. Just three months ago, the engineer who conceived BlinkDB started collaborating with us."

spark

Stratosphere » Stratosphere version 0.5 available

mooz 2014/06/11

So Stratosphere is going into the Apache Incubator. Also, apparently they will change the project name because of a name clash.

Spark Internals - Hadoop Source Code Reading #16 in Japan

Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive (Sachin Aggarwal)

mooz 2014/05/30

Spark at Twitter - Seattle Spark Meetup, April 2014

mooz 2014/05/28

So it's YARN, not Mesos.

spark
hadoop

Spark Bindings

mooz 2014/04/29

Mahout is getting interesting. It is heading in a direction that database and programming-language people would like.

Upcoming events

mooz 2013/12/30

Super cool: "scikit-learn / PySpark integration sprint"

spark

NEC develops technology to speed up machine learning on big data, introducing in-memory processing and MPI

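NEC has developed "Feliss", a distributed processing framework that speeds up big data analysis (press release). Hadoop, which is commonly used for big data analysis, runs simple Map-Reduce-style analyses quickly. In machine learning workloads that rely heavily on iterative computation, however, HDFS becomes a bottleneck because data is passed between jobs through storage, which makes it hard to raise computational efficiency. NEC's Feliss therefore exchanges data between jobs in memory. It also allows "MPI", a common message-passing API for parallel processing, to be used at the same time for tasks such as communication between compute nodes. As a result, complex computations such as machine learning run about 10 times faster than with ordinary Hadoop. Feliss provides an HDFS interface, so the initial data read can be done from HDFS.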

mooz 2013/11/13

Spark, Shark, persisting continuations: a pleasantly geeky, good article.

Python Programming Guide - Spark 0.7.3 Documentation

Python Programming Guide The Spark Python API (PySpark) exposes most of the Spark features available in the Scala version to Python. To learn the basics of Spark, we recommend reading through the Scala programming guide first; it should be easy to follow even if you don’t know Scala. This guide will show how to use the Spark features described there in Python. Key Differences in the Python API The

mooz 2013/08/21

cool

python
spark
