[B! spark] TokyoIncidents's bookmarks

Real-time Recommendations using Spark Comcast Labs

Databricks is the data and AI company. More than 10,000 organizations worldwide — including Comcast, Condé Nast, Grammarly, and over 50% of the Fortune 500 —...

TokyoIncidents 2019/03/19

spark

Link

Apache spark 2.3 and beyond

This is what I presented at Microservices Meetup vol.8 Lightning Talks Battle! https://microservices-meetup.connpass.com/event/99190/ This document summarizes a microservices meetup hosted by @mosa_siru. Key points include: 1. @mosa_siru is an engineer at DeNA and CTO of Gunosy. 2. The meetup covered Gunosy's architecture with over 45 GitHub repositories, 30 stacks, 10 Go APIs, and 10 Python batch processes using AW

TokyoIncidents 2019/03/19

spark

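Link

Google announces beta release of Kubernetes Operator for Apache Spark

Apache Spark is a very popular execution framework for data engineering and machine learning workloads. It powers the Databricks platform and is available on both on-premises and cloud-based Hadoop services such as Azure HDInsight, Amazon EMR, and Google Cloud Dataproc. It can also run on Mesos clusters. But what if you want to run Spark workloads on a Kubernetes (k8s) cluster without Mesos and without the Hadoop YARN strings attached? Spark first added Kubernetes-specific features in the 2.3 release and improved them in version 2.4, but running Spark natively on k8s in a fully integrated way can still be difficult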

TokyoIncidents 2019/02/02

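Link

Monitoring with the Spark Web UI

Hello everyone, this is S.R from GMO AD Marketing. When developing Spark programs, improving performance and tuning instance settings are quite important. This tuning can be managed fairly easily with the Spark Web UI, so this time I will introduce the Spark Web UI. Note: understanding this article requires basic knowledge of Spark, Hadoop, and Linux shell commands. 1. What is Spark? For an overview of Spark, see the following Wikipedia article. Apache Spark is an open-source cluster computing framework. Code developed at the AMPLab at the University of California, Berkeley was donated to the Apache Software Foundation, which now maintains it. With Spark's interface, implicit data parallelism and fault tolerance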

TokyoIncidents 2019/01/29

spark

Link

Apache Zeppelin 0.7.3 Documentation: Front-end Angular API in Apache Zeppelin

Basic Usage In addition to the back-end API to handle Angular objects binding, Apache Zeppelin also exposes a simple AngularJS z object on the front-end side to expose the same capabilities. This z object is accessible in the Angular isolated scope for each paragraph. Bind / Unbind Variables Through the z, you can bind / unbind variables to AngularJS view. Bind a value to an angular object and a m

TokyoIncidents 2017/07/22

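Link

Summary of how to install Apache Zeppelin - Qiita

Apache Spark is attracting a great deal of attention as a next-generation data analysis platform, and data visualization is one of the key elements of data analysis. Python and R have plenty of tools for this (Matplotlib, ggplot, etc.). In addition, tools such as iPython Notebook and RStudio, which let you run code and draw graphs in an interactive environment, are very convenient for analysts. Databricks is currently developing Databricks Cloud as software for running Apache Spark in an iPython Notebook-like environment. However, it is currently open only to some users, so it is not yet an environment that anyone can readily use. That is where a tool offering an environment similar to Databricks Cloud comes in: "Apac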

TokyoIncidents 2017/07/22

apache
spark

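Link

Distributed data analysis with Apache Zeppelin - Part 1: The data analysis lifecycle - Qiita

Apache Zeppelin is a tool that, like IPython Notebook, can be used interactively from a web browser and combines data analysis, visualization, and the all-important reporting in a single notebook. Definitions vary, but data analysis has to cycle through a lifecycle of data preparation, data exploration, visualization, model building, reporting, and operation. Being able to do all of this, including reporting and collaboration, in one interface is appealing. Notebooks you create can be published to ZeppelinHub for sharing and collaboration. When studying a new technology, imitating someone's working code is the shortcut to learning it. Distributed processing platform: because it uses distributed Spark and Hadoop environments as its backend, it is well suited to interactive analysis of large-scale data. LINE, Naver, and the acquirer of SequenceIQ (the company behind Cloudbreak), Hor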

TokyoIncidents 2017/07/22

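Link

Points to note when connecting via VPN (Point To Site) – SIOS Tech. Lab

Hello everyone, this is Takei from SIOS Technology. We have published a guide to implementing RAG with Azure OpenAI Service, so let me introduce it. Note: in addition to this blog post, at the following event the guide […]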

TokyoIncidents 2017/07/22

Link

Open Sourcing TensorFlowOnSpark: Distributed Deep... | Hadoop at Yahoo

By Lee Yang, Jun Shi, Bobbie Chern, and Andy Feng (@afeng76), Yahoo Big ML team Introduction Today, we are pleased to offer TensorFlowOnSpark to the community, our latest open source framework for distributed deep learning on big-data clusters. Deep learning (DL) has evolved significantly in recent years. At Yahoo, we’ve found that in order to gain insight from massive amounts of data, we need to

TokyoIncidents 2017/02/14

Link

How Apache Spark makes your slow MySQL queries 10x faster

In this blog post, we’ll discuss how to improve the performance of slow MySQL queries using Apache Spark. In my previous blog post, I wrote about using Apache Spark with MySQL for data analysis and showed how to transform and analyze a large volume of data (text files) with Apache Spark. Vadim also performed a benchmark comparing the performance of MySQL and Spark with Parquet columnar format (usi

TokyoIncidents 2016/09/16

mysql
spark

Link

GitHub - EclairJS/eclairjs-node: Node.js API for Apache Spark with Remote Client

TokyoIncidents 2016/08/18

Link

Cloudera Blog

In an era where artificial intelligence (AI) is reshaping enterprises across the globe, be it in healthcare, finance, or manufacturing, it’s hard to overstate the transformation that AI has had on businesses, regardless of industry or size. At Cloudera, we recognize the urgent need for bold steps to harness this potential and dramatically accelerate the time to […]

TokyoIncidents 2016/06/01

kafka
spark

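Link

Apache Spark, a large-scale distributed computing environment suited to machine learning - Natural Language Processing on Mac

Apache Spark, a large-scale distributed computing environment, is notable for processing data efficiently in memory compared with Hadoop's MapReduce, and it provides libraries for machine learning, stream processing, graph analysis, and SQL data analysis. Spark itself is implemented in Scala, but in addition to Scala it offers a Python API and interactive shells, so you can program and check behavior in either language. Checking that it works is easy: if Java 6 or later is installed, download a suitable prebuilt package from the Downloads page and extract it, and you can start the Scala or Python interactive shell: $ curl -O http://d3kbcqa49mib13.cloudfront.net/spark-1.0.2-bin-hadoop1.tgz $ tar zxf spark-1.0.2-b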