[B! concurrent-computing][apache-hadoop] nabinno's bookmarks

nabinno id:nabinno

nabinno's bookmarks on concurrent-computing and apache-hadoop (12)

Cloudera
ClouderaNOW Learn about the latest innovations in data, analytics, and AI Watch now
nabinno 2019/12/22
cloudera

apache-hadoop

distributed-computing

concurrent-computing
Link
GitHub - treasure-data/trino-client-ruby: Trino/Presto client library for Ruby
require 'trino-client'
# create a client object:
client = Trino::Client.new(
  server: "localhost:8880",  # required option
  ssl: {verify: false},
  catalog: "native",
  schema: "default",
  user: "frsyuki",
  password: "********",
  time_zone: "US/Pacific",
  language: "English",
  properties: {
    "hive.force_local_scheduling": true,
    "raptor.reader_stream_buffer_size": "32MB"
  },
  http_proxy: "proxy.example.com:8080",
nabinno 2019/12/19
github

treasure-data

presto-client

ruby

presto

mapreduce

apache-hadoop

distributed-computing

concurrent-computing
Link
Welcome to Apache Pig!
Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets. At the present time, Pig's infrastructure l
nabinno 2019/12/19
apache-pig

mapreduce

apache-hadoop

distributed-computing

concurrent-computing
Link
Apache Pig - Wikipedia
nabinno 2019/12/19
apache-pig

mapreduce

apache-hadoop

distributed-computing

concurrent-computing
Link
Presto (SQL query engine) - Wikipedia
Presto (including PrestoDB, and PrestoSQL, which was rebranded as Trino) is a distributed query engine for big data using the SQL query language. Its architecture allows users to query data sources such as Hadoop, Cassandra, Kafka, AWS S3, Alluxio, MySQL, MongoDB and Teradata,[1] and allows use of multiple data sources within a query. Presto is community-driven open-source software released under
nabinno 2019/12/17
presto

mapreduce

apache-hadoop

distributed-computing

concurrent-computing
Link
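Taking a look at what Apache Spark is (Part 1) - 夢とガラクタの集積場
Hello. The timing is a little awkward since I am still in the middle of trying out Kafka, but lately I have been gathering information on Apache Spark to see whether it is usable. Like MapReduce, it is a platform for distributed parallel processing, but some reports say it is tens of times faster than MapReduce. My first reaction was that this could not be right, but the RDD mechanism it maintains internally is interesting, so I decided to start by reading the materials and papers. The first material I looked at was "Overview of Spark" (http://spark.incubator.apache.org/talks/overview.pdf). Here is a summary of what I read. What is Spark? A fast, interactive, language-integrated cluster computing platform. What is the goal of the Spark project? To extend MapReduce to better fit the following two analytics use cases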
nabinno 2019/12/15
apache-spark

mapreduce

apache-hadoop

distributed-computing

concurrent-computing
Link
Apache Hive - Wikipedia
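Apache Hive is a data warehouse infrastructure built on top of Hadoop that provides data summarization, querying, and analysis [1]. Apache Hive was initially developed by Facebook, but various organizations such as Netflix later joined its development and became users [2][3]. Hive is also included in Amazon Elastic MapReduce on Amazon Web Services [4]. Apache Hive analyzes large data sets stored in Hadoop-compatible file systems (for example, Amazon S3). It is used through HiveQL, an SQL-like language with full support for map/reduce. To speed up queries, it also implements indexing, including bitmap indexes [5]. By default, Hive stores metadata in an embedded Apach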
nabinno 2019/12/15
apache-hive

mapreduce

structured-query-language

apache-hadoop

distributed-computing

concurrent-computing
Link
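Getting started with distributed processing (Hadoop + Spark) | Casley Deep Innovations株式会社 Tech Blog
Hello. I am 腰塚 from the SI department. Having worked mostly on RDBs and data warehouses, I became interested in the distributed processing frameworks for big data analytics and machine learning that I had been hearing about for several years, but I never got around to trying them. For this blog post, I took the opportunity to start from scratch and feel my way in, covering everything from the basics of distributed processing that are hard to ask others about at this point through to trying out hadoop and spark. 1. Basics of distributed processing 1-1. Processing model for distributed processing: MapReduce First, distributed processing means processing a single computation simultaneously and in parallel on multiple computers connected over a network. As the market for big data grows day by day, processing hundreds of terabytes to petabytes of data is no longer unusual, and systems that routinely handle data at this scale need techniques for processing it at a realistic cost in time and money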
nabinno 2019/12/15
apache-hadoop

apache-spark

mapreduce

distributed-computing

concurrent-computing
Link
Azure HDInsight - Hadoop, Spark, Kafka | Microsoft Azure
nabinno 2019/09/20
azure-hdinsight

apache-hadoop

apache-spark

apache-kafka

concurrent-computing

extract-transform-load

analytics
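Link
Amazon Athena (Query data in S3 using SQL) | AWS
Amazon Athena is a serverless, interactive analytics service built on open-source frameworks that supports open table and file formats. Athena provides a simplified, flexible way to analyze petabytes of data where it lives. You can analyze data or build applications from an Amazon Simple Storage Service (S3) data lake and more than 25 data sources, including on-premises data sources and other cloud systems, using SQL or Python. Athena is built on the open-source Trino and Presto engines and the Apache Spark framework, and requires no provisioning or configuration.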
nabinno 2017/06/02
amazon-athena

structured-query-language

apache-hadoop

mapreduce

distributed-computing

concurrent-computing
Link
Welcome to Apache™ Hadoop™!
Apache Hadoop The Apache® Hadoop® project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation an
nabinno 2014/08/03
apache-hadoop

apache-software-foundation
Link
Apache Hadoop - Wikipedia
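Apache Hadoop is an open-source software framework, written in Java, that supports distributed processing of large-scale data. Hadoop enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers. Hadoop is one of Apache's top-level projects and is developed and used by a worldwide community of contributors. [2] Hadoop consists of the following four modules. Hadoop Common: the libraries shared by the other modules. Hadoop Distributed File System (HDFS): Hadoop's own distributed file system. Hadoop YARN: Hado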
nabinno 2014/02/06
apache-hadoop

apache-software-foundation
Link