[B! hadoop] cateching's bookmarks

Ingesting data with Spark using a custom Hadoop FileInputFormat

cateching 2019/09/27

Link

Cloudera Blog

In an era where artificial intelligence (AI) is reshaping enterprises across the globe, be it in healthcare, finance, or manufacturing, it’s hard to overstate the transformation that AI has brought to businesses, regardless of industry or size. At Cloudera, we recognize the urgent need for bold steps to harness this potential and dramatically accelerate the time to […]

cateching 2018/10/17

Link

Hadoop Use Cases in 2016 | Hadoop Advent Calendar 2016 #24 | DevelopersIO

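Hello, this is Ozawa. This article is the Day 24 entry of the Hadoop Advent Calendar. (Talking About Hadoop Solo Advent Calendar 2016 - Qiita; Hadoop Advent Calendar 2016 | Series | Developers.IO) Last time I wrote about Hue. This time, drawing on the titles of talks given at Hadoop-related conferences in 2016, I would like to introduce real-world use cases, mainly from Japan. Hadoop is an area where I often hear people say "I more or less understand the overview and what it can do, but I can't picture how to put it to use," so I hope this serves as a reference. Case studies: Recruit. First is the case of Recruit, a company that has frequently presented on its use of Hadoop for some time. The slides embedded here are from Had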

cateching 2018/07/06

hadoop

Link

Talking About Hadoop Solo Calendar | Advent Calendar 2016 - Qiita

About reserved posting: If you register a secret article by the day before your scheduled day, it will be automatically published around 7:00 on the scheduled day. About the posting period: Only articles submitted after November 1 of the year can be registered. (Secret articles can be registered anytime articles are posted.)

cateching 2018/07/06

hadoop
hive

Link

What is Predicate Pushdown? - PHPFog.com

The basic idea of predicate pushdown is that certain parts of SQL queries (the predicates) can be “pushed” to where the data lives. This optimization can drastically reduce query/processing time by filtering out data earlier rather than later. Depending on the processing framework, predicate pushdown can optimize your query by doing things like filtering data before it is transferred over the net
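
The idea in the excerpt above can be sketched in plain Python. This is only a toy model under stated assumptions, not how any particular engine implements pushdown: `Source`, `scan`, `rows_sent`, and the two query functions are hypothetical names invented for illustration, and `rows_sent` stands in for data transferred over the network.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional

Row = dict
Predicate = Callable[[Row], bool]


@dataclass
class Source:
    """Hypothetical storage layer that counts every row leaving it."""
    rows: List[Row]
    rows_sent: int = 0

    def scan(self, predicate: Optional[Predicate] = None) -> Iterator[Row]:
        # If a predicate was pushed down, rows are filtered here,
        # before they count as "sent" to the query engine.
        for row in self.rows:
            if predicate is None or predicate(row):
                self.rows_sent += 1
                yield row


def query_without_pushdown(source: Source, predicate: Predicate) -> List[Row]:
    # The engine pulls every row and filters afterwards.
    return [row for row in source.scan() if predicate(row)]


def query_with_pushdown(source: Source, predicate: Predicate) -> List[Row]:
    # The predicate travels to the source, so only matching rows are sent.
    return list(source.scan(predicate))


def make_rows() -> List[Row]:
    # Nine sample rows; three of them have year == 2018.
    return [{"id": i, "year": 2016 + i % 3} for i in range(9)]


def is_2018(row: Row) -> bool:
    return row["year"] == 2018
```

Both functions return the same rows for the same predicate; the only difference is how many rows the source has to hand over, which is the cost the optimization aims to reduce.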

cateching 2018/02/08

Link

Part 20: Spark Design and Implementation [1]: Background and Data Processing Characteristics | gihyo.jp

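Introduction: Starting with this installment, we will spend two installments explaining Spark, one of several parallel data processing systems. We first describe how Spark's development began, then explain Spark's characteristics. Background to Spark's emergence: Like Hadoop MapReduce, Spark is a parallel data processing system that processes data using multiple machines. Development began in 2009 at the AMPLab at the University of California, Berkeley, led by Matei Zaharia. When Spark's development began, Hadoop already existed, and running highly fault-tolerant, scalable parallel data processing on commodity machines was becoming commonplace. However, Hadoop MapReduce was not necessarily designed to make efficient use of each machine's memory. Hadoop MapReduce […]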