[B! sql][hadoop] yassのブックマーク

yass id:yass

sqlとhadoopに関するyassのブックマーク (21)

Inside Yellow Pages' SQL-on-Hadoop Journey
yass 2016/02/11
hadoop

vertica

impala

orc

parquet

sql
リンク
Discover Scaling Analytics at Amplitude | Amplitude | Amplitude
Scaling Analytics at AmplitudeLaying the foundation with pre-aggregation and lambda architecture. Update: In May 2016 we updated our analytics architecture to NOVA. Read the article here.Three weeks ago, we announced that we are giving away a compelling list of analytics features for free for up to 10 million events per month. That’s an order of magnitude more data than any comparable service, and
yass 2015/08/30
hadoop

sql

lambda architecture

s3
リンク
SQL on Hadoop 比較検証【2014月11日における検証レポート】
Impala Meetup 2014/10/31 @Tokyo 講演資料【注意事項】本資料で紹介している検証結果は2014年当時のものです。当該ソフトウェアは成長や改善が早く、現時点のバージョンでは大きく異なる機能や性能となっています。 SQL on Hadoopの最新情報に基づくサービスやシステムインテグレーションにご興味をお持ちの方は、NTTデータ基盤システム事業本部 OSSプロフェッショナルサービス（電子メール： hadoop [AT] kits.nttdata.co.jp）にご相談ください。Read less
yass 2014/11/05
Hadoop

benchmark

comparison

hive

Impala

sql

presto

impala

tez
リンク
SQL on Hadoop in Taiwan
This document discusses SQL engines for Hadoop, including Hive, Presto, and Impala. Hive is best for batch jobs due to its stability. Presto provides interactive queries across data sources and is easier to manage than Hive with Tez. Presto's distributed architecture allows queries to run in parallel across nodes. It supports pluggable connectors to access different data stores and has language bi
yass 2014/09/27
presto

hadoop

sql
リンク
Cloudera Blog
The ongoing progress in Artificial Intelligence is constantly expanding the realms of possibility, revolutionizing industries and societies on a global scale. The release of LLMs surged by 136% in 2023 compared to 2022, and this upward trend is projected to continue in 2024. Today, 44% of organizations are experimenting with generative AI, with 10% having […] Read blog post
yass 2014/09/05
impala

hadoop

cloudera

sql
リンク
Cloudera Blog
The ongoing progress in Artificial Intelligence is constantly expanding the realms of possibility, revolutionizing industries and societies on a global scale. The release of LLMs surged by 136% in 2023 compared to 2022, and this upward trend is projected to continue in 2024. Today, 44% of organizations are experimenting with generative AI, with 10% having […] Read blog post
yass 2014/09/04
" a new approach using a hybrid engine that leverages Tez and something new called LLAP (Live Long and Process, #llap online). "

Hadoop

hive

stinger

sql

tez
リンク
War of the Hadoop SQL engines. And the winner is ...? - Sonra
War of the Hadoop SQL engines. And the winner is …? You may have wondered why we were quiet over the last couple of weeks? Well, we locked ourselves into the basement and did some research and a couple of projects and PoCs on Hadoop, Big Data, and distributed processing frameworks in general. We were also looking at Clickstream data and Web Analytics solutions. Over the next couple of weeks we wil
yass 2014/07/28
" Right now I would run both batch style queries (ETL) and interactive queries on Hive Tez as Hive offers the richest SQL feature set, especially analytic functions and supports a wide set of file formats. "

hadoop

sql

hive

tez

impala

presto

spark

infinidb

drill
リンク
Apache Drill at ApacheCon2014
Lot of workloads exist for Big data, batch, machine learning, search, interactive SQL, Operational/user facing applicationsApache Drill fits into the interactive SQL category Analytics on Semi-Structured/Nested dataUse standard SQL to query Nested data without upfront flattening/modelingExtensions to ANSI SQL to operate on nested dataGeneric architecture for a broad variety of nested data types (e
yass 2014/06/25
" Current state : Alpha • Timeline 1.0 Beta (End of Q2, 2014) 1.0 GA (Q3, 2014) "

drill

hadoop

sql

MapR
リンク
Yahoo Betting on Apache Hive, Tez, and YARN
by The Hadoop Platforms Team Low-latency SQL queries, Business Intelligence (BI), and Data Discovery on Big Data are some of the hottest topics these days in the industry with a range of solutions coming to life lately to address them as either proprietary or open-source implementations on top of Hadoop. Some of the popular ones talked about in the Big Data communities are Hive, Presto, Impala, S
yass 2014/05/17
" Hive 0.13 execution times were comparable or better than Shark on a 100 node cluster. "

hadoop

hive

yahoo

stinger

sql

Tez

shrak
リンク
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C. Srivas, Co-founder and CTO at MapR
SQL is one of the most widely used languages to access, analyze, and manipulate structured data. As Hadoop gains traction within enterprise data architectures across industries, the need for SQL for both structured and loosely-structured data on Hadoop is growing rapidly Apache Drill started off with the audacious goal of delivering consistent, millisecond ANSI SQL query capability across wide ran
yass 2014/05/16
drill

hadoop

sql
リンク
Under Construction | Home
yass 2014/05/03
impala

vertica

hadoop

sql

benchmark
リンク
Apache Hadoop Distribution | MapR
Your HPE MyAccount provides you with: Single sign-on to the HPE ecosystem Personalized recommendations Test drives and other trials And many more exclusive benefits
yass 2014/02/28
hadoop

hive

drill

impala

presro

shark

spark

comparison

sql
リンク
商用Hadoopディストロ「Pivotal HD」とDBエンジン「HAWQ」を提供開始
yass 2014/02/07
hadoop

sql

HAWQ

pivotal
リンク
Hadoop＋SQL＋インメモリ、マルチクラウド対応の「Pivotal One」プラットフォーム発表。EMC World 2013
Hadoop＋SQL＋インメモリ、マルチクラウド対応の「Pivotal One」プラットフォーム発表。EMC World 2013 EMCがラスベガスで開催中のイベント「EMC World 2013」。2日目の基調講演には、EMCとVMwareが設立した新会社「Pivotal」のCEO ポール・マリッツ（Paul Maritz）氏が登壇し、クラウド時代のアプリケーション基盤となる「Pivotal One」を発表しました。 Pivotalは、EMCが買収したGreenplumや開発コンサルタントのPivotal Labs、VMwareが買収したSpring SourceやCloudFoundryなどのチームを集めて12月に発足した組織。今月から正式な企業としての活動を開始しています。 Pivotal Oneは、ビッグデータとクラウド時代のアプリケーション基盤として、同社が今年末にリリース予定
yass 2014/02/06
hadoop

EMC

pivotal

hdfs

greenplum

sql
リンク
オープンソースのSQL-in-Hadoopソリューション:我々はいまどこに？
Spring BootによるAPIバックエンド構築実践ガイド第2版何千人もの開発者が、InfoQのミニブック「Practical Guide to Building an API Back End with Spring Boot」から、Spring Bootを使ったREST API構築の基礎を学んだ。この本では、出版時に新しくリリースされたバージョンである Spring Boot 2 を使用している。しかし、Spring Boot3が最近リリースされ、重要な変...
yass 2014/01/16
hadoop

sql

drill

presto

impala
リンク
Teradata Presto | Product Details | Open Source
Teradata Blogs When big data becomes vast, what's your data dropping strategy? Read more Support Teradata at Your Service (TAYS) Simple, secure customer access to products, services, education, and support function information. Read more Certifications Teradata Certified Professional Program (TCPP) Management, development, and oversight of the premiere Teradata Certification Program. Read more Con
yass 2013/11/02
" SQL processed by a specialized (Google-inspired) SQL engine that sits on a Hadoop cluster. Both Impala and Drill fall into this category. Impala is inspired by Google’s F1 project and Drill by Google’s Dremel project. "

hadoop

impala

drill

stinger

hadapt

hive

sql
リンク
Don't use Hadoop - your data isn't that big
"So, how much experience do you have with Big Data and Hadoop?" they asked me. I told them that I use Hadoop all the time, but rarely for jobs larger than a few TB. I'm basically a big data neophite - I know the concepts, I've written code, but never at scale. The next question they asked me. "Could you use Hadoop to do a simple group by and sum?" Of course I could, and I just told them I needed t
yass 2013/09/18
" If you have a single table containing many terabytes of data, Hadoop might be a good option for running full table scans on it. If you don’t have such a table, avoid Hadoop like the plague. / Hadoop does not have any conception of indexing. Hadoop has only full table scans. "

hadoop

sql

bigdata
リンク
Hadoopのセカンダリソートを避け、より高速に値をソートする方法
HadoopのReduceに渡されるのはキーと値のリストだが、このとき値のリストに含まれる各アイテム（値そのもの）はソートされていない。ソートされていて欲しい場合にはセカンダリソートと呼ばれるテクニックを使うのが定石とされているが、これは実装の面でも概念的な面でもバッドノウハウ的な側面がある。Hadoopには「キーをソートする」機能は実装されている。そこで、値をキーに入れてしまい、このHadoopに備わっている「キーをソートする」機能によって、実質的に値をソートしようというわけだ。 Map/Reduceというのはキーごとにデータを分割して処理する方法なので、「キーに値が入ったら分割がおかしくなるんじゃ？」と思うのは当然である。キーに値が入っていても、分割に影響しないよう、Partitioningクラスを自分で拡張し、分割の基準となる値（本来のキー）には、値の影響が出ないようにするのだ。それ
yass 2013/08/16
" つまりセカンダリソートはウ○コだということなのである(w そこで、Java組み込み型のRDBMSであるH2を利用して、値のソートを行うというテクニックを使う。Reduceの処理において、単純にすべての値をH2データベースに格納"

hadoop

sort

h2

sql

reduce
リンク
SQL, PigのCUBE - wyukawa's diary
SQLで小計や総合計を求める時にGROUP BYを利用することが多いと思いますがいろんな軸で集計したい場合にROLLUP, CUBE, GROUPING SETSを使うことができるようです。詳しくはこちら参照 http://homepage2.nifty.com/sak/w_sak3/doc/sysbrd/sq_kj04_4.htm ROLLUP, CUBE, GROUPING SETSを使うことができますと断定していないのは僕が試してないからです（汗なぜ試していないかというとこれらの機能を利用できるのがOracle, SQL Server, DB2だからです。Oracle XEをダウンロードしようかと思いましたけどユーザ登録に心が折れましたw　ちなみにMySQLではROLLUPのみサポートしているらしいです。今回は考えられる全ての組み合わせで集計するCUBEについて書いてみたいと思
yass 2013/04/26
hadoop

SQL

pig

cube

bi
リンク
GitHub - intel-hadoop/project-panthera-ase: Analytical SQL Engine (ASE) for Hadoop under "Project Panthera"
Dismiss All your code in one place Over 40 million developers use GitHub together to host and review code, project manage, and build software together across more than 100 million projects. Sign up for free See pricing for teams and enterprises
yass 2013/02/28
hadoop

sql

intel
リンク
1 2 次のページ