www.slideshare.net/cloudera
Todd Lipcon shares what is new and upcoming in HDFS (Hadoop Distributed File System) at the Federal Big Data Forum.
In this talk I’ll go into detail about Tumblr’s experience developing Motherboy, an eventually consistent inbox style storage system built around HBase. The SLA, write concurrency, data volume, and failure modes for this application created a number of challenges in developing a solution. The user homing scheme introduced additional complexity that made capacity planning tricky as we tried to trad
Apache HBase is a rapidly-evolving random-access distributed data store built on top of Apache Hadoop's HDFS and Apache ZooKeeper. Drawing from real-world support experiences, this talk provides administrators insight into improving HBase's availability and recovering from situations where HBase is not available. We share tips on the common root causes of unavailability, explain how to diagnose th
Optimizing MapReduce job performance is often seen as something of a black art. In order to maximize performance, developers need to understand the inner workings of the MapReduce execution framework and how they are affected by various configuration parameters and MR design patterns. The talk will illustrate the underlying mechanics of job and task execution, including the map side sort/spill, th
Most developers are familiar with the topic of “database design”. In the relational world, normalization is the name of the game. How do things change when you’re working with a scalable, distributed, non-SQL database like HBase? This talk will cover the basics of HBase schema design at a high level and give several common patterns and examples of real-world schemas to solve interesting problems.
The past year was punctuated by significant advancements in Apache Hadoop and increasingly wider adoption of Hadoop technology across the enterprise. Companies are continuing to use Hadoop in exciting new ways to better serve their customers, inform product development and drive operational efficiency like never before. Join Mike Olson, founder and CEO of Cloudera, as he shares his twelve major pr
Toward a Fair Performance Comparison of MapReduce/Spark/Tez (Cloudera World Tokyo 2014 lightning talk)
The document discusses migrating KT's CDR analysis system from a relational database to NexR's Hadoop-based Data Analytics Platform (NDAP). NDAP provides tools to help with the migration, including converting Oracle data and SQL queries to the Hive query language. The conversion process involves mapping data types, functions, and SQL syntax between Oracle and Hive. NDAP also includes performance m
This is the story of why and how Hadoop was integrated into the Disney data infrastructure. Providing data infrastructure for Disney’s, ABC’s and ESPN’s Internet presences is challenging. Doing so requires cost-effective, performant, scalable and highly available solutions. Information requirements from the business add the need for these solutions to work together; providing consistent acquisition,
While running a simple key/value based solution on HBase usually requires an equally simple schema, it is less trivial to operate a different application that has to insert thousands of records per second. This talk will address the architectural challenges when designing for either read or write performance imposed by HBase. It will include examples of real world use-cases and how they can be imp
Attend this session and walk away armed with solutions to the most common customer problems. Learn proactive configuration tweaks and best practices to keep your cluster free of fetch failures, job tracker hangs, and the like.
Performance is a thing that you can never have too much of. But performance is a nebulous concept in Hadoop. Unlike databases, there is no equivalent in Hadoop to TPC, and different use cases experience performance differently. This talk will discuss advances on how Hadoop performance is measured and will also talk about recent and future advances in performance in different areas of the Hadoop st
How can you rank product search results when you have very little data about how past shoppers have interacted with the products? Through large scale analysis of its clickstream data, Etsy is automatically discovering product attributes (things like materials, prices, or text features) which signal that a search result is particularly relevant (or irrelevant) to a given query. This attribute-level
Presentation slides by Cloudera's Todd Lipcon for the HBase HUG, "Avoiding Full GCs with MemStore-Local Allocation Buffers."
With a community of over 500 contributors, Apache Hadoop and related projects are evolving at an ever increasing rate. Join the co-creator of Apache Hadoop, Doug Cutting, and Cloudera’s Chief Scientist, Jeff Hammerbacher, for a discussion of the most exciting new features being developed by the Apache Hadoop community.
The document summarizes HBase use at Facebook, including its development and future work. HBase is used for incremental updates to data warehouses, high frequency analytics, and write-intensive workloads. Development includes Hive integration, master high availability, and random read optimizations. Future work focuses on coprocessors, intelligent load balancing, and cluster performance.
Hw09 Practical HBase: Getting The Most From Your HBase Install
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T... (Simplilearn)
Join Cloudera’s founder and Chief Scientist, Jeff Hammerbacher, as he describes ten common problems that are being solved with Apache Hadoop. A replay of the webinar can be viewed here: https://www1.gotomeeting.com/register/719074008
This document provides an overview of Hadoop and how it can be used for data consolidation, schema flexibility, and query flexibility compared to a relational database. It describes the key components of Hadoop including HDFS for storage and MapReduce for distributed processing. Examples of industry use cases are also presented, showing how Hadoop enables affordable long-term storage and scalable
Hw09 Hadoop Based Data Mining Platform For The Telecom Industry