After the usual 30 minutes of socializing and networking, Milind Bhandarkar from LinkedIn kicked off the evening with a really enlightening talk on "Scaling Hadoop Applications." As a well-respected Hadoop expert and a founding member of the Hadoop team at Yahoo! in 2005, Milind was able to articulate the issues and solutions very succinctly. His talk was especially interesting because he tied…
Overview: In the Big Data business, running fewer, larger clusters is cheaper than running more small clusters. Larger clusters also process larger data sets and support more jobs and users. The Apache Hadoop MapReduce framework has hit a scalability limit around 4,000 machines. We are developing the next generation of Apache Hadoop MapReduce that factors the framework into a generic resource scheduler…
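The redesign sketched above is what eventually shipped as YARN, which splits the old JobTracker into a generic resource scheduler (the ResourceManager) and a per-application master. As a rough illustration of that split, here is a minimal client-side sketch using the YarnClient API from later Hadoop releases; the launch command and resource sizes are placeholder assumptions, not anything from the article.

```java
import java.util.Collections;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.util.Records;

public class SubmitApplication {
    public static void main(String[] args) throws Exception {
        // Connect to the generic resource scheduler (the ResourceManager).
        Configuration conf = new Configuration();
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        // Ask the scheduler for a new application and describe it.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
        ctx.setApplicationName("example-app");

        // The per-application master is just a container the scheduler
        // launches for us; the command below is a placeholder.
        ContainerLaunchContext amContainer =
                Records.newRecord(ContainerLaunchContext.class);
        amContainer.setCommands(
                Collections.singletonList("./run_app_master.sh"));
        ctx.setAMContainerSpec(amContainer);

        // Placeholder resource ask for the application master container.
        ctx.setResource(Resource.newInstance(1024 /* MB */, 1 /* vcore */));

        // Hand the application to the scheduler; framework-specific logic
        // (e.g. MapReduce) now lives in the application master, not here.
        yarnClient.submitApplication(ctx);
    }
}
```

Moving framework logic out of the central daemon is what lifts the 4,000-machine ceiling: the scheduler only tracks resources, while each job runs its own master.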
Thanks to Tsz-Wo Nicholas Sze and Mahadev Konar for this article. The Problem of Many Small Files: The Hadoop Distributed File System (HDFS) is designed to store and process large (terabyte-scale) data sets. At Yahoo!, for example, a large production cluster may have 14 PB of disk space and store 60 million files. However, storing a large number of small files in HDFS is inefficient. We call a file small when its size is substantially less than the HDFS block size.
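Small files hurt because every file, directory, and block is an object in the NameNode's memory, so a million tiny files cost the same namespace overhead as a million huge ones while storing far less data. One commonly used remedy (a general HDFS technique, not necessarily the one this article settles on) is to pack many small files into a single container file. Below is a minimal sketch using Hadoop's SequenceFile, keyed by original file name, with the input and output paths taken from the command line.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackSmallFiles {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path smallFileDir = new Path(args[0]); // directory of small files
        Path packed = new Path(args[1]);       // single output SequenceFile

        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(packed),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (FileStatus st : fs.listStatus(smallFileDir)) {
                if (!st.isFile()) {
                    continue;
                }
                // Read the whole small file; by definition it fits in memory.
                byte[] buf = new byte[(int) st.getLen()];
                try (FSDataInputStream in = fs.open(st.getPath())) {
                    in.readFully(buf);
                }
                // One record per original file: name -> contents.
                writer.append(new Text(st.getPath().getName()),
                        new BytesWritable(buf));
            }
        }
    }
}
```

After packing, the NameNode tracks one file and its blocks instead of one namespace object per small file; readers iterate the SequenceFile directly or process it with MapReduce.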
The bottom line is that we achieved the target in petabytes and got close to the target in the number of files. But this was done with a smaller number of nodes, and the need to support a workload close to 100,000 clients has not yet materialized. The question now is whether the goals are feasible with the current system architecture. Namespace Limitations: HDFS is based on an architecture where the namespace is decoupled from the data, and the entire namespace is kept in the memory of a single NameNode…
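The namespace limit is easy to quantify with a back-of-the-envelope calculation. A commonly cited rule of thumb (an assumption here, not a figure from the article) is on the order of 150 bytes of NameNode heap per namespace object (file, directory, or block); the block count below is likewise hypothetical.

```java
// Back-of-the-envelope NameNode heap estimate. The ~150 bytes/object
// figure is a widely quoted rule of thumb, not an exact measurement.
public class NamenodeHeapEstimate {
    public static void main(String[] args) {
        long files = 60_000_000L;        // file count, from the cluster above
        long blocks = 80_000_000L;       // hypothetical block count
        long bytesPerObject = 150L;      // rule-of-thumb heap per object

        long heapBytes = (files + blocks) * bytesPerObject;
        System.out.printf("Estimated NameNode heap: ~%.0f GB%n",
                heapBytes / 1e9);        // ~21 GB for these inputs
    }
}
```

Because every namespace object must fit in one JVM heap, the file-count target scales with the RAM of a single machine, which is exactly the architectural question raised above.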
Yahoo! OpenID Usability Research: OpenID is a Single Sign-On protocol that enables users to authenticate at websites using their Yahoo! IDs. Unlike the traditional web login experience, where users authenticate by typing in their username and password, users sign into an OpenID Relying Party by typing in their OpenID URL, and after a series of browser redirects and interstitial screens, the user returns to the Relying Party signed in…
The 4000-node cluster's throughput was 7 times better than the 500-node cluster's for writes and 3.6 times better for reads, even though the bigger cluster carried more load per node (4 vs. 2 tasks) than the smaller one. Map-Reduce: The primary area of concern was the JobTracker and how it would react to this scale (we had never subjected the JobTracker to heartbeats flowing in from 4,000 TaskTrackers, since it isn't…)
Best Practices for Speeding Up Your Web Site: The Exceptional Performance team has identified a number of best practices for making web pages fast. The list includes 35 best practices divided into 7 categories. Minimize HTTP Requests: 80% of the end-user response time is spent on the front-end. Most of this time is tied up in downloading all the components in the page: images, stylesheets, scripts, Flash, etc.…
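Combined files are one of the techniques the article lists for minimizing HTTP requests: several scripts served as one file mean one request instead of many. Here is a minimal build-step sketch of that idea (the file names are hypothetical):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Build-time concatenation of several scripts into one bundle,
// so the page makes one HTTP request instead of many.
public class BundleScripts {
    public static void main(String[] args) throws IOException {
        List<Path> scripts = List.of(
                Path.of("js/menu.js"),       // hypothetical inputs
                Path.of("js/carousel.js"),
                Path.of("js/analytics.js"));
        Path bundle = Path.of("js/bundle.js");

        try (BufferedWriter out = Files.newBufferedWriter(bundle)) {
            for (Path s : scripts) {
                out.write(Files.readString(s));
                out.write('\n');  // keep adjacent statements from fusing
            }
        }
        // The page then references js/bundle.js once instead of three files.
    }
}
```

The same concatenation idea applies to CSS; image requests are typically reduced with CSS sprites instead.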