Having a good grasp of HDFS recovery processes is important when running or moving toward production-ready Apache Hadoop. An important design requirement of HDFS is to ensure continuous and correct operation to support production deployments. One particularly complex area is ensuring the correctness of writes to HDFS in the presence of network and node failures, where the lease recovery, block recovery […]
As many of us know, data in HDFS is stored on DataNodes, and HDFS can tolerate DataNode failures by replicating the same data across multiple DataNodes. But what exactly happens if some DataNodes' disks fail? This blog post explains how some of the background work is done on the DataNodes to help HDFS manage its data across multiple DataNodes for fault tolerance. In particular, we will explain […]
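The replication idea the excerpt above describes can be illustrated with a small conceptual sketch. This is not HDFS source code, and the round-robin placement below is a hypothetical simplification (real HDFS placement is rack-aware); it only shows why holding several replicas lets the system survive a failed disk:

```python
import itertools

# Conceptual sketch (not HDFS code): with replication factor 3, each
# block is copied to several DataNodes, so losing one node's disk still
# leaves live replicas to read from and to re-replicate elsewhere.

def place_replicas(block_id, datanodes, factor=3):
    """Pick `factor` distinct DataNodes for a block.

    Round-robin from a block-dependent offset; real HDFS placement
    is rack-aware and considers load, but the fault-tolerance idea
    is the same.
    """
    start = block_id % len(datanodes)
    ring = itertools.islice(itertools.cycle(datanodes), start, start + factor)
    return list(ring)

nodes = ["dn1", "dn2", "dn3", "dn4"]
replicas = place_replicas(7, nodes)
assert len(set(replicas)) == 3
# If one replica holder's disk fails, the other two still serve the block.
assert set(replicas) - {replicas[0]}
```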
Improving Query Performance Using Partitioning in Apache Hive Our thanks to Rakesh Rao of Quaero for allowing us to re-publish the post below about Quaero's experiences using partitioning in Apache Hive. In this post, we will talk about how we can use the partitioning features available in Hive to improve the performance of Hive queries. Partitions: Hive is a good tool for performing queries on large […]
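The performance benefit of partitioning comes from partition pruning: a partitioned table is stored as one directory per partition-key value, so a query that filters on the partition column only reads the matching partitions. The sketch below is a conceptual Python illustration (not Hive code; the table layout and `dt` column are hypothetical):

```python
# A partitioned "table": one entry per partition directory, as Hive
# lays out partitioned data on disk (e.g. .../table/dt=2015-01-01/).
table = {
    "dt=2015-01-01": [("u1", 3), ("u2", 5)],
    "dt=2015-01-02": [("u1", 7)],
    "dt=2015-01-03": [("u3", 2)],
}

def scan(table, dt=None):
    """Return rows, reading only partitions that match the dt filter."""
    rows = []
    for part, data in table.items():
        if dt is not None and part != f"dt={dt}":
            continue  # pruned: this partition's data is never read
        rows.extend(data)
    return rows

# A full scan touches every partition; a filtered query touches one.
assert len(scan(table)) == 4
assert scan(table, dt="2015-01-02") == [("u1", 7)]
```

The same pruning is what makes `WHERE dt = '2015-01-02'` cheap on a table partitioned by `dt`: the other partitions are skipped entirely rather than filtered row by row.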
How resource tuning, parallelism, and data representation affect Spark 1.3 job performance. Editor’s Note, January 2021: This blog post remains for historical interest only. It covers Spark 1.3, a version that has become obsolete since the article was published in 2015. For a modern take on the subject, be sure to read our recent post on Apache Spark 3.0 performance. You can also gain practical, hands-on experience by signing up for Cloudera’s Apache Spark Application Performance Tuning […]
We are thrilled to announce the general availability of the Cloudera AI Inference service, powered by NVIDIA NIM microservices, part of the NVIDIA AI Enterprise platform, to accelerate generative AI deployments for enterprises. This service supports a range of optimized AI models, enabling seamless and scalable AI inference. Background The generative AI landscape is evolving […] Read blog post
Cloudera Data Science Workbench: Self-Service Data Science for the Enterprise We are entering the golden age of machine learning, and it’s all about the data. As the quantity of data grows and the costs of compute and storage continue to drop, the opportunity to solve the world’s biggest problems has never been greater. Our customers already use advanced machine learning to build self-driving cars […]
Untangling Apache Hadoop YARN, Part 4: Fair Scheduler Queue Basics In this installment, we provide insight into how the Fair Scheduler works, and why it works the way it does. In Part 3 of this series, you got a quick introduction to Fair Scheduler, one of the scheduler choices in Apache Hadoop YARN (and the one recommended by Cloudera). In Part 4, we will cover most of the queue properties, some […]
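The core idea behind the Fair Scheduler's queues can be sketched in a few lines: each queue's fair share of the cluster is proportional to its weight. This is a deliberately simplified illustration, not Fair Scheduler code; it ignores minimum/maximum resources, hierarchy, and actual demand, and the queue names are hypothetical:

```python
def fair_shares(capacity, weights):
    """Split cluster capacity among queues in proportion to their weights.

    Simplified model: share(q) = capacity * weight(q) / sum(weights).
    The real Fair Scheduler also honors minResources/maxResources and
    only gives a queue as much as it actually demands.
    """
    total = sum(weights.values())
    return {q: capacity * w / total for q, w in weights.items()}

# A queue weighted 3x gets 3x the share of an equally demanding peer.
shares = fair_shares(100, {"root.prod": 3, "root.dev": 1})
assert shares == {"root.prod": 75.0, "root.dev": 25.0}
```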
Better SLAs via Resource-preemption in YARN’s CapacityScheduler Mayank Bansal, of eBay, is a guest contributing author of this collaborative blog. This is the fourth post in a series that explores the theme of enabling diverse workloads in YARN. See the introductory post to understand the context around all the new features for diverse workloads as part of Apache Hadoop YARN in HDP. Background: In Hadoop […]
Enterprises see embracing AI as a strategic imperative that will enable them to stay relevant in increasingly competitive markets. However, it remains difficult to quickly build these capabilities given the challenges with finding readily available talent and resources to get started rapidly on the AI journey. Cloudera recently signed a strategic collaboration agreement with Amazon […] Read blog p
The new integration between Flume and Kafka offers sub-second-latency event processing without the need for dedicated infrastructure. In a previous post you learned some Apache Kafka basics and explored a scenario for using Kafka in an online application. This post takes you a step further and highlights the integration of Kafka with Apache Hadoop, demonstrating both a basic ingestion capability […]
Understanding how checkpointing works in HDFS can make the difference between a healthy cluster and a failing one. Checkpointing is an essential part of maintaining and persisting filesystem metadata in HDFS. It’s crucial for efficient NameNode recovery and restart, and is an important indicator of overall cluster health. However, checkpointing can also be a source of confusion for operators of Apache Hadoop […]
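The checkpointing described above boils down to a base snapshot plus a log of changes: the NameNode keeps namespace metadata as an fsimage file plus an edit log, and a checkpoint replays the pending edits into a fresh image so a restart does not have to replay an ever-growing log. The following is a conceptual sketch only (not NameNode code; the dictionary representation and operation names are hypothetical):

```python
# Namespace metadata = last checkpointed image + log of edits since then.
fsimage = {"/": "dir"}
edits = [("mkdir", "/data"), ("create", "/data/f1")]

def checkpoint(image, edit_log):
    """Apply pending edits to the image and truncate the log.

    After this, a restart only needs to load the compact image instead
    of replaying every edit since the previous checkpoint.
    """
    new_image = dict(image)
    for op, path in edit_log:
        new_image[path] = "dir" if op == "mkdir" else "file"
    return new_image, []  # fresh image, empty edit log

fsimage, edits = checkpoint(fsimage, edits)
assert fsimage == {"/": "dir", "/data": "dir", "/data/f1": "file"}
assert edits == []
```

A long gap between checkpoints means a long edit log, which is exactly why slow NameNode restarts are a classic symptom of checkpointing problems.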
How HiveServer2 Brings Security and Concurrency to Apache Hive Apache Hive was one of the first projects to bring higher-level languages to Apache Hadoop. Specifically, Hive enables the legions of trained SQL users to use industry-standard SQL to process their Hadoop data. However, as you probably have gathered from all the recent community activity in the SQL-over-Hadoop area, Hive has a few limitations […]
The ongoing progress in Artificial Intelligence is constantly expanding the realms of possibility, revolutionizing industries and societies on a global scale. The release of LLMs surged by 136% in 2023 compared to 2022, and this upward trend is projected to continue in 2024. Today, 44% of organizations are experimenting with generative AI, with 10% having […] Read blog post
This is the fourth post in a series that explores the theme of enabling diverse workloads in YARN. See the introductory post to understand the context around all the new features for diverse workloads as part of YARN in HDP 2.2. Introduction: When it comes to managing resources in YARN, there are two aspects that we, the YARN platform developers, are primarily concerned with: resource allocation […]