www.michael-noll.com
To help fellow engineers wrap their heads around Apache Kafka and event streaming, I wrote a 4-part series on the Confluent blog on Kafka’s core fundamentals. In the series, we explore Kafka’s storage and processing layers and how they interrelate, featuring Kafka Streams and ksqlDB. In the first part, I begin with an overview of events, streams, tables, and the stream-table duality to set the stage.
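The duality is easy to see in a few lines of code. Below is a minimal sketch in plain Python (not Kafka Streams or ksqlDB code): a table is the per-key aggregation of an event stream, and the table’s change log is itself a stream that can rebuild the table. All names are illustrative.

```python
# A minimal, Kafka-free sketch of the stream-table duality:
# a table is the result of aggregating an event stream, and every
# table update can itself be re-emitted as a stream (the changelog).
# All names here are hypothetical illustrations, not Kafka Streams APIs.

events = [  # an ordered stream of (key, value) events
    ("alice", "Berlin"),
    ("bob", "Sydney"),
    ("alice", "Rome"),   # a later event for the same key wins
]

table = {}          # the "table": latest value per key
changelog = []      # the "stream" view of the table's updates

for key, value in events:
    table[key] = value              # stream -> table: aggregate per key
    changelog.append((key, value))  # table -> stream: capture each change

print(table)      # {'alice': 'Rome', 'bob': 'Sydney'}
print(changelog)  # replaying this stream rebuilds the table
```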
Spark Streaming has been getting some attention lately as a real-time data processing tool, often mentioned alongside Apache Storm. If you ask me, no real-time data processing tool is complete without Kafka integration (smile), hence I added an example Spark Streaming application to kafka-storm-starter that demonstrates how to read from Kafka and write to Kafka, using Avro as the data format and Twitter Bijection for data serialization.
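For readers who want a concrete starting point, here is a minimal sketch of the read-from-Kafka/write-to-Kafka round trip in PySpark. Note the hedges: the original example uses the older DStream API with Avro and Bijection, while this sketch uses the newer Structured Streaming API with plain strings; the broker address, topic names, and checkpoint path are illustrative assumptions, and the spark-sql-kafka package must be on the classpath.

```python
# Minimal sketch of reading from and writing to Kafka with PySpark.
# This uses Structured Streaming rather than the DStream API covered
# in the post. Topic names, broker address, and checkpoint path are
# assumptions; adjust them to your environment.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-pass-through").getOrCreate()

# Read a stream of records from the (hypothetical) input topic.
src = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "input-topic")
       .load())

# Trivial transformation: upper-case the message value.
out = src.selectExpr("CAST(key AS STRING) AS key",
                     "upper(CAST(value AS STRING)) AS value")

# Write the transformed stream back to another Kafka topic.
query = (out.writeStream.format("kafka")
         .option("kafka.bootstrap.servers", "localhost:9092")
         .option("topic", "output-topic")
         .option("checkpointLocation", "/tmp/checkpoints/kafka-pass-through")
         .start())
query.awaitTermination()
```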
The only thing that’s even better than Apache Kafka or Apache Storm is using the two tools in combination. Unfortunately, their integration can be, and still is, a pretty challenging task, at least judging by the many discussion threads on the respective mailing lists. In this post I introduce kafka-storm-starter, which contains many code examples that show you how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+.
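The pattern at the heart of such an integration is consume, process, produce. As a language-neutral illustration (kafka-storm-starter itself is JVM code, and this is not Storm), here is a sketch using the kafka-python client; the broker address and topic names are assumptions.

```python
# Sketch of the consume-process-produce pattern that Kafka/Storm
# integration boils down to. This stand-in uses the kafka-python
# client instead of a Storm topology. Broker address and topic
# names are assumptions.
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer("input-topic",
                         bootstrap_servers="localhost:9092",
                         group_id="starter-demo")
producer = KafkaProducer(bootstrap_servers="localhost:9092")

for record in consumer:                   # endless stream of input messages
    transformed = record.value.upper()    # stand-in for a Storm bolt
    producer.send("output-topic", transformed)
```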
Have you ever asked yourself what monoids and monads are, and particularly why they seem to be so attractive in the field of large-scale data processing? Twitter recently open-sourced Algebird, which provides you with a JVM library to work with such algebraic data structures. Algebird is already being used in Big Data tools such as Scalding and SummingBird, which means you can use Algebird as a …
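The core idea is small enough to sketch in a few lines of Python (Algebird itself is a Scala library). A monoid is a set with an associative binary operation and an identity element; associativity is what lets a data-processing framework split an aggregation into arbitrary chunks and combine the partial results in any grouping.

```python
# A monoid: an identity element ("zero") plus an associative binary
# operation ("plus"). Associativity means partial results computed on
# different machines can be merged in any grouping, which is exactly
# what map/reduce-style systems exploit.
from functools import reduce

class IntAdditionMonoid:
    zero = 0                       # identity: plus(zero, x) == x
    @staticmethod
    def plus(a, b):                # associative: (a+b)+c == a+(b+c)
        return a + b

data = [3, 1, 4, 1, 5, 9, 2, 6]

# Sequential fold over the whole dataset ...
total = reduce(IntAdditionMonoid.plus, data, IntAdditionMonoid.zero)

# ... equals combining independently computed partial sums.
left = reduce(IntAdditionMonoid.plus, data[:4], IntAdditionMonoid.zero)
right = reduce(IntAdditionMonoid.plus, data[4:], IntAdditionMonoid.zero)
assert total == IntAdditionMonoid.plus(left, right)  # 31
```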
When you are optimizing the performance of your Storm topologies, it helps to understand how Storm’s internal message queues are configured and put to use. In this short article I will explain and illustrate how Storm versions 0.8/0.9 implement the intra-worker communication that happens within a worker process and its associated executor threads. Internal messaging within Storm worker processes …
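As a rough, purely illustrative analogy (not Storm code): inside one worker process, executor threads pass tuples to each other through in-memory queues, which in Storm 0.8/0.9 are LMAX Disruptor queues. The Python sketch below mimics that shape with queue.Queue and two threads.

```python
# Very rough Python analogy for Storm's intra-worker messaging:
# executor threads inside one worker process hand tuples to each
# other via bounded in-memory queues (Storm uses Disruptor queues;
# this sketch uses queue.Queue). Purely illustrative.
import queue
import threading

send_q = queue.Queue(maxsize=1024)   # cf. an executor's buffer

def spout(q):
    for i in range(5):
        q.put(f"tuple-{i}")          # emit tuples into the queue
    q.put(None)                      # poison pill: no more tuples

def bolt(q):
    while (item := q.get()) is not None:
        print("processed", item)     # stand-in for bolt.execute()

threads = [threading.Thread(target=spout, args=(send_q,)),
           threading.Thread(target=bolt, args=(send_q,))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```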
In this tutorial I will describe in detail how to set up a distributed, multi-node Storm cluster on RHEL 6. We will install and configure both Storm and ZooKeeper and run their respective daemons under process supervision, similar to how you would operate them in a production environment. I will show how to run an example topology in the newly built cluster and conclude with an operational FAQ.
In this article I describe how to install, configure and run a multi-broker Apache Kafka 0.8 (trunk) cluster on a single machine. The final setup consists of one local ZooKeeper instance and three local Kafka brokers. We will test-drive the setup by sending messages to the cluster via a console producer and receiving those messages via a console consumer. I will also describe how to build Kafka for …
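The post’s smoke test uses Kafka’s console producer and consumer scripts. The same round trip can also be done programmatically; the sketch below uses the kafka-python client (not part of the original post), and the three broker ports are assumptions matching a typical three-broker local setup.

```python
# Programmatic stand-in for the console producer/consumer smoke test.
# The three broker ports are assumptions for a typical local
# multi-broker setup; adjust them to your configuration.
from kafka import KafkaProducer, KafkaConsumer

brokers = ["localhost:9092", "localhost:9093", "localhost:9094"]

producer = KafkaProducer(bootstrap_servers=brokers)
for i in range(3):
    producer.send("test-topic", f"hello {i}".encode("utf-8"))
producer.flush()                      # ensure messages are on the wire

consumer = KafkaConsumer("test-topic",
                         bootstrap_servers=brokers,
                         auto_offset_reset="earliest",
                         consumer_timeout_ms=5000)  # stop after 5s idle
for record in consumer:
    print(record.partition, record.offset, record.value)
```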
A common pattern in real-time data workflows is performing rolling counts of incoming data points, also known as sliding window analysis. A typical use case for rolling counts is identifying trending topics in a user community – such as on Twitter – where a topic is considered trending when it has been among the top N topics in a given window of time. In this article I will describe how to implement rolling counts with Apache Storm.
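Before diving into a distributed implementation, the underlying data structure fits in a few lines. The sketch below is a single-process Python illustration, not the Storm code from the article: keep one counter per time slot, advance the window by dropping the oldest slot, and report the top-N totals across all slots.

```python
# Single-process sketch of a rolling count with a sliding window,
# the idea behind trending topics. Not the Storm implementation,
# just the core data structure.
from collections import Counter, deque

NUM_SLOTS = 3  # window = 3 slots (e.g. 3 x 20s = 60s of history)

window = deque([Counter() for _ in range(NUM_SLOTS)], maxlen=NUM_SLOTS)

def record(topic):
    window[-1][topic] += 1          # count into the current slot

def advance():
    window.append(Counter())        # new slot in, oldest slot out

def top_n(n):
    totals = sum(window, Counter()) # merge all slots in the window
    return totals.most_common(n)

for t in ["kafka", "storm", "kafka"]:
    record(t)
advance()
for t in ["storm", "hadoop", "kafka"]:
    record(t)
print(top_n(2))   # [('kafka', 3), ('storm', 2)]
```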
In the past few days I have been test-driving Twitter’s Storm project, a distributed real-time data processing platform. One of my findings so far has been that the quality of Storm’s documentation and example code is pretty good – it is very easy to get up and running with Storm. Big props to the Storm developers! At the same time, I found the sections on how a Storm topology runs in a cluster …
In this article I introduce some of the benchmarking and testing tools that are included in the Apache Hadoop distribution. Namely, we look at the benchmarks TestDFSIO, TeraSort, NNBench and MRBench. These are popular choices for benchmarking and stress testing a Hadoop cluster. Hence knowing how to run these tools will help you shake out your cluster in terms of architecture, hardware and software …
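Such benchmarks can also be driven from a script. The sketch below shows one way to launch the TestDFSIO write benchmark from Python; the test jar’s name and location vary between Hadoop versions and distributions, so the path used here is an assumption.

```python
# Sketch of driving the TestDFSIO benchmark from Python via the
# hadoop CLI. The jar path below is an assumption: the test jar's
# name and location differ between Hadoop versions and
# distributions, so adjust it to your installation.
import subprocess

TEST_JAR = "/usr/lib/hadoop/hadoop-test.jar"  # assumed path, adjust!

def test_dfsio_write(num_files=10, file_size_mb=1000):
    """Run the TestDFSIO write benchmark: num_files files of
    file_size_mb MB each."""
    subprocess.run(["hadoop", "jar", TEST_JAR, "TestDFSIO",
                    "-write",
                    "-nrFiles", str(num_files),
                    "-fileSize", str(file_size_mb)],
                   check=True)

if __name__ == "__main__":
    test_dfsio_write()
```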
In this tutorial I will describe the required steps for setting up a distributed, multi-node Apache Hadoop cluster backed by the Hadoop Distributed File System (HDFS), running on Ubuntu Linux. Contents include: Tutorial approach and structure; Prerequisites; Configuring single-node clusters first; Done? Let’s continue then!; Networking; SSH access; Hadoop cluster overview (aka the goal); Masters vs. slaves; Configuration …
In this tutorial I will describe the required steps for setting up a pseudo-distributed, single-node Hadoop cluster backed by the Hadoop Distributed File System (HDFS), running on Ubuntu Linux. Contents include: Prerequisites (Sun Java 6); Adding a dedicated Hadoop system user; Configuring SSH; Disabling IPv6 (alternative); Hadoop installation; Update $HOME/.bashrc; Excursus: Hadoop Distributed File System (HDFS); Configuration (hadoop-env.sh, …)
In this tutorial I will describe how to write a simple MapReduce program for Hadoop in the Python programming language. Contents include: Motivation; What we want to do; Prerequisites; Python MapReduce code; Map step: mapper.py; Reduce step: reducer.py; Test your code (cat data | map | sort | reduce); Running the Python code on Hadoop; Download example input data; Copy local example data to HDFS; Run the MapReduce job; Improved mapper and reducer code …
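In the spirit of the tutorial’s mapper.py/reducer.py pair, here is a self-contained word-count sketch. In the real setup the map and reduce steps are separate scripts chained by Hadoop Streaming (or tested locally via cat data | map | sort | reduce); here the whole pipeline runs in-process for illustration.

```python
# Self-contained word-count sketch in the spirit of the tutorial's
# mapper.py / reducer.py pair. In the real setup, map_step and
# reduce_step would be two separate stdin/stdout scripts chained by
# Hadoop Streaming; here the pipeline runs in-process.
import itertools

def map_step(lines):
    """mapper.py: emit (word, 1) for every word in the input."""
    for line in lines:
        for word in line.strip().split():
            yield (word, 1)

def reduce_step(pairs):
    """reducer.py: sum counts per word; relies on sorted input,
    just as Hadoop Streaming's shuffle/sort guarantees."""
    for word, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        yield (word, sum(count for _, count in group))

data = ["foo foo quux", "labs foo bar quux"]
pairs = sorted(map_step(data))               # the `sort` in the pipe
for word, count in reduce_step(pairs):
    print(f"{word}\t{count}")
# bar 1 / foo 3 / labs 1 / quux 2
```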
What we want to do: In this short tutorial, I will describe the required steps for setting up a single-node Hadoop cluster using the Hadoop Distributed File System (HDFS) on Ubuntu Linux. Hadoop is a framework written in Java for running applications on large clusters of commodity hardware and incorporates features similar to those of the Google File System and of MapReduce. HDFS is a highly fault-tolerant distributed file system …
One of my recent research tasks required me to retrieve various information from del.icio.us, a well-known social bookmarking service. My programming language of choice is Python, so I wrote a basic Python module for getting the data I needed. News: as of August 1, 2008, del.icio.us has relaunched its web service. Due to a lot of changes behind the scenes, all users of my Python API have to update …
What we want to do: In this tutorial, I will describe the required steps for setting up a multi-node Hadoop cluster using the Hadoop Distributed File System (HDFS) on Ubuntu Linux. Hadoop is a framework written in Java for running applications on large clusters of commodity hardware and incorporates features similar to those of the Google File System and of MapReduce. HDFS is a highly fault-tolerant distributed file system …
In this tutorial, I will describe how to write a simple MapReduce program for Hadoop in the Python programming language. Motivation: Even though the Hadoop framework is written in Java, programs for Hadoop need not be coded in Java but can also be developed in other languages like Python or C++ (the latter since version 0.14.1). However, the documentation and the most prominent Python example on the Hadoop website …
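Once mapper.py and reducer.py exist, the job is submitted through the Hadoop Streaming jar. The sketch below drives that submission from Python; the streaming jar path varies by Hadoop version, and the HDFS input/output paths are illustrative assumptions.

```python
# Sketch of submitting the Python mapper/reducer as a Hadoop
# Streaming job from Python. The streaming jar path is an
# assumption: its name and location vary by Hadoop version.
# The HDFS input/output paths are likewise illustrative.
import subprocess

STREAMING_JAR = "/usr/lib/hadoop/contrib/streaming/hadoop-streaming.jar"  # assumed

subprocess.run(["hadoop", "jar", STREAMING_JAR,
                "-file", "mapper.py", "-mapper", "mapper.py",
                "-file", "reducer.py", "-reducer", "reducer.py",
                "-input", "/user/hduser/gutenberg/*",
                "-output", "/user/hduser/gutenberg-output"],
               check=True)
```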