manboubird's Bookmarks - Hatena Bookmark

Reading and Writing Avro Files from the Command Line
manboubird 2015/07/27
avro
Link
Using Avro in MapReduce jobs with Hadoop, Pig, Hive
Apache Avro is a very popular data serialization format in the Hadoop technology stack. In this article I show code examples of MapReduce jobs in Java, Hadoop Streaming, Pig and Hive that read and/or write data in Avro format. We will use a small, Twitter-like data set as input for our example MapReduce jobs. Requirements Prerequisites Example data Avro schema Avro data files Preparing the input d
manboubird 2014/09/07
avro, hive, pig
Link
Of Algebirds, Monoids, Monads, and other Bestiary for Large-Scale Data Analytics
Have you ever asked yourself what monoids and monads are, and particularly why they seem to be so attractive in the field of large-scale data processing? Twitter recently open-sourced Algebird, which provides you with a JVM library to work with such algebraic data structures. Algebird is already being used in Big Data tools such as Scalding and SummingBird, which means you can use Algebird as a me
manboubird 2013/12/04
algebird, monoid, monad, scalding, summingbird, algebra
Link
Replephant: Analyzing Hadoop Cluster Usage with Clojure
manboubird 2013/09/22
replephant, repl, hadoop, client, clojure
Link
Implementing Real-Time Trending Topics with a Distributed Rolling Count Algorithm in Storm
A common pattern in real-time data workflows is performing rolling counts of incoming data points, also known as sliding window analysis. A typical use case for rolling counts is identifying trending topics in a user community – such as on Twitter – where a topic is considered trending when it has been among the top N topics in a given window of time. In this article I will describe how to impleme
manboubird 2013/02/02
storm, trendDetection
Link
Running Hadoop On Ubuntu Linux (Multi-Node Cluster)
In this tutorial I will describe the required steps for setting up a distributed, multi-node Apache Hadoop cluster backed by the Hadoop Distributed File System (HDFS), running on Ubuntu Linux. Tutorial approach and structure Prerequisites Configuring single-node clusters first Done? Let’s continue then! Networking SSH access Hadoop Cluster Overview (aka the goal) Masters vs. Slaves Configuration c
manboubird 2011/01/23
ubuntu, hadoop, setup, config
Link