goinger's Bookmarks / July 19, 2010

Big Data in Real-Time at Twitter

Introduction and Overview of Apache Kafka, TriHUG July 23, 2013

goinger 2010/07/19

Link

How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook

Hadoop was developed to solve problems with data warehousing systems at Yahoo and Facebook that were limited in processing large amounts of raw data in real-time. Hadoop uses HDFS for scalable storage and MapReduce for distributed processing. It allows for agile access to raw data at scale for ad-hoc queries, data mining and analytics without being constrained by traditional database schemas. Hado

goinger 2010/07/19

Link

ningliang's piglet at master - GitHub

goinger 2010/07/19

Link

Hadoop and Pig at Twitter__HadoopSummit2010

1. Hadoop is used extensively at Twitter to handle large volumes of data from logs and other sources totaling 7TB per day. Tools like Scribe and Crane are used to input data and Elephant Bird and HBase for storage. 2. Pig is used for data analysis on these large datasets to perform tasks like counting, correlating, and researching trends in users and tweets. 3. The results of these analyses are us

goinger 2010/07/19

Link

NoSQL Databases List by Hosting Data - Updated 2024

The decision was made after the owners recognized that they have a common objective - helping people in the UK (and beyond) understand web hosting and all its intricacies, including NoSQL databases. This list is updated monthly. Learn more here. We've tested every single one of the best web hosting companies in the U.K. many of which use NoSQL databases in their server management. See our results

goinger 2010/07/19

nosql

Link

Integrating Hive and HBase at Facebook

While definitely interesting, something doesn’t seem to add up: It (nb HBase) sidesteps Hadoop’s append-only constraint by keeping recently updated data in memory and incrementally rewriting data to new files, splitting and merging intelligently based on data distribution changes. Since it is based on Hadoop, making HBase interoperate with Hive is straightforward, meaning HBase tables can be acces

goinger 2010/07/19

Link

Is Cassandra winning the NoSQL race? - Tony Bain

Building products & teams that leverage data, analytics, AI & automation to do amazing things. Based in Melbourne, Australia, an emerging data science hotspot. The opinions and positions expressed are my own and do not necessarily reflect those of my employer Cassandra is fast emerging as one of the key NoSQL databases. While we often express that the point of NoSQL is to offer more choice than a

goinger 2010/07/19

cassandra

Link

NoSQL Twitter Applications

Everyone is building these days a Twitter-like or Twitter-related project using some NoSQL solution. I guess they can use as a ‘scientific’ explanation for these experiments Nati Shalom’s (Gigaspaces) great ☞ post on the common principles behind NoSQL alternatives (the post was inspired by his talk at QCon on building a scalable Twitter application. The presentation is embedded below). MusicTweets

goinger 2010/07/19

nosql

Link

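Super Technique Course: Garbage Collector

"Garbage collection" (GC for short) is a major technique in memory management. It has mainly been used to reclaim the cells used by Lisp interpreters, but recent object-oriented languages, starting with Smalltalk-80 and including Java and Eiffel, also use it so that programmers need not explicitly invoke destructors on the instances they create. It is thus a very important feature for programming languages and their environments, but nothing stops you from using it in your own applications. Moreover, implementing one yourself once gives you insight into its behavior even when you use a language that comes with a garbage collector. Here we consider garbage collectors in relation to memory management techniques. The realloc model; An improved malloc & free model; A general-purpose malloc & free model; Reference

goinger 2010/07/19

tech

Link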

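C Language Super Technique Course

This page mainly covers intermediate-level C techniques. After many years as a programmer, one accumulates interesting examples of how to use C. I made this page to publish them all at once. It does not stop at C, however; I will also introduce interesting features of other languages. The content is fairly heavy. Naturally, it assumes that you have properly mastered material at the level of "ポインタ虎の巻" (the pointer primer). It is packed with surprising tricks, pitfalls, and flashy techniques, and I hope to explain data structures and algorithms properly as well. (Well, I'll take it slowly...) The table of contents below carries ratings as a guide. The legend is as follows. Level: the level of the content covered in the article. Usefulness: whether the content is actually useful in practice. Evilness: whether the method the article recommends tends to be considered "evil" under common coding conventions. Making use of function pointers (abuse

goinger 2010/07/19

tech
hack

Link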

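Super Technique Course: m4 Tutorial

m4 is one of the standard UNIX commands and a macro processor with a long history. However, it is somewhat hard to use, and because it is so old-school and specialized, it tends to get low priority for coverage; I have never come across a decent explanation of it written in Japanese. So, as someone with some m4 experience, I have taken it upon myself to write an m4 tutorial. The basic information comes from the m4 info pages, with various practical examples added. What is the macro processor m4?; How to use m4; m4 directives; Directives for macro definition; Principles of macro substitution; Macro arguments; undefine and include; Conditional branching; Loops; Swapping metacharacters; Built-in string-processing functions; Miscellaneous. What is the macro processor m4? m4 is a macro processor. That is, the C preprocessor cpp (these days, gcc -E

goinger 2010/07/19

Link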

GNU M4 - GNU Project - Free Software Foundation

Introduction to GNU M4 GNU M4 is an implementation of the traditional Unix macro processor. It is mostly SVR4 compatible although it has some extensions (for example, handling more than 9 positional parameters to macros). GNU M4 also has built-in functions for including files, running shell commands, doing arithmetic, etc. GNU M4 is a macro processor in the sense that it copies its input to the ou

goinger 2010/07/19

m4
gnu

Link

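漢(オトコ)のコンピュータ道: 10 Ways to Speed Up MySQL

I've given this a somewhat catchy title, but today I'll introduce ten ways to speed up MySQL, chosen entirely according to my own judgment and bias. I hope they serve as a reference when tuning a MySQL server or setting one up for the first time. 1. Increase or decrease buffers. This is the most basic of tuning basics, but setting appropriate buffer sizes is the cornerstone of performance tuning. The main buffers are as follows. innodb_buffer_pool_size: the most important buffer; if you use only InnoDB, allocate about 70 to 80% of free memory to it. As an aside, note that it actually uses about 5 to 10% more memory than the value allocated here. key_buffer_size: if you use only MyISAM, allocating about 30% of free memory is a good idea. Leave the rest for the file system cache. sort_buffer_

goinger 2010/07/19

mysql

Link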

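7 More Ways to Speed Up MySQL

The article "10 Ways to Speed Up MySQL" seems to have been very well received. Thank you to everyone who read it. Riding on that article (?), someone apparently wrote "Web屋のネタ帳: 16 Points for Speeding Up PostgreSQL", which also seems to have been quite popular. The spirit of free software and open source software, improving software that others have made, is fundamentally about piggybacking, so I am all in favor of piggybacking, or rather, thank you for picking it up!! But here I take the thought one step further. The Web屋のネタ帳 article introduces 16 points, whereas the 漢(オトコ)のコンピュータ道 article had 10 methods, so it is 6 short. A man competes on numbers!! So today I'll squeeze out more material and introduce 7 more techniques for speeding up MySQL. 1. Intel compiler

goinger 2010/07/19

mysql

Link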

Learning NoSQL from Twitter's Experience

Leaving aside the tons of NoSQL Twitter applications — and if that is not enough here are more NoSQL-based Twitter apps and even more, Twitter seems to be having a lot of fun (nb read work and innovation) in the NoSQL space. It all started with the problem of handling big data in real-time. Nick Kallen’s (@nk) slides below are explaining the problems faced and the way Twitter tackled them: Then it

goinger 2010/07/19

Link

Amazon.com: Html5: Up and Running: Dive Into the Future of Web Development: Pilgrim, Mark: Books

goinger 2010/07/19

book
amazon

Link

Amazon.com: MySQL Pocket Reference: Reese, George: Books

goinger 2010/07/19

mysql

Link

Amazon.com: Apache 2 Pocket Reference: Ford, Andrew: Books

goinger 2010/07/19

Link

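CityDO! Nationwide Fireworks Festival Special 2009: Ranking by Number of Fireworks Launched

■ Straight from the pros! Shooting fireworks with a digital SLR. ■ Fireworks trivia > How to enjoy fireworks > Types of fireworks ■ Fireworks photo gallery ■ Recommended fireworks festivals

goinger 2010/07/19

Link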

--- Benesse BERD --- [Survey Data Search <Search Results / Details>]

goinger 2010/07/19

Link

Groovy Tutorial for MongoDB

Java Tutorial (http://www.mongodb.org/display/DOCS/Java+Tutorial) Since Groovy is based on Java, you can make use of the Java driver for MongoDB as well as go through the tutorial shown above. The code can be used as-is or modified to look more groovy. Create a groovy file (mongo.groovy) and place the Java driver in the same directory. Edit it so the code looks like: this.class.classLoader.rootLoa

goinger 2010/07/19

Link

Map-Reduce - MongoDB Manual v7.0

goinger 2010/07/19

mapreduce

Link

MongoDB Performance - MongoDB Manual v7.0

As you develop and operate applications with MongoDB, you may need to analyze the performance of the application and its database. When you encounter degraded performance, it is often a function of database access strategies, hardware availability, and the number of open database connections. Some users may experience performance limitations as a result of inadequate or inappropriate indexing stra

goinger 2010/07/19

mongodb

Link

MongoDB Manual Contents - MongoDB Manual v7.0

goinger 2010/07/19

Link

The Cassandra Distributed Database

This document summarizes Cassandra, an open source distributed database management system designed to handle large amounts of data across many commodity servers. It discusses Cassandra's history, key features like tunable consistency levels and support for structured and indexed columns. Case studies describe how companies like Digg, Twitter, Facebook and Mahalo use Cassandra to handle terabytes o

goinger 2010/07/19

cassandra

Link

Hadoop Source Code Reading, Session 3: Hadoop MR + Cassandra

The document discusses integrating Apache Cassandra, a NoSQL database, with Hadoop MapReduce. Specifically, it describes how Cassandra can be used as an input source and storage destination for MapReduce jobs. It also provides information on configuration options and contributing code to the Cassandra MapReduce integration.

goinger 2010/07/19

Link

GitHub - twitter-archive/flockdb: A distributed, fault-tolerant graph database

goinger 2010/07/19

Link

NoSQL at Twitter (NoSQL EU 2010)

A discussion of the different NoSQL-style datastores in use at Twitter, including Hadoop (with Pig for analysis), HBase, Cassandra, and FlockDB.

goinger 2010/07/19

Link

Why MongoDB is awesome

Naver: MongoDB of speed, by speed, for speed (Naver content search and MongoDB) [Naver] MongoDB

goinger 2010/07/19

mongodb

Link

Indexing with MongoDB

Indexes are references to documents that are efficiently ordered by key and maintained in a tree structure for fast lookup. They improve the speed of document retrieval, range scanning, ordering, and other operations by enabling the use of the index instead of a collection scan. While indexes improve query performance, they can slow down document inserts and updates since the indexes also need to

goinger 2010/07/19

Link

GettingStarted - Cassandra Wiki

See "Getting Started" in the documentation here. GettingStarted (last edited 2016-08-10 22:57:22 by JonathanEllis)

goinger 2010/07/19

Link

Mitch Pirtle (@mitchitized) | Twitter

goinger 2010/07/19

mongodb

Link

SQL to MongoDB Mapping Chart - MongoDB Manual v7.0

goinger 2010/07/19

Link

Install MongoDB Community Edition on macOS - MongoDB Manual v7.0

goinger 2010/07/19

Link

http://yaruokansatu.blog44.fc2.com/blog-entry-740.html

goinger 2010/07/19

Link

Made 35 million yen from affiliate income last year and got hit with about 20 million in taxes, lol - ニュース速報BIP

1 Name: VIP, posting in place of Anonymous 2010/07/18 (Sun) 15:25:38.27 ID:wZkzCSCc0

goinger 2010/07/19

Link

Unlambda - Wikipedia

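The built-in functions are as follows. ` : function application; s : the S of SKI combinator calculus; k : the K of SKI combinator calculus; i : the I of SKI combinator calculus; v : ignores its argument and returns v; d : strictly speaking a special form rather than a function; it delays evaluation and evaluates only when evaluated a second time; c : continuation, i.e. call-with-current-continuation; e : ignores the continuation and terminates the program; r : prints a newline and otherwise behaves like I; . : prints the argument character and otherwise behaves like I; @ : reads one character from standard input; ? : reads one character from standard input and compares it with the argument; | : reads one character from standard input and prints it. Anything of the form ".A", such as ".H" or ".e", is an identity function (a function that returns the given argument completely unchanged) that prints "A" as a side effect. "i" is an identity function with no side effect. The above

goinger 2010/07/19

Link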

Lazy K - Wikipedia

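Lazy K is a pure functional language with only three built-in functions. A similar language, written in a similar notation, is the impure functional language Unlambda. Overview: As a pure functional language, it is a programming language that is Turing complete yet keeps only the absolutely necessary essence. It uses lazy evaluation. Both using it and implementing it require knowledge of combinatory logic. The program is a function that receives standard input as its argument. The standard input is encoded as a Scott-encoded list of Church numerals, one per byte, and the output is likewise a Scott-encoded list of Church numerals, one per byte. When Unlambda is implemented in Lazy K, the source is about 1/10 the size of Unlambda implemented in Unlambda. Built-in functions: Has

goinger 2010/07/19

programming

Link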

Miranda - Wikipedia

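Miranda is a lazily evaluated, purely functional programming language. It is the successor to SASL and KRC, earlier languages by its author David Turner, and was also influenced by ML and Hope. It is sold by the British company Research Software Ltd., of which it is also a trademark. It was the first pure functional language aimed at commercial use rather than research. For programs that solve common textbook problems, Miranda code can be written more simply and concisely than in most mainstream programming languages (APL and the like aside), and, as with other functional languages, there are reports that highly reliable programs can be developed in less time than with imperative languages. It first appeared in 1985. The only implementation is one written in C for Unix-like systems. The later language Haskell was influenced by Miranda in many respects

goinger 2010/07/19

Link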

Home of Clean

Welcome to the Clean Wiki! the online home of Clean. Clean is a general purpose, state-of-the-art, pure and lazy functional programming language designed for making real-world applications. Some of its most notable language features are uniqueness typing, dynamic typing, and generic functions. Have a look at a quick impression of Clean. The Clean System is available for the Windows, Linux, and Mac

goinger 2010/07/19

Link

Clean - Wikipedia

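Clean is a programming language, specifically a purely functional one. It closely resembles Haskell. Through uniqueness types, it allows operations such as destructive file updates while preserving referential transparency. This rests on the following idea: preserving referential transparency normally requires copying a value and returning the copy as the result, but if it can be guaranteed that the original is never used (referenced) again, the value may be updated destructively in place without copying it. For example, to add 1 to a variable a, one writes a = 1 a2 = a + 1 and uses a2 from then on. If the program will keep relying on a = 1, this is the only way; otherwise a is wasted. But even if the programmer knows that a = 1 will never be used again, the implementation does not. Uniqueness types (uniqueness type attributes) are the means of telling the implementation this. The implementation, never again using the definition a = 1

goinger 2010/07/19

Link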

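Referential transparency - Wikipedia

Referential transparency is a concept in computer languages. An expression is referentially transparent if replacing the expression with its value does not change the behavior of the program (in other words, the program still has the same effects and the same output for the same input). Concretely, the value of a variable is always the same as the value with which it was first defined, and a function returns the same value whenever it is given the same variables as arguments. Naturally, there are no expressions that perform assignment, the operation that rebinds a variable to a new value. When referential transparency holds in this way, there is no need to consider where the value of an expression, such as a function value or a variable value, is stored; that is, the language is transparent with respect to reference. Because the value of an expression is determined from the program text, languages in which referential transparency holds, declarative languages

goinger 2010/07/19

ml

Link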

Getting Started with MongoDB - MongoDB Manual v7.0

goinger 2010/07/19

Link

mongodb-user - Google Groups

Attention: MongoDB Community Members - We have moved! The new MongoDB Community forums are now live! We've built the forums to give you a centralized place to connect with other MongoDB users, ask questions, and get answers. The new forums will soon take the place of this Google Group and will consolidate with our other community channels that you may be familiar with (Google Groups, Slack, etc.),

goinger 2010/07/19

Link

MongoDB Press

MongoDB Press: A collection of books on MongoDB written by experts. Practical MongoDB Aggregations, Paul Done | Published Sept. 2023. This technical guide takes you on a data-driven journey by teaching you how to streamline data manipulation, resolve data processing bottlenecks, and optimize pipelines. This book is your go-to resource for becoming proficient with the MongoDB aggregation framework. Get 20

goinger 2010/07/19

book
mongodb

Link

BSON (Binary JSON) Serialization

BSON, short for Binary JSON, is a binary-encoded serialization of JSON-like documents. Like JSON, BSON supports the embedding of documents and arrays within other documents and arrays. BSON also contains extensions that allow representation of data types that are not part of the JSON spec. For example, BSON has a Date type and a BinData type. BSON can be compared to b

goinger 2010/07/19

json
bson

Link

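Functional programming - Wikipedia

Functional programming is a style of programming that mainly uses functions in the mathematical sense[1]. In Japanese, "functional programming" is also sometimes translated as 関数プログラミング[2]. A functional programming language is a programming language that encourages functional programming[1]. It is also called a functional language for short[1]. Overview: Functional programming is a style of programming centered on functions[1]. The functions meant here are mathematical ones, which have referential transparency: once the argument values are fixed, the result is fixed as well[1]. Referential transparency, just as with a mathematical function,

goinger 2010/07/19

Link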

Bookmarks for July 19, 2010 (47 items)
