We present the architecture behind Twitter's real-time related query suggestion and spelling correction service. Although these tasks have received much attention in the web search literature, the Twitter context introduces a real-time "twist": after significant breaking news events, we aim to provide relevant results within minutes. This paper provides a case study illustrating the challenges of
The Unified Logging Infrastructure for Data Analytics at Twitter George Lee, Jimmy Lin, Chuang Liu, Andrew Lorek, and Dmitriy Ryaboy Twitter, Inc. @GeorgeJLee @lintool @chuangl4 @mrtall @squarecog ABSTRACT In recent years, there has been a substantial amount of work on large-scale data analytics using Hadoop-based platforms running on large clusters of commodity machines. A less- explored topic is
The document discusses Twitter's data analytics platform, including Hadoop and Vertica. It outlines Twitter's data flow, which ingests 400 million tweets daily into HDFS, then uses various tools like Crane, Oink, and Rasvelg to run jobs on the main Hadoop cluster before loading analytics into Vertica and MySQL for web tools and analysts. It also describes Twitter's heterogeneous technology stack a
リリース、障害情報などのサービスのお知らせ
最新の人気エントリーの配信
処理を実行中です
j次のブックマーク
k前のブックマーク
lあとで読む
eコメント一覧を開く
oページを開く