Hivemall is an open source machine learning library built as a collection of Hive UDFs. It provides over 100 machine learning algorithms and functions for tasks such as feature engineering, evaluation, and recommendation. Hivemall entered the Apache Incubator in 2016, and the first Apache release (v0.5.0) is upcoming. It supports platforms such as Hive, Spark, and Pig for scalable parallel processing.
The Event Collector was one of the legacy systems at Treasure Data. As usage grew, it ran into several performance and scalability problems. Engineers addressed these through a set of optimizations: increasing socket backlogs, caching parsers, running processes in parallel, and moving deduplication to a separate thread so that it no longer blocked the input pipeline. These changes helped…
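The summary does not show Treasure Data's code. What follows is a minimal Python sketch of two of the optimizations named above: caching a parser per input format so it is built only once, and moving deduplication onto a worker thread so the input path only enqueues events. The sketch assumes events are dicts carrying a unique `id` field. `get_parser` and `Deduplicator` are hypothetical names, and the in-memory `set` of seen IDs stands in for whatever store the real system used.

```python
import functools
import json
import queue
import threading

# Hypothetical sketch, not Treasure Data's implementation.


@functools.lru_cache(maxsize=None)
def get_parser(fmt):
    """Return a parser for `fmt`, building it only on the first call."""
    if fmt == "json":
        return json.JSONDecoder()
    raise ValueError(f"unsupported format: {fmt}")


_STOP = object()  # sentinel telling the worker thread to exit


class Deduplicator:
    """Input path only enqueues; a worker thread drops repeated event IDs."""

    def __init__(self):
        self._queue = queue.Queue()
        self._seen = set()   # IDs already emitted (unbounded in this sketch)
        self.output = []     # deduplicated events in arrival order
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, event):
        # Called by the input pipeline: it never waits on the duplicate check.
        self._queue.put(event)

    def _run(self):
        while True:
            event = self._queue.get()
            if event is _STOP:
                break
            if event["id"] not in self._seen:
                self._seen.add(event["id"])
                self.output.append(event)

    def close(self):
        # The sentinel is queued after every submitted event, so the worker
        # processes all of them before it stops.
        self._queue.put(_STOP)
        self._worker.join()


if __name__ == "__main__":
    parse = get_parser("json").decode
    d = Deduplicator()
    for line in ['{"id": 1}', '{"id": 2}', '{"id": 1}']:
        d.submit(parse(line))
    d.close()
    print([e["id"] for e in d.output])  # [1, 2]
```

A single worker consumes the queue, so deduplicated events keep their arrival order. A production version would likely also need to bound the seen-ID set, for example by expiring old IDs after a time window.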
Bigdam is a data ingestion pipeline designed for planet-scale workloads. It addresses several issues with the traditional pipeline: an imperfect queue, throughput limitations, latency in queries from event collectors, difficulty maintaining event collector code, and many small temporary and imported files. The redesigned pipeline includes Bigdam-Gateway for HTTP endpoints, Bigdam-Pool for d…
User-defined partitioning is a new partitioning strategy in Treasure Data. It lets users choose a column to partition on, in addition to the default "time" column. This gives more flexible partitioning that better fits customer data platform workloads. Users can define partitioning rules through Presto or Hive to improve query performance by enabling colocated joins and fi…
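As a rough illustration of why a user-chosen partitioning column helps, here is a minimal Python sketch of hash bucketing on one column. The sketch makes three simplifying assumptions. Both tables are bucketed on the same column with the same bucket count. `zlib.crc32` is used as a stand-in hash. The default `time` partitioning is left out. All function names are hypothetical, and this is not how Treasure Data, Presto, or Hive implement the feature internally.

```python
import zlib
from collections import defaultdict

# Hypothetical sketch: hash bucketing on a user-chosen column.
NUM_BUCKETS = 4  # assumption: both tables use the same bucket count


def bucket_of(key, num_buckets=NUM_BUCKETS):
    # crc32 is deterministic across runs, unlike Python's salted hash() for str.
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def partition(rows, column, num_buckets=NUM_BUCKETS):
    """Group rows into buckets by the value of `column`."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_of(row[column], num_buckets)].append(row)
    return dict(buckets)


def colocated_join(left, right, column):
    """Inner-join two tables bucketed the same way, one bucket pair at a time."""
    result = []
    for b, left_rows in left.items():
        # Build a hash index over the matching right-side bucket only.
        index = defaultdict(list)
        for rrow in right.get(b, []):
            index[rrow[column]].append(rrow)
        for lrow in left_rows:
            for rrow in index.get(lrow[column], []):
                result.append({**lrow, **rrow})
    return result


def filter_by_key(table, column, value, num_buckets=NUM_BUCKETS):
    """Equality filter that scans a single bucket instead of the whole table."""
    rows = table.get(bucket_of(value, num_buckets), [])
    return [r for r in rows if r[column] == value]
```

Matching keys always land at the same bucket index, so the join compares bucket *i* of one table only with bucket *i* of the other, with no reshuffling. For the same reason, an equality filter on the partitioning column reads one bucket rather than all of them.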