みなさんはApache Arrowを知っていますか? 普段データを処理している人でも今はまだ知らない人の方が多いかもしれません。しかし、数年後には「データ処理をしている人ならほとんどの人が知っている」となるプロダクトです。(そうなるはずです。) Apache Arrowはメモリー上でデータ処理するときに必要なもの一式を提供します。たとえば、効率的なデータ交換のためのデータフォーマット、CPU/GPUの機能を活用した高速なデータ操作機能などです。 一部のデータ処理ツールではすでにApache Arrowを使い始めています。たとえば、Apache SparkはApache Arrowを活用することでPySpark(PythonからApache Sparkを使うためのモジュール)とのやりとりを高速化しています。データ量によっては10倍以上も高速になります。(リンク先の例では20秒→0.7秒と約3
One July afternoon in 2024, Ryan Williams set out to prove himself wrong. Two months had passed since he’d hit upon a startling discovery about the relationship between time and memory in computing. It was a rough sketch of a mathematical proof that memory was more powerful than computer scientists believed: A small amount would be as helpful as a lot of time in all conceivable computations. That
With the following code, we can check whether a year 0 ≤ y ≤ 102499 is a leap year with only about 3 CPU instructions: bool is_leap_year_fast(uint32_t y) { return ((y * 1073750999) & 3221352463) <= 126976; } How does this work? The answer is surprisingly complex. This article explains it, mostly to have some fun with bit-twiddling; at the end, I'll briefly discuss the practical use. This is how a
Reservoir sampling is a technique for selecting a fair random sample when you don't know the size of the set you're sampling from. By the end of this essay you will know: When you would need reservoir sampling. The mathematics behind how it works, using only basic operations: subtraction, multiplication, and division. No math notation, I promise. A simple way to implement reservoir sampling if you
The original motivation for the creation of Bloom filters is efficient set membership, using a probabilistic approach to significantly reduce the time and space required to reject items that are not members in a certain set. The data structure was proposed by Burton Bloom in a 1970 paper titled "Space/Time Trade-offs in Hash Coding with Allowable Errors". It's a good paper that's worth reading. Wh
Competitive Programming in Haskell: stacks, queues, and monoidal sliding windows Suppose we have a list of items of length \(n\), and we want to consider windows (i.e. contiguous subsequences) of width \(w\) within the list. ⊕A list of numbers, with contiguous size-3 windows highlighted We can compute the sum of each window by brute force in \(O(nw)\) time, by simply generating the list of all the
リリース、障害情報などのサービスのお知らせ
最新の人気エントリーの配信
処理を実行中です
j次のブックマーク
k前のブックマーク
lあとで読む
eコメント一覧を開く
oページを開く