So, there's this function. It's called a lot. More importantly, all those calls are on the critical path of a key user interaction. Let's talk about making it fast. Spoiler: it's a dot product. Some background (or skip to the juicy stuff) At Sourcegraph, we're working on a Code AI tool named Cody. In order for Cody to answer questions well, we need to give them enough context to work with. One of
The world wastes a minimum of $100M annually due to inefficient string operations. A typical codebase processes strings character by character, resulting in too many branches and data dependencies, neglecting 90% of a modern CPU's potential. LibC is different. It attempts to leverage SIMD instructions to boost some operations, and is often used by higher-level languages, runtimes, and databases. But
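Overview: There is a function called strlen(). As you know, it is the standard C library function that computes the length of a string. What it does is simple, and it can be implemented, for example, as follows. size_t strlen_simple(const char* str) { const char* p = str; while (*p) ++p; return size_t(p - str); } It just advances the pointer until '\0' is found and returns the difference from the starting position. Functionally, this is equivalent to std::strlen(). So how does it compare on speed? Writing a quick benchmark, then compiling and running it with MSVC 2022, gave the following.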
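The excerpt does not show the benchmark itself, so the following is only a minimal sketch of what such a comparison might look like. The helper name `average_ns` and the use of `std::chrono::steady_clock` are assumptions for illustration, not the author's harness.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>

// The simple implementation quoted in the excerpt.
std::size_t strlen_simple(const char* str) {
    const char* p = str;
    while (*p) ++p;
    return static_cast<std::size_t>(p - str);
}

// Hypothetical helper: calls f(str) `iterations` times and returns the
// average wall-clock time per call in nanoseconds.
template <class F>
double average_ns(F f, const char* str, int iterations) {
    assert(iterations > 0);
    // Writing to a volatile discourages the compiler from dropping the
    // calls entirely; a serious benchmark needs stronger barriers.
    volatile std::size_t sink = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) sink = f(str);
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double, std::nano>(stop - start).count() / iterations;
}
```

To compare against the library version, one could call `average_ns(strlen_simple, text, n)` and `average_ns([](const char* s) { return std::strlen(s); }, text, n)` on the same long string. Absolute numbers depend heavily on compiler, flags, and string length.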
In software, it is common to represent time as a time-stamp string. It is usually specified by a time format string. Some standards use the format %Y%m%d%H%M%S meaning that we print the year, the month, the day, the hours, the minutes and the seconds. The current time as I write this blog post would be 20230701205436 as a time stamp in this format. It is convenient because it is short, easy to rea
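To make the %Y%m%d%H%M%S layout concrete, here is a plain scalar sketch that splits a 14-digit time stamp into its fields. It is not the technique the linked post goes on to describe; the names `TimeFields`, `digits`, and `split_timestamp` are hypothetical, and the code assumes well-formed input without range checks.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical container for the six fields of a %Y%m%d%H%M%S stamp.
struct TimeFields {
    int year, month, day, hour, minute, second;
};

// Converts `count` ASCII digits starting at s[pos] into an int.
// Assumes every character in that range is '0'..'9'.
int digits(std::string_view s, std::size_t pos, std::size_t count) {
    int value = 0;
    for (std::size_t i = 0; i < count; ++i) {
        value = value * 10 + (s[pos + i] - '0');
    }
    return value;
}

// Splits a 14-character stamp such as "20230701205436" into fields.
// No validation of month, day, or time ranges is performed.
TimeFields split_timestamp(std::string_view s) {
    assert(s.size() == 14);
    return TimeFields{digits(s, 0, 4),  digits(s, 4, 2),  digits(s, 6, 2),
                      digits(s, 8, 2),  digits(s, 10, 2), digits(s, 12, 2)};
}
```

For the stamp quoted in the excerpt, `split_timestamp("20230701205436")` yields year 2023, month 7, day 1, hour 20, minute 54, second 36.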
To find the index of the first instance of a character within a body of [ASCII] text, you might write something like: fn indexOf(haystack: []const u8, needle: u8) ?usize { for (haystack, 0..) |c, i| { if (c == needle) return i; } return null; } Or use the std.mem.indexOfScalar function from the standard library, which is essentially the same implementation. This implementation loops through the in
Part of my job is to make JavaScript things go fast. Speed is a feature, and when working in an interpreted language, squeezing every last bit of performance can be the difference between a great product and unusable garbage. Anyway, how cool would it be to make JavaScript itself go faster? I’m not a C++ programmer, but that didn’t stop me before, so I thought I’d give it a try anyway! The objecti
Google recently published a blog article and paper introducing their SIMD-accelerated sorting algorithm. SIMD stands for single instruction, multiple data. A single instruction is used to apply the same operation to multiple pieces of data. The prototypical example is addition, where one instruction can do e.g. 4 32-bit additions. A single SIMD addition should be roughly 4 times faster than perfor
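The "one instruction, four 32-bit additions" idea can be illustrated with a small sketch. It assumes an x86 target with SSE2 and uses the `_mm_add_epi32` intrinsic; on other targets it falls back to a scalar loop. The function name `add4` is hypothetical and unrelated to the sorting code the excerpt refers to.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define ADD4_HAVE_SSE2 1
#endif

using Int4 = std::array<std::int32_t, 4>;

// Adds two groups of four 32-bit integers lane by lane.
Int4 add4(const Int4& a, const Int4& b) {
    Int4 out{};
#ifdef ADD4_HAVE_SSE2
    // Load four 32-bit lanes from each input (unaligned loads are fine here).
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.data()));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.data()));
    // A single instruction performs all four additions.
    const __m128i sum = _mm_add_epi32(va, vb);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), sum);
#else
    // Scalar fallback: four separate additions.
    for (std::size_t i = 0; i < 4; ++i) out[i] = a[i] + b[i];
#endif
    return out;
}
```

Both paths return the same result, for example {11, 22, 33, 44} for inputs {1, 2, 3, 4} and {10, 20, 30, 40}. Whether the SIMD path is actually faster in practice depends on load/store overhead and on what the compiler already auto-vectorizes.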
SIMD accelerated sorting in Java - how it works and why it was 3x faster 09 Jun 2022 In this post I explain a little about how to use Java’s Vector APIs, attempt to explain how they turn out fast, and then use them to implement a sorting algorithm 3x faster than Arrays.sort. I then explain some problems I found, and how I resolved them. Supporting code is published here. I’m an occasional reader o
This article was discussed on Hacker News. I recently learned of csvquote, a tool that encodes troublesome CSV characters such that unix tools can correctly process them. It reverses the encoding at the end of the pipeline, recovering the original input. The original implementation handles CSV quotes using the straightforward, naive method. However, there’s a better approach that is not only simpl
According to Flynn’s taxonomy SIMD refers to a computer architecture that can process multiple data streams with a single instruction (i.e. “Single Instruction stream, Multiple Data streams”). There are different taxonomies, and within those several different sub-categories and architectures that classify as “SIMD”. In this post, however, I refer to packed SIMD ISA:s, i.e. the type of SIMD instruc
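Repository (kaityo256/sevendayshpc). HTML version. Single-file PDF version. Introduction: Why use a supercomputer? Day 1: Environment setup. To start, set up an environment where MPI can be used on your local PC and try some simple MPI programming. What is MPI? Aside: Is MPI difficult? Installing MPI. Your first MPI program. Ranks. About standard output. Debugging MPI programs with GDB. Day 2: How to use a supercomputer. Things worth knowing when using a supercomputer, such as how to submit jobs. Introduction. What is a supercomputer? Aside: Memory errors on BlueGene/L. How to obtain a supercomputer account. How job execution works. How to write a job script. Fair share. Backfill. Chain jobs. Staging. Parallel file systems. Day 3: Trivial parallelism. How to do trivial (embarrassingly parallel) parallelism, commonly called "baka-para". Trivial parallelism, also known as baka-para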