I just wrote 84 new matrix multiplication kernels for llamafile that enable it to read prompts and images faster. Compared to llama.cpp, prompt evaluation with llamafile should be anywhere from 30% to 500% faster when using F16 and Q8_0 weights on CPU. The improvements are most dramatic on ARMv8.2+ (e.g. RPI 5), Intel (e.g. Alderlake), and AVX-512 (e.g. Zen 4) computers. My kernels go 2x faster