This is a breaking change that's going to give us three benefits:

1. Your inference commands should load 100x faster
2. You may be able to safely load models 2x larger
3. You can run many concurrent inference processes

This was accomplished by changing the file format so we can mmap() weights directly into memory without having to read() or copy them, thereby ensuring the kernel can make its file cache pages directly accessible to our inference processes.