vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention
vllm.ai
GitHub | Documentation | Paper

LLMs promise to fundamentally change how we use AI across all industries. However, actually serving these models is challenging and can be surprisingly slow even on expensive hardware. Today we are excited to introduce vLLM, an open-source library for fast LLM inference and serving. vLLM utilizes PagedAttention, our new attention algorithm that effectively manages attention keys and values.
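To make the paging idea concrete, below is a minimal, illustrative sketch of how a KV cache can be split into fixed-size blocks, with a per-sequence block table mapping logical block indices to non-contiguous physical blocks, in the spirit of virtual memory. This is a toy under stated assumptions, not vLLM's actual implementation; the names `BLOCK_SIZE`, `BlockAllocator`, and `Sequence` are hypothetical.

```python
# Toy sketch of the paging idea behind PagedAttention: the KV cache for a
# sequence is split into fixed-size blocks, and a per-sequence block table
# maps logical block indices to non-contiguous physical blocks, much like
# virtual-memory pages. All names are illustrative, not vLLM's API.

BLOCK_SIZE = 16  # tokens per KV-cache block (illustrative value)


class BlockAllocator:
    """Hands out physical block ids from a fixed pool."""

    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free:
            raise RuntimeError("out of KV-cache blocks")
        return self.free.pop()

    def release(self, block_id: int) -> None:
        self.free.append(block_id)


class Sequence:
    """Tracks one request's logical-to-physical block mapping."""

    def __init__(self, allocator: BlockAllocator):
        self.allocator = allocator
        self.block_table: list[int] = []  # logical index -> physical block id
        self.num_tokens = 0

    def append_token(self) -> None:
        # Allocate a new physical block only when the last one is full,
        # so at most BLOCK_SIZE - 1 slots are ever wasted per sequence.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1


allocator = BlockAllocator(num_blocks=8)
seq = Sequence(allocator)
for _ in range(40):  # generate 40 tokens
    seq.append_token()
print(seq.block_table)  # e.g. [7, 6, 5]: three non-contiguous blocks
```

Because blocks are fixed-size and allocated only on demand, fragmentation is bounded to less than one block per sequence, which is the property that lets a server pack many concurrent requests into the same GPU memory.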