vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention
vllm.ai
GitHub | Documentation | Paper

LLMs promise to fundamentally change how we use AI across all industries. However, actually serving these models is challenging and can be surprisingly slow even on expensive hardware. Today we are excited to introduce vLLM, an open-source library for fast LLM inference and serving. vLLM utilizes PagedAttention, our new attention algorithm that effectively manages attention keys and values.
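To make the paging idea concrete, below is a minimal, illustrative sketch of how a KV cache can be split into fixed-size blocks, with a per-sequence block table mapping logical block indices to non-contiguous physical blocks, in the spirit of virtual memory. This is a toy under stated assumptions, not vLLM's actual implementation; the names `BLOCK_SIZE`, `BlockAllocator`, and `Sequence` are hypothetical.

```python
# Toy sketch of the paging idea behind PagedAttention: the KV cache for a
# sequence is split into fixed-size blocks, and a per-sequence block table
# maps logical block indices to non-contiguous physical blocks, much like
# virtual-memory pages. All names are illustrative, not vLLM's API.

BLOCK_SIZE = 16  # tokens per KV-cache block (illustrative value)


class BlockAllocator:
    """Hands out physical block ids from a fixed pool."""

    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free:
            raise RuntimeError("out of KV-cache blocks")
        return self.free.pop()

    def release(self, block_id: int) -> None:
        self.free.append(block_id)


class Sequence:
    """Tracks one request's logical-to-physical block mapping."""

    def __init__(self, allocator: BlockAllocator):
        self.allocator = allocator
        self.block_table: list[int] = []  # logical index -> physical block id
        self.num_tokens = 0

    def append_token(self) -> None:
        # Allocate a new physical block only when the last one is full,
        # so at most BLOCK_SIZE - 1 slots are ever wasted per sequence.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1


allocator = BlockAllocator(num_blocks=8)
seq = Sequence(allocator)
for _ in range(40):  # generate 40 tokens
    seq.append_token()
print(seq.block_table)  # e.g. [7, 6, 5]: three non-contiguous blocks
```

Because blocks are fixed-size and allocated only on demand, fragmentation is bounded to less than one block per sequence, which is the property that lets a server pack many concurrent requests into the same GPU memory.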