dannのブックマーク - はてなブックマーク

dann id:dann

ブックマーク / www.baseten.co (1)

A guide to LLM inference and performance
We want to use the full power of our GPU during LLM inference. To do that, we need to know if our inference is compute bound or memory bound so that we can make optimizations in the right area. Calculating the operations per byte possible on a given GPU and comparing it to the arithmetic intensity of our model’s attention layers reveals where the bottleneck is: compute or memory. We can use this i
dann 2024/01/31
llm

performance
リンク
1

お知らせ

もっと読む

公式Twitter

@HatenaBookmark
リリース、障害情報などのサービスのお知らせ
@hatebu
最新の人気エントリーの配信

キーボードショートカット一覧

j次のブックマーク

k前のブックマーク

lあとで読む

eコメント一覧を開く

oページを開く

設定を変更しましたx