Sequential KV Cache Compression via Probabilistic Language Tries: Beyond the Per-Vector Shannon Limit

Source: arxiv.org


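takayamaki: TurboQuant already brings the compression of individual KV cache entries close to the Shannon limit, so the only way to go further is to exploit the fact that the KV cache is a sequence and apply predictive delta coding. The paper formalizes and argues this claim.

2026/05/05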



Recent work on KV cache quantization, culminating in TurboQuant, has approached the Shannon entropy limit for per-vector compression of transformer key-value caches. We observe that this limit applies to a strictly weaker problem than the one that actually matters: compressing the KV cache as a sequence. The tokens stored in a KV cache are not arbitrary floating-point data -- they are samples from