Technique
KV cache
A cache of the key and value tensors computed for past tokens, which lets a transformer reuse them instead of recomputing them at each new generation step. The cache grows linearly with the number of tokens in context, which is the main reason long contexts use so much memory.
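To make the definition concrete, here is a minimal sketch in plain Python, not any particular library's implementation. `kv_cache_bytes`, `attend`, and `ToyKVCache` are hypothetical names chosen for illustration. The model shape in the example (32 layers, 32 KV heads, head dimension 128, 16-bit values) is an assumed configuration, and the toy cache takes ready-made query, key, and value vectors rather than computing them with learned projections.

```python
import math


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Approximate cache size: one K and one V tensor per layer."""
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


def attend(query, keys, values):
    """Single-head scaled dot-product attention over plain Python lists."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]


class ToyKVCache:
    """Stores each token's key and value once so later steps can reuse them."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, query, key, value):
        # Only the newest token's key/value are added; older ones are reused.
        self.keys.append(key)
        self.values.append(value)
        return attend(query, self.keys, self.values)


# Assumed (hypothetical) model shape, 4096 tokens of context, batch size 1.
size = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=4096)
print(size / 2**30, "GiB")  # 2.0 GiB
```

Doubling `seq_len` doubles the result, which is the linear growth that makes long contexts expensive. In the toy cache, each call to `step` appends one key and one value and attends over everything stored so far, so the work of producing keys and values for earlier tokens is never repeated.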