Technique
Speculative decoding
An inference speed-up in which a small "draft" model proposes several tokens and the large target model verifies them in a single parallel forward pass, making LLM generation roughly 2-3× faster with no loss in output quality.
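The draft-then-verify loop can be sketched in plain Python. This is an illustrative toy of the greedy variant, not a real implementation: `target_model` and `draft_model` are hypothetical stand-ins (here, functions that map a token sequence to the next token), and the target's "parallel" verification of all k draft positions is simulated serially for clarity.

```python
def speculative_decode(target_model, draft_model, prompt, max_new_tokens, k=4):
    """Greedy speculative decoding sketch.

    The cheap draft model proposes k tokens; the expensive target model
    then checks them. Every accepted draft token saves one serial call
    to the target model.
    """
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. Draft model proposes k tokens autoregressively (cheap).
        proposed = []
        for _ in range(k):
            proposed.append(draft_model(tokens + proposed))
        # 2. Target model verifies. In a real system all k positions are
        #    scored in ONE parallel forward pass; simulated serially here.
        for i in range(k):
            choice = target_model(tokens)
            tokens.append(choice)          # target's token is always correct
            if choice != proposed[i]:
                break                      # mismatch: discard remaining draft
        else:
            # All k draft tokens accepted: the verification pass also
            # yields one extra target token for free.
            tokens.append(target_model(tokens))
    return tokens[len(prompt):len(prompt) + max_new_tokens]


# Toy models over integer "tokens": the target's greedy choice is the
# current sequence length; the draft agrees except at every 3rd position.
target = lambda seq: len(seq)
draft = lambda seq: len(seq) if len(seq) % 3 else len(seq) + 100

print(speculative_decode(target, draft, [0], 6))
```

Because the target model's choice always overrides a mismatched draft token, the output is token-for-token identical to plain greedy decoding with the target model alone; the draft model only affects speed, which is why the technique loses no quality.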