Architecture
Mixture of Experts (MoE)
A neural network architecture that splits the model into many specialized "expert" sub-networks and routes each input to only a few of them, so the total parameter count can grow very large while per-input compute stays at a fraction of what a dense model of the same size would need.
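To make the routing idea concrete, here is a minimal sketch of a top-k gated MoE layer in PyTorch. It is illustrative only, not any particular model's implementation; the class name `SimpleMoE` and parameters such as `num_experts` and `top_k` are assumptions chosen for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    """Toy mixture-of-experts layer: a router picks the top-k experts per token."""

    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a small independent feed-forward network.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        # The router scores every expert for every token.
        self.router = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim)
        scores = self.router(x)                                # (tokens, experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)      # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)                   # normalise their mixing weights

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                    # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

# Only top_k of the num_experts sub-networks run for each token,
# so compute scales with top_k while parameters scale with num_experts.
layer = SimpleMoE(dim=64)
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```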