Architecture
Multi-head attention
A Transformer mechanism that runs several attention operations in parallel, letting the model focus on different relationships in the input at the same time.
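The mechanism can be sketched in a few lines of NumPy. This is a minimal illustration, not any particular library's implementation: the function name `multi_head_attention`, the weight matrices `w_q`, `w_k`, `w_v`, `w_o`, and the single-sequence (no batch) shapes are all assumptions made for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Illustrative sketch: scaled dot-product attention run over several
    heads in parallel. x: (seq_len, d_model); each weight: (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    # Project the input, then split each projection into per-head slices:
    # (seq_len, d_model) -> (num_heads, seq_len, d_head).
    def split(m):
        return m.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)

    # Each head computes its own attention pattern independently, which is
    # what lets the model capture different relationships in parallel.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    out = softmax(scores) @ v                            # (heads, seq, d_head)

    # Concatenate the heads back together and apply the output projection.
    concat = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o
```

Calling it with, say, `num_heads=2` on a `(5, 8)` input and four `(8, 8)` weight matrices returns a `(5, 8)` output: each of the two heads attends over a 4-dimensional slice, and their results are concatenated and mixed by `w_o`.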