
Encoder

A neural network component that converts input data into a dense vector representation capturing its meaning.

An **encoder** is the part of a neural network that takes raw input — text, an image, audio — and compresses it into a vector (or sequence of vectors) that captures the input's meaning in a form the rest of the model can work with. The output is often called an embedding or a hidden representation.

Encoders matter because almost every modern AI system has one somewhere. BERT is a pure encoder model used for search, classification, and similarity. Models like CLIP have separate encoders for images and text that map both into the same vector space. In encoder-decoder architectures (the original Transformer, T5, most translation models), the encoder reads the source and the decoder generates the output.

A useful analogy: imagine you're translating a paragraph. You first read it carefully and form a mental understanding of what it says — that's the encoder. Then you write the translation based on that understanding — that's the decoder. The "mental understanding" is the encoded representation: not the original words, but everything needed to reproduce or act on them.

Modern decoder-only LLMs like GPT or Claude technically don't have a separate encoder block, but their early layers still play a similar role, turning tokens into rich contextual representations before later layers do the generation work.

Related concepts: decoder, embedding, transformer, BERT, encoder-decoder architecture, attention.
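The core idea — raw input in, fixed-size dense vector out — can be shown with a deliberately tiny sketch. This is not a real model: a random embedding table plus mean pooling stands in for the learned layers of something like BERT, and the vocabulary size and dimension are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE = 1000   # hypothetical vocabulary size
DIM = 64            # hypothetical embedding dimension

# Stand-in for learned parameters: one vector per token id.
embedding_table = rng.normal(size=(VOCAB_SIZE, DIM))

def encode(token_ids):
    """Toy encoder: look up each token's vector, mean-pool into one embedding."""
    vectors = embedding_table[token_ids]   # shape (seq_len, DIM)
    return vectors.mean(axis=0)            # shape (DIM,) — a dense representation

def cosine(u, v):
    """Downstream tasks often compare embeddings, e.g. by cosine similarity."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Two "sentences" as token id sequences, differing in one token.
a = encode([12, 45, 7])
b = encode([12, 45, 8])
print(cosine(a, b))
```

A real encoder replaces the lookup-and-average with many layers of attention and feed-forward transformations, but the interface is the same: variable-length input in, meaning-bearing vectors out.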

Last updated: 2026-04-29
