State-Space Model (Mamba)

A sequence model architecture that processes tokens through a compressed hidden state, offering linear-time scaling as an alternative to Transformer attention.

State-space models (SSMs) are a family of sequence architectures that process input one token at a time through a continuously updated hidden "state," much like a classical control system or RNN. Mamba, introduced by Albert Gu and Tri Dao in late 2023, is the breakthrough variant that made SSMs competitive with Transformers on language modeling.

The appeal is efficiency. Transformer attention compares every token with every other token, so cost grows quadratically with sequence length. Mamba's compute and memory grow linearly, which makes very long contexts (DNA sequences, audio waveforms, million-token documents) far cheaper to handle.

Mamba's key trick is a "selective" mechanism: the state-update parameters depend on the input itself, so the model can decide which information to keep in its compressed state and which to forget (see the sketch at the end of this entry).

A rough analogy: a Transformer is like a reader who keeps every page of a book spread out on the desk and glances back at all of them for each new word. An SSM is like a reader who keeps a running summary in their head: cheaper to maintain, but the summary has to be good enough to capture what matters. Mamba's selectivity is what makes that summary good.

In practice you'll see Mamba in long-context research models, hybrid architectures (such as Jamba, which mixes Mamba layers with attention), and domains like genomics and audio where sequences are huge. Pure SSMs still trail top Transformers on some recall-heavy tasks, which is why hybrids are popular.

Related concepts: Transformer, attention mechanism, RNN, linear attention, Jamba, long-context models.
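To make the selective recurrence concrete, here is a minimal NumPy sketch of a selective scan for a single feature channel. Everything in it (the function name selective_scan, the w_B, w_C, and w_dt projections, and the scalar-channel simplification) is an illustrative assumption rather than Mamba's actual implementation: the real model derives these parameters from the full token embedding and fuses the loop into a hardware-aware parallel scan over many channels.

    import numpy as np

    def selective_scan(x, A, w_B, w_C, w_dt):
        """Illustrative selective SSM recurrence for one channel.

        x: (T,) input sequence (one feature channel)
        A: (N,) diagonal state matrix; negative entries give decaying memory
        w_B, w_C: (N,) projections making B and C input-dependent
        w_dt: scalar projection for the input-dependent step size
        Returns y: (T,). Cost is O(T): one fixed-size update per token.
        """
        h = np.zeros(A.shape)            # the compressed hidden state
        y = np.empty_like(x)
        for t, x_t in enumerate(x):
            # Selectivity: the update parameters are functions of the input.
            # (Real Mamba computes them from the whole d-dimensional token;
            # deriving them from the same scalar is a simplification here.)
            dt = np.log1p(np.exp(w_dt * x_t))   # softplus keeps dt > 0
            B_t = w_B * x_t                     # how x_t writes into the state
            C_t = w_C * x_t                     # how the state is read out
            # Discretized update of dh/dt = A h + B x:
            # a large dt lets this token overwrite the state; a tiny dt
            # makes the model glide past it with the state nearly unchanged.
            A_bar = np.exp(dt * A)              # per-token retention/decay
            h = A_bar * h + dt * B_t * x_t
            y[t] = C_t @ h
        return y

    # Toy usage: a 64-token sequence with a 16-dimensional state.
    rng = np.random.default_rng(0)
    T, N = 64, 16
    y = selective_scan(
        x=rng.standard_normal(T),
        A=-np.exp(rng.standard_normal(N)),  # negative A keeps the state stable
        w_B=rng.standard_normal(N),
        w_C=rng.standard_normal(N),
        w_dt=0.5,
    )
    print(y.shape)  # (64,)

Note the shape of the loop: one fixed-size state update per token, which is where the linear-time scaling comes from. Transformer attention would instead touch every previous token at each step.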

Last updated: 2026-04-29
