
LSTM (Long Short-Term Memory)

A type of recurrent neural network designed to remember information over long sequences, widely used before Transformers took over.

LSTM (Long Short-Term Memory) is a type of recurrent neural network (RNN) introduced by Hochreiter and Schmidhuber in 1997. It was designed to solve the "vanishing gradient" problem that prevented earlier RNNs from learning long-range dependencies in sequence data.

The key innovation is a "cell state" that runs through the network like a conveyor belt, plus three gates (input, forget, output) that decide what information to add, throw away, or pass on at each time step. This gating mechanism lets the network selectively remember things from much earlier in a sequence, such as the subject of a sentence written 50 words ago.

Before Transformers dominated the field around 2018, LSTMs powered most state-of-the-art systems for machine translation, speech recognition, handwriting recognition, and early language models. Google Translate famously ran on LSTM-based models before switching to Transformer architectures.

A simple analogy: imagine reading a long novel and taking notes. A vanilla RNN tries to remember everything in its head and quickly forgets early chapters. An LSTM keeps a notebook (the cell state) and at each page decides what to jot down, what to cross out, and what to read aloud, so it can still recall the protagonist's name 300 pages later.

LSTMs are mostly historical now in NLP, but they are still used in time-series forecasting, some embedded/low-resource settings, and as a teaching example. Related concepts to look up next: RNN, GRU, Transformer, attention, sequence-to-sequence, vanishing gradient.
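
To make the gating mechanism concrete, here is a minimal sketch of a single LSTM time step in plain NumPy. The function name lstm_step, the weight names (W_f, U_f, b_f, and so on), and the toy sizes are illustrative assumptions for this sketch, not the API of any particular library.

# Minimal single-step LSTM cell in NumPy (illustrative sketch; names and
# shapes are assumptions for this example, not a library API).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    # One time step: the gates decide what to erase, write, and expose.
    W_f, U_f, b_f = params["forget"]
    W_i, U_i, b_i = params["input"]
    W_o, U_o, b_o = params["output"]
    W_c, U_c, b_c = params["cell"]

    f_t = sigmoid(W_f @ x_t + U_f @ h_prev + b_f)       # forget gate: what to drop from the cell state
    i_t = sigmoid(W_i @ x_t + U_i @ h_prev + b_i)       # input gate: how much new content to write
    o_t = sigmoid(W_o @ x_t + U_o @ h_prev + b_o)       # output gate: how much of the cell to reveal
    c_tilde = np.tanh(W_c @ x_t + U_c @ h_prev + b_c)   # candidate new content

    c_t = f_t * c_prev + i_t * c_tilde   # updated cell state (the "notebook")
    h_t = o_t * np.tanh(c_t)             # hidden state passed to the next step
    return h_t, c_t

# Tiny usage example with random weights.
input_size, hidden_size = 3, 4
rng = np.random.default_rng(0)
params = {
    name: (rng.standard_normal((hidden_size, input_size)),    # W: input -> hidden
           rng.standard_normal((hidden_size, hidden_size)),   # U: hidden -> hidden
           np.zeros(hidden_size))                             # b: bias
    for name in ("forget", "input", "output", "cell")
}
h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x_t in rng.standard_normal((5, input_size)):  # a sequence of 5 input vectors
    h, c = lstm_step(x_t, h, c, params)
print(h)  # hidden state after processing the whole sequence

Note that the cell state is updated additively (scaled by the forget gate rather than repeatedly squashed through a nonlinearity), which is the usual intuition for why gradients can flow across many time steps and the vanishing-gradient problem is avoided.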

Last updated: 2026-04-29
