The document is an academic article from 1997 introducing the Long Short-Term Memory (LSTM) neural network architecture, designed to overcome vanishing and exploding error signals when training recurrent neural networks over long time intervals. Authored by Sepp Hochreiter and Jürgen Schmidhuber, the paper details how conventional gradient-based methods such as Back-Propagation Through Time (BPTT) and Real-Time Recurrent Learning (RTRL) fail with long time lags, primarily because backpropagated error decays exponentially. LSTM remedies this with its Constant Error Carrousel (CEC), which enforces constant error flow through special self-connected units; multiplicative input and output gate units regulate access to this constant flow. The authors present numerous experiments demonstrating that LSTM significantly outperforms previous recurrent network algorithms on tasks involving noise, distributed representations, and very long minimal time lags.
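To make the mechanism concrete, here is a minimal illustrative sketch in Python of a single 1997-style memory cell, not the authors' code: the internal state `s` plays the role of the CEC, carried forward through an identity self-connection, while multiplicative input and output gates regulate what enters and leaves it. The class and weight names are hypothetical, and the forget gate is deliberately absent because it was only introduced later (Gers et al., 1999).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell1997:
    """Sketch of one original LSTM memory cell: a constant error
    carrousel (CEC) gated by multiplicative input/output gates."""

    def __init__(self, input_size, seed=0):
        rng = np.random.default_rng(seed)
        self.w_c = rng.normal(0.0, 0.1, input_size)    # cell-input weights
        self.w_in = rng.normal(0.0, 0.1, input_size)   # input-gate weights
        self.w_out = rng.normal(0.0, 0.1, input_size)  # output-gate weights
        self.s = 0.0  # internal state carried by the CEC across time steps

    def step(self, x):
        g = np.tanh(self.w_c @ x)        # squashed cell input
        y_in = sigmoid(self.w_in @ x)    # input gate: what enters the CEC
        y_out = sigmoid(self.w_out @ x)  # output gate: what leaves the CEC
        # CEC update: the identity self-connection (weight 1.0) is what
        # keeps the error flowing back through s constant instead of
        # decaying or blowing up exponentially over long time lags.
        self.s = self.s + y_in * g
        return y_out * np.tanh(self.s)   # gated, squashed cell output

# Usage: feed a sequence of input vectors through the cell one step at a time.
cell = LSTMCell1997(input_size=4)
for x_t in np.random.default_rng(1).normal(size=(10, 4)):
    y_t = cell.step(x_t)
```

Because the state update is purely additive through a unit-weight self-loop, the local error derivative through `s` is exactly 1.0 at every step; the gates, not the recurrent weight, decide when information is written or read, which is the core insight the paper formalizes.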
