The document is an academic article from 1997 introducing the Long Short-Term Memory (LSTM) neural network architecture, designed to overcome vanishing and exploding error signals when training recurrent neural networks over long time intervals. Authored by Sepp Hochreiter and Jürgen Schmidhuber, the paper details how conventional gradient-based methods such as Back-Propagation Through Time (BPTT) and Real-Time Recurrent Learning (RTRL) fail with long time lags, primarily because backpropagated error decays exponentially. LSTM remedies this with its Constant Error Carrousel (CEC), which enforces constant error flow through special self-connected units; multiplicative input and output gate units regulate access to this constant flow. The authors present numerous experiments demonstrating that LSTM significantly outperforms previous recurrent network algorithms on tasks involving noise, distributed representations, and very long minimal time lags.
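To make the mechanism concrete, here is a minimal illustrative sketch in Python of a single 1997-style memory cell, not the authors' code: the internal state `s` plays the role of the CEC, carried forward through an identity self-connection, while multiplicative input and output gates regulate what enters and leaves it. The class and weight names are hypothetical, and the forget gate is deliberately absent because it was only introduced later (Gers et al., 1999).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell1997:
    """Sketch of one original LSTM memory cell: a constant error
    carrousel (CEC) gated by multiplicative input/output gates."""

    def __init__(self, input_size, seed=0):
        rng = np.random.default_rng(seed)
        self.w_c = rng.normal(0.0, 0.1, input_size)    # cell-input weights
        self.w_in = rng.normal(0.0, 0.1, input_size)   # input-gate weights
        self.w_out = rng.normal(0.0, 0.1, input_size)  # output-gate weights
        self.s = 0.0  # internal state carried by the CEC across time steps

    def step(self, x):
        g = np.tanh(self.w_c @ x)        # squashed cell input
        y_in = sigmoid(self.w_in @ x)    # input gate: what enters the CEC
        y_out = sigmoid(self.w_out @ x)  # output gate: what leaves the CEC
        # CEC update: the identity self-connection (weight 1.0) is what
        # keeps the error flowing back through s constant instead of
        # decaying or blowing up exponentially over long time lags.
        self.s = self.s + y_in * g
        return y_out * np.tanh(self.s)   # gated, squashed cell output

# Usage: feed a sequence of input vectors through the cell one step at a time.
cell = LSTMCell1997(input_size=4)
for x_t in np.random.default_rng(1).normal(size=(10, 4)):
    y_t = cell.step(x_t)
```

Because the state update is purely additive through a unit-weight self-loop, the local error derivative through `s` is exactly 1.0 at every step; the gates, not the recurrent weight, decide when information is written or read, which is the core insight the paper formalizes.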
