Long Short-Term Memory (LSTM) models are a type of Recurrent Neural Network (RNN) specifically designed to learn and remember patterns over time. They address the challenges faced by traditional RNNs, particularly the problem of vanishing and exploding gradients, which makes it difficult to learn long-term dependencies in sequential data. LSTMs are widely used for sequential data tasks such as Natural Language Processing, speech recognition, and time series prediction.
How LSTMs Work
LSTMs use a special structure known as a memory cell to store information over time. Each cell has gates that regulate the flow of information, and they decide what to keep, update, or discard. These gates are implemented using neural network layers with sigmoid or tanh activations.
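As a quick orientation before walking through the gates, here is a minimal usage sketch with PyTorch's `nn.LSTM` (the framework choice and the layer sizes are assumptions for illustration). It shows the per-step hidden states and the final hidden and cell states that the gated memory cell maintains.

```python
import torch
import torch.nn as nn

# Minimal sketch (assumed sizes): one LSTM layer, 10 input features, 20 hidden units
lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)

x = torch.randn(4, 7, 10)        # batch of 4 sequences, 7 time steps, 10 features each
output, (h_n, c_n) = lstm(x)     # output: hidden state at every time step

print(output.shape)  # torch.Size([4, 7, 20])
print(h_n.shape)     # torch.Size([1, 4, 20])  final hidden state
print(c_n.shape)     # torch.Size([1, 4, 20])  final cell state maintained by the gates
```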
Forget Gate ($f_t$)
Decides what information to discard from the cell state. It is represented by the formula (see the sketch after this list): $$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$ where:
- $W_f$ and $b_f$ are the weights and biases of the forget gate
- $\sigma$ is the Sigmoid Activation Function
- $h_{t-1}$ is the previous hidden state
- $x_t$ is the current input
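To make the formula concrete, here is a small NumPy sketch of a single forget-gate computation (the sizes and random weights are assumptions for illustration, not a trained model):

```python
import numpy as np

def sigmoid(z):
    # element-wise sigmoid, squashes values into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# toy sizes (assumed): 3 input features, 4 hidden units
input_size, hidden_size = 3, 4
rng = np.random.default_rng(0)

W_f = rng.standard_normal((hidden_size, hidden_size + input_size))  # weights for [h_{t-1}, x_t]
b_f = np.zeros(hidden_size)                                         # bias

h_prev = np.zeros(hidden_size)           # previous hidden state h_{t-1}
x_t = rng.standard_normal(input_size)    # current input x_t

# f_t = sigma(W_f . [h_{t-1}, x_t] + b_f); values near 0 mean "forget", near 1 mean "keep"
f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)
print(f_t)   # 4 values in (0, 1), one per cell-state component
```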
Input Gate ($i_t$)
Decides what information to add to the cell state. It is represented by the formulas: $$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$ $$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$$ where $\tilde{C}_t$ is the candidate cell state.
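In the same spirit, a NumPy sketch of the input gate and the candidate cell state (again with assumed toy sizes and random weights):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

input_size, hidden_size = 3, 4
rng = np.random.default_rng(0)

# separate weights and biases for the input gate and the candidate cell state
W_i = rng.standard_normal((hidden_size, hidden_size + input_size))
b_i = np.zeros(hidden_size)
W_c = rng.standard_normal((hidden_size, hidden_size + input_size))
b_c = np.zeros(hidden_size)

h_prev = np.zeros(hidden_size)
x_t = rng.standard_normal(input_size)
concat = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]

i_t = sigmoid(W_i @ concat + b_i)        # how much of each candidate value to write, in (0, 1)
c_tilde = np.tanh(W_c @ concat + b_c)    # candidate cell state values, in (-1, 1)
```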
Update Cell State
Combines the previous cell state, the forget gate's decision, and the input gate's updates. It is represented by the formula: $$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$ where $\odot$ denotes element-wise multiplication.
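The update itself is just element-wise arithmetic. A short sketch with stand-in gate values (in a real step these come from the gate equations above):

```python
import numpy as np

hidden_size = 4
rng = np.random.default_rng(0)

# stand-in values for illustration only
C_prev = rng.standard_normal(hidden_size)            # previous cell state C_{t-1}
f_t = rng.uniform(size=hidden_size)                  # forget gate output, in (0, 1)
i_t = rng.uniform(size=hidden_size)                  # input gate output, in (0, 1)
c_tilde = np.tanh(rng.standard_normal(hidden_size))  # candidate cell state

# C_t = f_t * C_{t-1} + i_t * C~_t   (element-wise multiplication)
C_t = f_t * C_prev + i_t * c_tilde
print(C_t)
```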
Output Gate ($o_t$)
Decides what part of the cell state to output. It is represented by the formula: $$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$ and then the hidden state is calculated as: $$h_t = o_t \odot \tanh(C_t)$$
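Putting the four equations together, here is a self-contained NumPy sketch of one full LSTM time step (the sizes, random weights, and the `lstm_step` helper are assumptions for illustration, not a library API):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, p):
    """One LSTM time step implementing the four gate equations above."""
    concat = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    f_t = sigmoid(p["W_f"] @ concat + p["b_f"])       # forget gate
    i_t = sigmoid(p["W_i"] @ concat + p["b_i"])       # input gate
    c_tilde = np.tanh(p["W_c"] @ concat + p["b_c"])   # candidate cell state
    C_t = f_t * C_prev + i_t * c_tilde                # update cell state
    o_t = sigmoid(p["W_o"] @ concat + p["b_o"])       # output gate
    h_t = o_t * np.tanh(C_t)                          # new hidden state
    return h_t, C_t

# toy setup (assumed sizes): 3 input features, 4 hidden units, random weights
input_size, hidden_size = 3, 4
rng = np.random.default_rng(0)
p = {}
for g in ("f", "i", "c", "o"):
    p[f"W_{g}"] = 0.1 * rng.standard_normal((hidden_size, hidden_size + input_size))
    p[f"b_{g}"] = np.zeros(hidden_size)

h, C = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in rng.standard_normal((5, input_size)):      # a toy sequence of 5 time steps
    h, C = lstm_step(x_t, h, C, p)
print(h.shape, C.shape)   # (4,) (4,)
```

In practice a framework layer such as the `nn.LSTM` example earlier handles these per-step computations (and their gradients) for you; the sketch only spells out the gate arithmetic.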